Project Expansion Engine: Build an AI-Powered Learning Guide Expander
Goal: You will learn how to turn raw project idea lists into complete, teachable, project-based learning guides that read like a mini-book and ship as ready-to-publish Markdown. You will master instructional design fundamentals (backward design, scaffolding, cognitive load), apply them to build structured learning content, and operationalize the process with a retrieval-augmented generation (RAG) pipeline. By the end, you will be able to build an end-to-end expander that analyzes a source file, enriches it with verified sources, generates detailed project deep-dives, and validates the result with a quality rubric. You will also gain the mental models to reason about quality, depth, and correctness, not just text generation.
Introduction: What This Guide Covers
Project Expansion is the process of transforming a short list of project ideas into a complete learning guide with theory, structured scaffolding, and detailed, buildable projects. In practice, it combines instructional design, technical writing, and automated content generation into a single pipeline.
What you will build (by the end of this guide):
- A structured expander that converts idea lists into full-length learning guides
- A concept map + prerequisite analyzer that drives project sequencing
- A RAG-backed citation system that grounds claims in primary sources
- A quality linter that enforces completeness and pedagogical depth
- An end-to-end CLI that outputs publish-ready Markdown
Scope (what is included):
- Instructional design theory used by your expander
- A practical architecture for RAG-powered content generation
- A complete set of expansion projects with verification steps
Out of scope (for this guide):
- Building a full web UI or authoring platform
- Fine-tuning LLMs or training custom foundation models
- Creating proprietary datasets or copyrighted content
The Big Picture (Mental Model)
Idea List
|
v
[Parse + Normalize] -> [Concept Map] -> [Learning Objectives]
| | |
v v v
[Source Retrieval] ---> [Theory Primer] -> [Project Drafts]
| | |
v v v
[Quality Linter] <----- [Rubric + DoD] -> [Publishable Guide]
Key Terms You Will See Everywhere
- Project Expansion: Turning short project ideas into complete learning guides.
- Learning Objective: A measurable outcome describing what the learner can do.
- Backward Design: Start with outcomes, then define evidence, then design activities.
- Scaffolding: Temporary support to help learners master a task they cannot yet do alone.
- RAG (Retrieval-Augmented Generation): Combine retrieval of external sources with generation.
- Definition of Done (DoD): A checklist of explicit, testable completion criteria.
How to Use This Guide
- Read the Theory Primer first. It builds the mental models you will encode into your expander.
- Build the projects in order. Each project adds a layer to the pipeline.
- Use the checklists. Every project has a Definition of Done and pitfalls.
- Iterate with examples. Feed your tool a real idea list and refine the output.
- Measure quality. Use the rubric and linting checks to enforce depth and correctness.
Prerequisites & Background Knowledge
Before starting these projects, you should have a foundational understanding of the following areas:
Essential Prerequisites (Must Have)
Programming Skills:
- Proficiency in at least one language (Python, Go, or TypeScript)
- Ability to read and write structured text (Markdown, YAML, JSON)
- Comfort with CLI workflows and file system operations
Software Design Fundamentals:
- Decomposition into modules, interfaces, and data flow
- Basic error handling and logging
- Recommended Reading: “Clean Architecture” by Robert C. Martin - Ch. 1-5
Algorithms & Data Structures Basics:
- Arrays, maps, graphs, and basic traversal
- Recommended Reading: “Algorithms, Fourth Edition” by Sedgewick & Wayne - Ch. 1-4
Helpful But Not Required
Natural Language Processing Basics:
- Tokenization, embeddings, and similarity search
- Can learn during: Project 3, Project 4
Instructional Design Knowledge:
- Learning objectives, scaffolding, and assessment
- Can learn during: Theory Primer chapters and Project 2
Self-Assessment Questions
Before starting, ask yourself:
- Can I parse and transform Markdown reliably?
- Can I design a pipeline with clear stages and outputs?
- Can I write or evaluate a rubric with measurable criteria?
- Do I understand the difference between “content” and “assessment”?
- Can I validate outputs with repeatable tests?
If you answered “no” to questions 1-3: Spend 1-2 weeks reviewing software design and basic parsing.
If you answered “yes” to all 5: You’re ready to begin.
Development Environment Setup
To complete these projects, you’ll need:
Required Tools:
- A Linux or macOS environment
- Python 3.11+ (or Node 20+)
- rg (ripgrep) for fast file search
- Git for version control
Recommended Tools:
- jq for JSON inspection
- pandoc for Markdown conversions
- A local vector database (SQLite + sqlite-vss, or an in-memory FAISS index)
Testing Your Setup:
$ python --version
Python 3.11.6
$ rg --version
ripgrep 13.0.0
Time Investment
- Simple projects (1, 2): Weekend (4-8 hours each)
- Moderate projects (3, 4, 5): 1-2 weeks (10-20 hours each)
- Complex projects (6, 7): 2+ weeks (20-40 hours each)
- Total sprint: 6-10 weeks if doing all projects sequentially
Important Reality Check
This is a system-design and content-quality challenge, not just a coding exercise. Expect to iterate:
- First pass: Build a working pipeline
- Second pass: Improve completeness and correctness
- Third pass: Tighten outputs with rubrics and automated checks
- Fourth pass: Refine for clarity, depth, and teaching quality
Big Picture / Mental Model (Diagram First)
INPUTS CORE PIPELINE OUTPUTS
Idea List -> Parse/Normalize -> Concept Graph -> Source Retrieval -> Draft
| | | | |
v v v v v
Existing Guides Metadata Map Learning Objectives Evidence Pack Structured Guide
| |
v v
Quality Rubric <------------------------- Linter ---------------------- Publishable MD
Theory Primer (Mini-Book)
Chapter 1: Project-Based Learning (PBL) Foundations
Definitions & Key Terms
- Project-Based Learning (PBL): A teaching method where learners gain knowledge and skills by investigating and responding to an authentic, complex question or challenge over an extended period (see PBLWorks definition and Gold Standard PBL).
- Driving Question: The central problem or question that anchors the project.
- Public Product: A final artifact shared with a real audience.
Source note: PBLWorks provides a widely used definition and the Gold Standard PBL elements used in this guide.
PBLWorks defines Gold Standard PBL using seven essential project design elements: a challenging problem or question, sustained inquiry, authenticity, student voice and choice, reflection, critique and revision, and a public product. These elements should shape how you expand and structure projects in your guide.
Mental Model Diagram
Driving Question
|
v
Sustained Inquiry -> Evidence -> Iteration -> Public Product
| ^
v |
Authenticity + Voice + Reflection + Critique
How It Works (Step-by-Step)
- Start with a real problem. The driving question frames the entire project.
- Sustain inquiry. Learners research, test, and refine ideas over time.
- Create artifacts. Outputs should be tangible (code, docs, demo).
- Expose to feedback. Critique and revision raise quality.
- Make it public. A public product increases rigor and accountability.
Trade-offs
- Pros: Deep learning, strong transfer, higher engagement.
- Cons: Higher upfront design cost, harder assessment, risk of scope creep.
Minimal Concrete Example
Project: "Build a CLI that turns idea lists into full learning guides"
Driving Question: "How can we automate the creation of high-quality learning guides?"
Public Product: A publishable Markdown guide plus a demo walkthrough
Common Misconceptions
- “Any project is PBL” (false: PBL requires inquiry, authenticity, and public output)
- “PBL is unstructured” (false: PBL is structured but learner-driven)
Check-Your-Understanding Questions
- What is the difference between a driving question and a task list?
- Which PBL element ensures quality improvement over time?
Where You Will Apply It
- Project 1 (Guide Inventory)
- Project 4 (Guide Generator)
- Project 7 (End-to-End CLI)
Chapter 2: Backward Design (Understanding by Design)
Definitions & Key Terms
- Backward Design: Start with desired results, determine acceptable evidence, then design learning experiences.
- Learning Objectives: Observable, measurable outcomes.
Wiggins and McTighe describe three stages: (1) identify desired results, (2) determine acceptable evidence, (3) plan learning experiences and instruction. This is the core scaffold for your expander (see Understanding by Design / UbD summaries). Source note: The University of Florida IFAS EDIS summary provides a concise description of the UbD three-stage model.
Mental Model Diagram
Desired Results -> Evidence -> Learning Experiences
| | |
v v v
Objectives Rubrics/DoD Projects + Theory
How It Works (Step-by-Step)
- Write learning objectives. Use action verbs.
- Define evidence. Rubrics, checklists, test outputs.
- Design activities. Projects that force evidence to appear.
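In code, backward design reduces to a small record your expander can validate before generating anything. A minimal sketch in Python (the field names are illustrative, not a required schema):
from dataclasses import dataclass, field

@dataclass
class ObjectiveSpec:
    objective: str                                      # Stage 1: desired result, written with an action verb
    evidence: list[str] = field(default_factory=list)   # Stage 2: rubric rows, DoD items, test outputs
    activity: str = ""                                  # Stage 3: the project that forces the evidence to appear

spec = ObjectiveSpec(
    objective="Learner can design a RAG pipeline and explain trade-offs",
    evidence=["retrieval metrics report", "failure analysis write-up"],
    activity="Build a RAG retriever with an evaluation harness",
)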
Trade-offs
- Pros: Alignment, clarity, measurable outcomes.
- Cons: Can feel rigid if objectives are too narrow.
Minimal Concrete Example
Objective: "Learner can design a RAG pipeline and explain trade-offs."
Evidence: A project with retrieval metrics + failure analysis.
Activity: Build a RAG retriever with evaluation harness.
Common Misconceptions
- “Objectives are only for teachers” (false: they guide tool design)
- “Evidence is just a quiz” (false: evidence can be real artifacts)
Check-Your-Understanding Questions
- What evidence would prove a learner can design a pipeline?
- How do objectives change project sequencing?
Where You Will Apply It
- Project 2 (Concept Map)
- Project 6 (Quality Linter)
- Project 7 (CLI Orchestrator)
Chapter 3: Bloom’s Revised Taxonomy and Assessment Evidence
Definitions & Key Terms
Bloom’s revised taxonomy orders cognitive processes: Remember, Understand, Apply, Analyze, Evaluate, Create. The revision (Anderson & Krathwohl, 2001) shifted focus toward action verbs and placed “Create” at the top (see revised Bloom taxonomy summaries). Source note: The University of Delaware revised Bloom taxonomy summary is a concise reference for the 2001 update.
Mental Model Diagram
Remember -> Understand -> Apply -> Analyze -> Evaluate -> Create
| | | | | |
v v v v v v
Recall Explain Use Break apart Judge Build
How It Works (Step-by-Step)
- Tag each section with the intended cognitive level.
- Ensure projects reach “Create” by requiring real artifacts.
- Align rubrics with the level (e.g., evaluate = justify trade-offs).
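A tiny sketch of level tagging, assuming your generator keeps a plan of sections before writing them (the section names and tags are examples):
BLOOM = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

section_levels = {
    "Define RAG": "remember",
    "Implement a retriever": "apply",
    "Compare chunking strategies": "analyze",
    "Build the expander CLI": "create",
}

assert all(level in BLOOM for level in section_levels.values())           # only real Bloom levels allowed
assert "create" in section_levels.values(), "guide never reaches Create"  # require a real artifact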
Trade-offs
- Pros: Ensures depth and progression.
- Cons: Can oversimplify messy learning paths.
Minimal Concrete Example
Remember: Define RAG
Apply: Implement a retriever
Analyze: Compare chunking strategies
Evaluate: Justify retrieval metrics
Create: Build an end-to-end expander CLI
Common Misconceptions
- “Bloom is linear” (false: learners jump levels)
- “Create is only art” (false: design and systems architecture count)
Check-Your-Understanding Questions
- Which taxonomy level best fits “compare chunking strategies”?
- How would you assess “evaluate” in a project?
Where You Will Apply It
- Project 4 (Guide Generator)
- Project 6 (Quality Linter)
Chapter 4: Scaffolding, ZPD, and Cognitive Load
Definitions & Key Terms
- Scaffolding: Temporary support that enables learners to do tasks they cannot yet do alone. Wood, Bruner, and Ross introduced the term in 1976 (see instructional scaffolding overviews).
- Zone of Proximal Development (ZPD): The gap between what a learner can do alone and what they can do with support (see ZPD summaries).
- Cognitive Load Theory: Instruction should manage intrinsic load, reduce extraneous load, and promote germane load (see cognitive load theory overviews).
Source note: Cognitive load theory distinguishes intrinsic load (task complexity), extraneous load (inefficient presentation), and germane load (schema construction). Teaching center summaries provide applied guidance for these categories.
Mental Model Diagram
Current Skill -------- ZPD -------- Target Skill
| | |
v v v
Independent With Scaffolds Independent
How It Works (Step-by-Step)
- Break tasks into steps. Reduce intrinsic load.
- Remove distractions. Reduce extraneous load.
- Add prompts and examples. Increase germane load.
- Fade supports. Learner becomes independent.
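One way an expander can encode fading is to store hints in order and reveal them only as needed. A minimal sketch (the hint texts and the one-hint-per-attempt policy are assumptions):
hints = [
    "Show the expected file layout",         # broad, keeps extraneous load low
    "Provide pseudocode for the core loop",  # narrower support
    "Give a minimal working code snippet",   # most specific, used last
]

def next_hint(attempts_so_far):
    if attempts_so_far >= len(hints):
        return None  # supports fully faded: learner continues independently
    return hints[attempts_so_far]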
Trade-offs
- Pros: Prevents overwhelm, improves retention.
- Cons: Over-scaffolding can reduce autonomy.
Minimal Concrete Example
Layered hints:
Hint 1: Show file layout
Hint 2: Provide pseudocode
Hint 3: Give a minimal code snippet
Common Misconceptions
- “Scaffolding is hand-holding” (false: it is temporary and faded)
- “More detail is always better” (false: overload kills learning)
Check-Your-Understanding Questions
- What is the difference between intrinsic and extraneous load?
- When should scaffolds be removed?
Where You Will Apply It
- Project 4 (Guide Generator)
- Project 5 (Project Expander)
- Project 6 (Quality Linter)
Chapter 5: Experiential Learning Cycle (Kolb)
Definitions & Key Terms
Kolb’s cycle: Concrete Experience -> Reflective Observation -> Abstract Conceptualization -> Active Experimentation (see experiential learning cycle summaries). Source note: University resources on Kolb’s experiential learning cycle provide the four-stage model used here.
Mental Model Diagram
Experience -> Reflect -> Concept -> Experiment -> Experience
How It Works (Step-by-Step)
- Experience: Build something concrete.
- Reflect: Analyze what worked and failed.
- Conceptualize: Extract general principles.
- Experiment: Apply new understanding.
Trade-offs
- Pros: Strong retention, practical transfer.
- Cons: Slower than lecture-only approaches.
Minimal Concrete Example
Build a parser -> Review failures -> Formalize rules -> Improve parser
Common Misconceptions
- “Reflection is optional” (false: reflection is core to learning)
- “Cycle must be linear” (false: entry can be at any point)
Check-Your-Understanding Questions
- Which step turns experience into general knowledge?
- How does this cycle map to project iteration?
Where You Will Apply It
- Project 7 (End-to-End CLI)
- Project 6 (Quality Linter)
Chapter 6: Retrieval-Augmented Generation (RAG) for Grounded Content
Definitions & Key Terms
- RAG: Combine retrieval from external sources with generation to produce grounded outputs. RAG was formalized as a model combining parametric and non-parametric memory in 2020 (see the RAG paper and modern RAG pipeline docs).
- Indexing: Load, split, embed, and store documents.
- Retrieval: Search the index for relevant chunks at runtime.
Source note: The 2020 RAG paper introduced a model that pairs a parametric generator with non-parametric retrieval, while modern RAG guides (e.g., LangChain) describe indexing as load -> split -> embed -> store.
Mental Model Diagram
Query -> Retrieve -> Context Pack -> Generate -> Answer + Citations
How It Works (Step-by-Step)
- Index sources. Split documents, store embeddings.
- Retrieve at query time. Select top-k relevant chunks.
- Generate with context. The model cites provided evidence.
- Validate citations. Ensure claims are traceable.
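A toy sketch of the retrieve step that runs with no external services, using a bag-of-words vector and cosine similarity as a stand-in for real embeddings and a vector store:
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())  # crude stand-in for an embedding model

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "Scaffolding is temporary support that is gradually removed.",
    "RAG combines retrieval of external sources with generation.",
]
query = "Define scaffolding"
ranked = sorted(chunks, key=lambda c: cosine(vectorize(query), vectorize(c)), reverse=True)
print(ranked[0])  # the top chunk becomes the context pack passed to generation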
Trade-offs
- Pros: Up-to-date, domain-specific, grounded answers.
- Cons: Retrieval errors propagate to generation.
Minimal Concrete Example
Input: "Define scaffolding"
Retrieve: Pedagogy sources
Generate: Definition with cited sources
Common Misconceptions
- “RAG removes hallucinations” (false: it reduces but does not eliminate)
- “More context is always better” (false: context noise reduces quality)
Check-Your-Understanding Questions
- Why is chunking strategy critical?
- How do you validate a citation is real?
Where You Will Apply It
- Project 3 (Source Retrieval)
- Project 4 (Guide Generator)
- Project 5 (Project Expander)
Chapter 7: Quality Assurance for Learning Guides
Definitions & Key Terms
- Rubric: A scored set of criteria to judge completeness and quality.
- Linting: Automated validation of structure, sections, and evidence.
- Definition of Done (DoD): A checklist that proves completion.
Mental Model Diagram
Draft -> Lint -> Rubric Score -> Fix -> Publish
How It Works (Step-by-Step)
- Define rubric criteria. Depth, accuracy, structure.
- Write lint checks. Missing sections, empty tables, no outputs.
- Score and flag. Enforce minimum thresholds.
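A minimal sketch of one fatal lint rule, assuming the draft is available as plain Markdown text (the rule and the sample text are illustrative):
def lint_definition_of_done(guide_text):
    errors = []
    if "Definition of Done" not in guide_text:
        errors.append("missing: Definition of Done")  # fatal: blocks publishing
    return errors

sample = "## Real World Outcome\n...\n## Definition of Done\n- Scanner reports missing sections"
print(lint_definition_of_done(sample))  # [] means the rule passes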
Trade-offs
- Pros: Consistency, repeatability.
- Cons: Can over-penalize creative outputs.
Minimal Concrete Example
Rule: "Every project must include Definition of Done"
Lint: Fail if missing or empty
Common Misconceptions
- “Quality is subjective” (false: many aspects are measurable)
- “Rubrics reduce creativity” (false: they clarify expectations)
Check-Your-Understanding Questions
- What should be in a DoD checklist?
- Which errors should block publishing?
Where You Will Apply It
- Project 6 (Quality Linter)
- Project 7 (End-to-End CLI)
Glossary (High-Signal)
- Artifact: A tangible output (code, report, dataset, demo) produced by a project.
- Chunking: Splitting documents into smaller pieces for retrieval.
- Concept Map: A graph of ideas and dependencies.
- Driving Question: The central inquiry that defines a project.
- Evidence Pack: Curated sources used to justify factual claims.
- Hint Layering: Progressive assistance from broad to specific.
- Public Product: Final output shared beyond the author.
- Rubric: Scored criteria for evaluating outputs.
- Scaffolding: Temporary support that is gradually removed.
- ZPD: The skill zone where support enables growth.
Why Project Expansion Matters
Modern project-based learning is highly effective because it aligns authentic work with deep inquiry and public output. A 2023 meta-analysis of 66 experimental and quasi-experimental studies (190 effect sizes) reported positive effects of PBL on student learning outcomes, including academic achievement and thinking skills. The same study reported stronger effects in engineering/technology subjects, in lab-oriented classes, with small group sizes (4-5 learners), and with project durations around 9-18 weeks.
Well-designed PBL uses elements like sustained inquiry, authenticity, critique, and public product to increase engagement and quality. These elements also translate directly into better learning guides.
Old Approach vs New Approach
OLD APPROACH NEW APPROACH
Short idea list Full learning guide
No objectives Clear outcomes + evidence
No scaffolding Layered hints + DoD
No sources Grounded citations
Sources & Evidence Pack (Suggested)
- PBLWorks Gold Standard PBL design elements: https://www.pblworks.org/for/gold_standard_pbl
- PBLWorks Gold Standard blog explainer: https://www.pblworks.org/blog/gold_standard_pbl_essential_project_design_elements
- Understanding by Design (UbD) stages summary: https://edis.ifas.ufl.edu/publication/WC322
- Revised Bloom taxonomy (2001) summary: https://www1.udel.edu/educ/gottfredson/451/revisedbloom
- Cognitive Load Theory overview: https://www.sfasu.edu/ctl/resources/learning-design/cognitive-load
- Instructional scaffolding overview: https://en.wikipedia.org/wiki/Instructional_scaffolding
- Zone of Proximal Development overview: https://www.nysed.gov/bilingual-ed/topic-brief-4-zone-proximal-development-affirmative-perspective-teaching-ells-and-mls
- Kolb experiential learning cycle: https://tlc.uthsc.edu/experiential-learning/
- RAG paper (Lewis et al., 2020): https://arxiv.org/abs/2005.11401
- RAG pipeline concepts: https://python.langchain.com/docs/tutorials/rag/
- PBL meta-analysis (Zhang & Ma, 2023): https://doaj.org/article/1b9babf4d78a4868918d0ab4224004a0
Concept Summary Table
| Concept | What You Must Internalize | Where It Appears |
|---|---|---|
| PBL Elements | Driving question, inquiry, authenticity, public product | Projects 1, 4, 7 |
| Backward Design | Outcomes -> evidence -> activities | Projects 2, 6 |
| Bloom’s Taxonomy | Depth progression and assessment levels | Projects 4, 6 |
| Scaffolding + ZPD | Support and fade strategy | Projects 4, 5 |
| Cognitive Load | Manage intrinsic/extraneous/germane | Projects 4, 5 |
| Kolb Cycle | Experience -> reflection -> concept -> experiment | Projects 6, 7 |
| RAG Pipeline | Index -> retrieve -> generate -> cite | Projects 3, 4, 5 |
| QA Rubrics | Linting + measurable completeness | Projects 6, 7 |
Project-to-Concept Map
| Project | Core Concepts |
|---|---|
| 1. Guide Inventory & Diff Scanner | PBL elements, backward design |
| 2. Concept Graph & Prerequisite Mapper | Backward design, Bloom taxonomy |
| 3. Source Retrieval & Citation Packager | RAG pipeline, grounding |
| 4. Template-Driven Guide Generator | Scaffolding, cognitive load |
| 5. Project Deep-Dive Expander | Scaffolding, Bloom taxonomy |
| 6. Quality Linter & Rubric Scorer | QA, evidence, assessment |
| 7. End-to-End CLI Orchestrator | Kolb cycle, system integration |
Deep Dive Reading by Concept
| Concept | Book | Chapter(s) | Why This Matters |
|---|---|---|---|
| Learning objectives | “Understanding by Design” (Wiggins & McTighe) | Ch. 1-3 | Backward design for objectives and evidence |
| Scaffolding and load | “How People Learn” (Bransford et al.) | Ch. 2-4 | How learners build knowledge |
| Evaluation and rubrics | “Classroom Assessment Techniques” (Angelo & Cross) | Ch. 1-2 | Practical assessment design |
| Software architecture | “Clean Architecture” (Martin) | Ch. 1-5 | Pipeline modularity and boundaries |
| Tooling workflows | “The Pragmatic Programmer” (Hunt & Thomas) | Ch. 3-5 | Automation and feedback loops |
Quick Start (First 48 Hours)
Day 1: Understanding the pipeline
- Read Chapters 1-3 of the Theory Primer
- Sketch the pipeline diagram on paper
- Define 3 learning objectives for a sample guide
Day 2: First working output
- Build a minimal parser that loads an idea list
- Create a single expanded project template
- Generate one project with real output and DoD
Recommended Learning Paths
Path A: Educator or Instructional Designer
- Chapters 1-5 (PBL, backward design, Bloom, scaffolding)
- Projects 1, 2, 4, 6
Path B: LLM Engineer / Tool Builder
- Chapters 6-7 (RAG, QA)
- Projects 3, 4, 5, 7
Path C: Full-Stack Builder
- Read everything
- Build projects in order 1-7
Success Metrics
- You can generate a guide with all required sections in < 2 minutes
- Each project has exact output examples and a DoD checklist
- Every factual claim is grounded in a source pack
- Linter catches missing sections and empty tables
- At least 3 example guides pass the rubric with > 90% score
Optional Appendices
Appendix A: Guide Section Checklist
- Goal
- Introduction + big picture diagram
- How to use this guide
- Prerequisites + self-assessment
- Theory primer chapters
- Glossary
- Why topic matters + stats
- Concept summary table
- Project-to-concept map
- Reading list by concept
- Quick start + learning paths
- Success metrics
Appendix B: Quality Rubric (Sample)
| Criterion | Description | Pass Threshold |
|---|---|---|
| Completeness | All required sections present | 100% |
| Depth | Each concept has definitions + diagram + example | 90% |
| Grounding | Claims trace to citations | 95% |
| Practicality | Every project has runnable output | 100% |
Projects
Project 1: Guide Inventory & Diff Scanner
Build a tool that parses a directory of guides and identifies missing or inconsistent sections.
Real World Outcome
You can run a command like:
$ expander scan ./project_based_ideas
[scan] 427 guides found
[scan] 112 missing: "Concept Summary Table"
[scan] 89 missing: "Project-to-Concept Map"
[scan] 37 missing: "Definition of Done" sections
[scan] Report written to reports/guide-audit-2025-12-31.json
The Core Question You’re Answering
How can we systematically detect gaps in learning guides so we know exactly what to expand?
Concepts You Must Understand First
- Markdown parsing and ASTs
- Section normalization and templates
- Backward design evidence mapping
- Reading: “Clean Architecture” Ch. 1-3
Questions to Guide Your Design
- How will you detect headings with inconsistent names?
- What is your minimal schema for a “complete” guide?
- How do you handle partial or empty sections?
Thinking Exercise
Sketch a schema that represents required sections as a JSON structure. Then manually map two existing guides into that schema and list the differences.
The Interview Questions They’ll Ask
- How did you normalize headings across inconsistent files?
- What strategy did you use for partial matches?
- How do you minimize false positives in linting?
- How would you scale this to thousands of files?
Hints in Layers
Hint 1: Start with a heading index.
if line.startswith("#"):
headings.append(line.strip("# "))
Hint 2: Normalize text to compare.
def norm(s):
return " ".join(s.lower().split())
Hint 3: Build a schema validator.
missing = [h for h in REQUIRED if h not in headings]
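Putting the three hints together, a minimal sketch of the scanner loop, assuming a flat directory of Markdown guides and an illustrative REQUIRED list:
import json
from pathlib import Path

REQUIRED = ["goal", "concept summary table", "project-to-concept map", "definition of done"]

def norm(s):
    return " ".join(s.lower().split())

report = {}
for path in Path("./project_based_ideas").glob("*.md"):
    lines = path.read_text().splitlines()
    headings = [norm(line.lstrip("#")) for line in lines if line.startswith("#")]  # accept levels 1-6
    missing = [h for h in REQUIRED if h not in headings]
    if missing:
        report[path.name] = missing

Path("reports").mkdir(exist_ok=True)
Path("reports/guide-audit.json").write_text(json.dumps(report, indent=2))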
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Code” | Ch. 3 | Parsing and normalization clarity |
| “Clean Architecture” | Ch. 1-3 | Separation of concerns |
| “Refactoring” | Ch. 2 | Improving parsing logic |
Common Pitfalls & Debugging
Problem: “Heading not detected”
- Why: Markdown heading uses ## but the parser expects #
- Fix: Accept 1-6 hash levels
- Quick test: Parse a file with multiple heading levels
Problem: “False missing sections”
- Why: Section is present but heading is renamed
- Fix: Add synonyms mapping
- Quick test: Add a custom mapping for “Learning Goals” -> “Goal”
Definition of Done
- Scanner reports missing sections with correct counts
- Normalizes headings reliably across at least 10 guides
- Produces JSON report with file-level diagnostics
- Handles empty sections without crashing
Project 2: Concept Graph & Prerequisite Mapper
Build a concept map engine that extracts concepts from a guide and infers prerequisite relationships.
Real World Outcome
$ expander concepts ./project_based_ideas/LEARN_LLM_MEMORY.md
[concepts] 18 concepts found
[graph] 27 edges inferred
[graph] Wrote graph to graphs/learn_llm_memory.dot
The Core Question You’re Answering
How do we decide the order in which concepts should appear to maximize learning?
Concepts You Must Understand First
- Learning objectives and backward design
- Bloom taxonomy levels
- Graph modeling (nodes, edges)
- Reading: “Algorithms, Fourth Edition” Ch. 4 (graphs)
Questions to Guide Your Design
- How will you extract concept candidates from text?
- How will you infer dependencies without hallucination?
- How do you represent uncertainty in the graph?
Thinking Exercise
Take 5 concepts (RAG, retrieval, embeddings, chunking, evaluation) and draw a dependency graph. Explain each edge in one sentence.
The Interview Questions They’ll Ask
- How do you avoid cycles in prerequisite graphs?
- What is your confidence scoring method?
- How do you validate the graph is pedagogically sound?
- How would you use the graph to reorder sections?
Hints in Layers
Hint 1: Start with keyword extraction.
import re
concepts = set(re.findall(r"\b[A-Z][a-z]+\b", text))
Hint 2: Add a manual synonym map.
ALIASES = {"RAG": "Retrieval Augmented Generation"}
Hint 3: Use heuristics for edges.
if "embeddings" in sentence and "retrieval" in sentence:
add_edge("Embeddings", "Retrieval")
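A small sketch that stores edges with confidence scores and refuses cycle-forming edges (the data shapes and the add_edge/reachable helpers are illustrative, not a fixed API):
edges = {}  # (prerequisite concept, dependent concept) -> confidence score

def reachable(start, target):
    stack, seen = [start], set()
    while stack:  # depth-first search over existing edges
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dst for (src, dst) in edges if src == node)
    return False

def add_edge(src, dst, confidence=0.5):
    if reachable(dst, src):
        return False  # refuse edges that would close a prerequisite cycle
    edges[(src, dst)] = confidence
    return True

add_edge("Embeddings", "Retrieval", confidence=0.8)
add_edge("Retrieval", "Embeddings")  # returns False: would create a cycle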
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Algorithms, Fourth Edition” | Ch. 4 | Graph modeling |
| “Clean Architecture” | Ch. 4 | Dependency rules |
| “Design Patterns” | Ch. 1 | Reusable graph patterns |
Common Pitfalls & Debugging
Problem: “Concept explosion”
- Why: Overly broad extraction
- Fix: Apply a whitelist or frequency threshold
- Quick test: Limit to top-30 terms
Problem: “Cycles everywhere”
- Why: Naive dependency inference
- Fix: Add direction rules and prune low-confidence edges
- Quick test: Run cycle detection and log removals
Definition of Done
- Extracts concepts with <= 20% noise on sample guides
- Produces a DOT or JSON graph with edges
- Includes confidence scores per edge
- Supports manual overrides for critical concepts
Project 3: Source Retrieval & Citation Packager (RAG)
Build a RAG pipeline that retrieves authoritative sources and bundles them for downstream generation.
Real World Outcome
$ expander sources --topic "project-based learning" --limit 8
[sources] 8 sources retrieved
[sources] 5 primary sources, 3 secondary
[sources] Pack saved: packs/pbl-2025-12-31.json
The Core Question You’re Answering
How do we ground generated content in verifiable sources without overwhelming the model?
Concepts You Must Understand First
- RAG pipeline (indexing + retrieval)
- Chunking and similarity search
- Citation formatting
- Reading: “The Pragmatic Programmer” Ch. 3-4
Questions to Guide Your Design
- What qualifies as a “primary” source?
- How do you prevent duplicate or low-quality sources?
- How do you store the evidence pack so it is reusable?
Thinking Exercise
Take three sources about PBL and extract one sentence each that could be cited in a guide. Rank them by authority.
The Interview Questions They’ll Ask
- How do you evaluate retrieval quality?
- How do you handle outdated or conflicting sources?
- What is your chunking strategy and why?
- How do you stop citation drift?
Hints in Layers
Hint 1: Store sources with minimal schema.
{"title": "...", "url": "...", "year": 2023, "quotes": []}
Hint 2: Use a two-stage filter.
if domain in TRUSTED and year >= 2018:
keep(source)
Hint 3: Add per-claim citation links.
claim["citations"].append(source_id)
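A compact sketch that combines the schema, the two-stage filter, and pack writing; the allowlist, candidate sources, and output filename are placeholders:
import json

TRUSTED = {"pblworks.org", "arxiv.org", "edis.ifas.ufl.edu"}  # example allowlist only

def keep(source):
    return source["domain"] in TRUSTED and source.get("year", 0) >= 2018

candidates = [
    {"title": "RAG paper", "url": "https://arxiv.org/abs/2005.11401",
     "domain": "arxiv.org", "year": 2020, "quotes": []},
    {"title": "Unvetted blog post", "url": "https://example.com/post",
     "domain": "example.com", "year": 2016, "quotes": []},
]

pack = [s for s in candidates if keep(s)]  # only the arXiv source survives both filters
with open("evidence-pack.json", "w") as f:
    json.dump(pack, f, indent=2)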
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Code” | Ch. 4 | Structuring data pipelines |
| “Refactoring” | Ch. 3 | Improving retrieval logic |
| “Fundamentals of Software Architecture” | Ch. 8 | Designing pipelines |
Common Pitfalls & Debugging
Problem: “Sources are low quality”
- Why: Unfiltered web search
- Fix: Add domain allowlist + manual curation
- Quick test: Log source domains and counts
Problem: “Citation mismatch”
- Why: Claim text doesn’t align to retrieved evidence
- Fix: Store snippet offsets with citations
- Quick test: Validate citations during generation
Definition of Done
- Retrieves at least 5 high-quality sources per topic
- Stores a reusable evidence pack in JSON
- Includes citations with snippets and dates
- Supports domain allowlists and blacklists
Project 4: Template-Driven Guide Generator
Build a generator that converts an idea list + evidence pack into a structured mini-book guide.
Real World Outcome
$ expander generate ./ideas/LLM_MEMORY.md --evidence packs/llm-memory.json
[generate] Sections written: 14
[generate] Projects expanded: 7
[generate] Output: docs/LLM_MEMORY.md
The Core Question You’re Answering
How do we guarantee consistent, high-depth guides regardless of input quality?
Concepts You Must Understand First
- Scaffolding and cognitive load
- Backward design alignment
- Template systems and slot filling
- Reading: “Clean Architecture” Ch. 6-8
Questions to Guide Your Design
- How do you prevent template sections from becoming generic?
- Which sections require sourced facts vs. original synthesis?
- How do you enforce exact output examples?
Thinking Exercise
Take one project idea and write the “Real World Outcome” as if the CLI already exists. Include realistic output.
The Interview Questions They’ll Ask
- How do you validate that each section has depth?
- How do you handle missing prerequisites?
- How do you avoid contradictory content across sections?
- How do you support different domains (ML vs systems)?
Hints in Layers
Hint 1: Use a strict template.
## Introduction
[definition]
[scope]
Hint 2: Create a section completeness check.
assert "## Glossary" in output
Hint 3: Inject evidence into key claims.
content = insert_citations(content, evidence_pack)
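A sketch of a completeness gate on the assembled draft; the REQUIRED_SECTIONS list mirrors Appendix A and is an assumption about your template, not a standard:
REQUIRED_SECTIONS = ["## Introduction", "## Theory Primer", "## Glossary",
                     "## Concept Summary Table", "## Projects"]

def check_completeness(output):
    return [s for s in REQUIRED_SECTIONS if s not in output]  # missing sections, in template order

draft = "## Introduction\n...\n## Glossary\n..."
missing = check_completeness(draft)
if missing:
    raise ValueError(f"generator produced an incomplete guide, missing: {missing}")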
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Architecture” | Ch. 6-8 | Boundary and data flow design |
| “Code Complete” | Ch. 5 | Structured construction |
| “Refactoring” | Ch. 1 | Clean generator code |
Common Pitfalls & Debugging
Problem: “Sections are shallow”
- Why: Template placeholders not expanded
- Fix: Enforce minimum word counts + required subsections
- Quick test: Run a length check per section
Problem: “Output missing CLI examples”
- Why: No detection of project type
- Fix: Classify project type and inject examples
- Quick test: Run a diff to ensure examples exist
Definition of Done
- Generates all required sections in correct order
- Inserts citations for all factual claims
- Ensures each project has real-world output examples
- Produces publishable Markdown without manual fixes
Project 5: Project Deep-Dive Expander
Build a module that expands each project into a full deep-dive file with theory, steps, and validation.
Real World Outcome
$ expander expand-projects ./ideas/LLM_MEMORY.md
[expand] 7 projects detected
[expand] Wrote: ./ideas/LLM_MEMORY/P01-memory-cache.md
[expand] Wrote: ./ideas/LLM_MEMORY/P02-embeddings-store.md
[expand] ...
The Core Question You’re Answering
How do we turn a short project description into a fully teachable, step-by-step learning unit?
Concepts You Must Understand First
- Scaffolding with layered hints
- Bloom taxonomy depth
- Backward design evidence alignment
- Reading: “The Pragmatic Programmer” Ch. 5-7
Questions to Guide Your Design
- How many steps should a deep-dive include?
- How do you link a project back to the theory chapters?
- How do you prevent duplication across projects?
Thinking Exercise
Take a 3-line project idea and expand it into:
- A core question
- 4 milestones
- 3 pitfalls
- A DoD checklist
The Interview Questions They’ll Ask
- How do you choose the correct level of detail?
- How do you prevent repetition across projects?
- How do you structure layered hints?
- How would you add diagrams automatically?
Hints in Layers
Hint 1: Use a consistent project scaffold.
## Real World Outcome
## Core Question
## Definition of Done
Hint 2: Add a milestone structure.
M1: Parse
M2: Generate
M3: Validate
Hint 3: Auto-link to primer sections.
project["concepts"].append("RAG Pipeline")
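A minimal sketch of the per-project writer, assuming each project is a small dict with a title and linked concepts (file naming and scaffold text are placeholders):
from pathlib import Path

SCAFFOLD = "## Real World Outcome\n\n## Core Question\n\n## Definition of Done\n"

def expand_project(index, project, out_dir):
    slug = project["title"].lower().replace(" ", "-")
    body = f"# {project['title']}\n\nConcepts: {', '.join(project['concepts'])}\n\n" + SCAFFOLD
    path = Path(out_dir) / f"P{index:02d}-{slug}.md"  # e.g. P01-memory-cache.md
    path.write_text(body)
    return path

expand_project(1, {"title": "Memory Cache", "concepts": ["RAG Pipeline"]}, ".")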
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Code” | Ch. 7 | Clear, reusable templates |
| “Refactoring” | Ch. 4 | Iterative improvement |
| “Fundamentals of Software Architecture” | Ch. 10 | Decomposition strategy |
Common Pitfalls & Debugging
Problem: “Projects feel copy-pasted”
- Why: Template reuse without adaptation
- Fix: Inject project-specific outcomes and tools
- Quick test: Compare first 200 words across projects
Problem: “No measurable outcome”
- Why: Missing DoD or output example
- Fix: Force a Real World Outcome section
- Quick test: Lint for output code blocks
Definition of Done
- Each project expanded into its own file
- Includes concept references and unique outcomes
- Contains pitfalls, hints, and DoD
- Output is publishable without manual edits
Project 6: Quality Linter & Rubric Scorer
Build a quality checker that scores guides against a rubric and blocks publishing if thresholds fail.
Real World Outcome
$ expander lint ./docs/LLM_MEMORY.md
[lint] Completeness: 100%
[lint] Depth: 92%
[lint] Grounding: 96%
[lint] Practicality: 100%
[lint] PASS (min 90%)
The Core Question You’re Answering
How do we enforce consistent quality across hundreds of generated guides?
Concepts You Must Understand First
- Rubric design and scoring
- Evidence requirements
- Automated validation
- Reading: “Clean Architecture” Ch. 12
Questions to Guide Your Design
- What is your minimum acceptable score per category?
- Which errors should be fatal vs warnings?
- How do you measure “depth” automatically?
Thinking Exercise
Define five rubric criteria and write one automatic check for each.
The Interview Questions They’ll Ask
- How do you quantify qualitative metrics?
- How do you avoid gaming the rubric?
- How do you integrate linting into CI?
- What do you do when sources conflict?
Hints in Layers
Hint 1: Create a rubric schema.
{"completeness": 0.3, "depth": 0.3, "grounding": 0.2, "practicality": 0.2}
Hint 2: Check for required sections.
for h in REQUIRED:
if h not in headings: score -= 10
Hint 3: Add per-section length checks.
if len(section_text) < 300: warn("shallow")
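Combining the hints into a weighted score with per-category thresholds; the weights and thresholds below reuse the sample values from Appendix B and are not a standard:
WEIGHTS = {"completeness": 0.3, "depth": 0.3, "grounding": 0.2, "practicality": 0.2}
THRESHOLDS = {"completeness": 1.0, "depth": 0.9, "grounding": 0.95, "practicality": 1.0}

def score_guide(scores):
    failed = [c for c, t in THRESHOLDS.items() if scores[c] < t]  # any failed category blocks publishing
    total = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    return total, failed

total, failed = score_guide({"completeness": 1.0, "depth": 0.92, "grounding": 0.96, "practicality": 1.0})
print(f"score={total:.0%}", "PASS" if not failed else f"FAIL: {failed}")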
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Refactoring” | Ch. 6 | Improving validation logic |
| “Code Complete” | Ch. 10 | Defensive checks |
| “The Pragmatic Programmer” | Ch. 7 | Automation mindset |
Common Pitfalls & Debugging
Problem: “False fails”
- Why: Overly strict thresholds
- Fix: Tune rubric with real examples
- Quick test: Re-score 5 known-good guides
Problem: “Shallow but passes”
- Why: Only length checks
- Fix: Add semantic checks (keywords, subheadings)
- Quick test: Require diagrams in theory primer
Definition of Done
- Rubric scores are reproducible
- Linter blocks missing critical sections
- Produces a summary report with diagnostics
- Integrates with CLI pipeline
Project 7: End-to-End CLI Orchestrator
Combine all modules into a single CLI that expands guides from idea list to publish-ready output.
Real World Outcome
$ expander run ./ideas/LLM_MEMORY.md --out ./docs
[run] Parsed ideas: 9
[run] Concept graph built
[run] Evidence pack retrieved
[run] Guide generated: docs/LLM_MEMORY.md
[run] 9 projects expanded into ./ideas/LLM_MEMORY/
[run] Quality score: 94% (PASS)
The Core Question You’re Answering
How do we orchestrate a reliable, repeatable pipeline that produces publishable guides at scale?
Concepts You Must Understand First
- Pipeline orchestration
- Dependency management
- Logging and error handling
- Reading: “Fundamentals of Software Architecture” Ch. 12
Questions to Guide Your Design
- How do you ensure deterministic outputs?
- How do you cache results between steps?
- What is your rollback strategy when a step fails?
Thinking Exercise
Design the CLI command structure for all pipeline steps. Identify which steps can be skipped when cached.
The Interview Questions They’ll Ask
- How do you orchestrate tasks safely?
- How do you make the pipeline reproducible?
- What metrics do you log for observability?
- How do you validate end-to-end correctness?
Hints in Layers
Hint 1: Start with a task graph.
scan -> concepts -> sources -> generate -> expand -> lint
Hint 2: Add caching by file hash.
import hashlib
cache_key = hashlib.sha256(input_text.encode("utf-8")).hexdigest()
Hint 3: Use structured logging.
log.info("stage=generate status=ok duration_ms=1200")
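A skeletal orchestrator tying the task graph, hash-based caching, and structured logging together; the stage functions are stubs you would replace with the real modules:
import hashlib, logging, time
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("expander")
CACHE = Path(".cache")
CACHE.mkdir(exist_ok=True)

def run_stage(name, func, payload):
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    cached = CACHE / f"{name}-{key}.txt"  # cache by content hash so reruns can skip unchanged stages
    if cached.exists():
        log.info(f"stage={name} status=cached")
        return cached.read_text()
    start = time.monotonic()
    result = func(payload)
    cached.write_text(result)
    log.info(f"stage={name} status=ok duration_ms={int((time.monotonic() - start) * 1000)}")
    return result

stages = [("scan", lambda t: t), ("generate", lambda t: t), ("lint", lambda t: t)]  # stub task graph
text = "idea list contents"
for name, func in stages:
    text = run_stage(name, func, text)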
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Fundamentals of Software Architecture” | Ch. 12 | Pipeline orchestration |
| “Clean Architecture” | Ch. 14 | Dependency direction |
| “The Pragmatic Programmer” | Ch. 8 | Automation + tooling |
Common Pitfalls & Debugging
Problem: “Pipeline is flaky”
- Why: Uncontrolled randomness in generation
- Fix: Fix seeds and store prompts
- Quick test: Re-run pipeline and compare hashes
Problem: “Slow retrieval”
- Why: No caching or parallelism
- Fix: Cache evidence packs and parallelize retrieval
- Quick test: Time retrieval with and without cache
Definition of Done
- One command runs the entire pipeline
- Outputs are deterministic across runs
- Each stage logs duration and status
- Pipeline fails fast on critical errors