Project 2: Minimal ReAct Agent
Build a ReAct-style agent that loops through Thought -> Action -> Observation, updates state, and stops under explicit termination rules.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 10–16 hours |
| Language | Python or JavaScript |
| Prerequisites | Project 1, tool calling basics |
| Key Topics | agent loops, state, termination, trace logging |
Learning Objectives
By completing this project, you will:
- Implement a closed-loop agent cycle (Think → Act → Observe).
- Maintain explicit agent state across steps.
- Define termination rules to stop safely.
- Summarize observations to avoid context overflow.
- Log a trace that makes reasoning auditable.
The Core Question You’re Answering
“How does an agent use feedback to adapt its next action instead of guessing once?”
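This loop is the minimal mechanism that separates an agent from a pipeline: a pipeline runs predetermined steps, while an agent chooses each next step from what it just observed. Without the loop, there is no real agentic behavior.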
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| ReAct pattern | Interleaves reasoning + actions | ReAct paper |
| State vs. observation | State persists what has been done and learned across steps, which prevents repeated actions | Agent design blogs |
| Termination conditions | Stops runaway loops | LangChain docs |
| Tool schemas | Actions must be validated | Pydantic/Zod docs |
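The "Tool schemas" row says actions must be validated before they run. The following is a minimal sketch of that idea using only the standard library; in a real project Pydantic or Zod would likely take this role. The tool names come from the example run later in this guide, while the `TOOL_SCHEMAS` registry, the `validate_action` helper, and the argument names `directory` and `path` are assumptions made for illustration.

```python
# Hypothetical tool registry: tool name -> {argument name: expected type}.
TOOL_SCHEMAS = {
    "list_files": {"directory": str},
    "read_file": {"path": str},
}


def validate_action(action: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the action is valid."""
    name = action.get("tool")
    if name not in TOOL_SCHEMAS:
        return [f"unknown tool: {name!r}"]
    schema = TOOL_SCHEMAS[name]
    args = action.get("args", {})
    errors = []
    # Every declared argument must be present and have the expected type.
    for arg, expected in schema.items():
        if arg not in args:
            errors.append(f"missing argument: {arg}")
        elif not isinstance(args[arg], expected):
            errors.append(f"{arg} must be {expected.__name__}")
    # Reject arguments the schema does not declare.
    for arg in args:
        if arg not in schema:
            errors.append(f"unexpected argument: {arg}")
    return errors
```

Returning errors as data, rather than raising, lets the agent feed the validation message back to the model as an observation so it can correct the action on the next step.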
Theoretical Foundation
ReAct Loop as Control System
Goal -> Think -> Act -> Observe -> Update State -> Think -> ...
Key properties:
- Feedback-driven: each step uses real observations.
- Bounded: stops on success, when the step budget is exhausted, or when a loop is detected.
- Auditable: trace shows why actions happened.
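The control loop above can be sketched in a few lines. This is an illustrative skeleton, not a reference implementation: `run_agent`, the `policy` callable (standing in for the LLM call), and the shape of the `state` dict are all assumptions made for this example.

```python
from typing import Callable


def run_agent(goal: str, policy: Callable, tools: dict, max_steps: int = 10) -> dict:
    """Run Think -> Act -> Observe until the policy finishes or the budget runs out."""
    state = {"goal": goal, "facts": [], "history": [], "status": "running"}
    for step in range(1, max_steps + 1):
        # Think: the policy (an LLM call in a real agent) picks the next action.
        thought, action, args = policy(state)
        if action == "finish":
            state["status"] = "success"
            return state
        # Act: execute the chosen tool.
        observation = tools[action](**args)
        # Observe + update state: the result feeds the next decision.
        state["facts"].append(observation)
        state["history"].append({"step": step, "thought": thought, "action": action})
    state["status"] = "max_steps"
    return state


# Stand-in policy and tool so the sketch runs without an LLM.
def demo_policy(state):
    if not state["facts"]:
        return "I need the file list first", "list_files", {"directory": "/docs"}
    return "I have what I need", "finish", {}


demo_tools = {"list_files": lambda directory: f"2 files in {directory}"}
result = run_agent("count files", demo_policy, demo_tools)
```

The demo policy inspects `state["facts"]` before deciding, which is the feedback-driven property in miniature: the second decision differs from the first only because an observation arrived in between.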
Project Specification
What You’ll Build
A CLI agent that solves multi-step tasks such as:
“Find and summarize the three largest markdown files in /docs.”
Functional Requirements
- Thought → Action → Observation loop
- State object with facts + action history
- Termination rules: success, max steps, loop detection
- Observation summarization
- JSONL trace log
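One possible shape for the state object with facts and action history is sketched below. The `AgentState` class, its field names, and the `update` method are assumptions for illustration; the returned diff is meant to correspond to a `state_diff` field in a trace entry.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    facts: dict = field(default_factory=dict)          # what the agent has learned
    actions_taken: list = field(default_factory=list)  # (tool, args) per step

    def update(self, action: str, args: dict, new_facts: dict) -> dict:
        """Record the action, merge new facts, and return only what changed."""
        # Store args as a sorted tuple so repeated actions compare equal.
        self.actions_taken.append((action, tuple(sorted(args.items()))))
        diff = {
            key: value
            for key, value in new_facts.items()
            if key not in self.facts or self.facts[key] != value
        }
        self.facts.update(diff)
        return diff


state = AgentState(goal="Find and summarize the three largest markdown files in /docs")
diff = state.update("list_files", {"directory": "/docs"}, {"file_count": 47})
```

Keeping facts separate from the raw action history means the prompt for the next step can include a compact fact summary while the full history remains available for loop detection.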
Non-Functional Requirements
- Deterministic mode for tests
- Explicit error handling on tool failures
- Trace replay support
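For the explicit error handling requirement, one approach is to wrap every tool call so that failures become structured observations instead of crashes. The sketch below assumes a hypothetical `execute_tool` wrapper and an `ok`/`error`/`result` observation shape; neither is prescribed by this guide.

```python
def execute_tool(tools: dict, name: str, args: dict) -> dict:
    """Run a tool and always return a structured observation, even on failure."""
    if name not in tools:
        return {"ok": False, "error": f"unknown tool: {name}", "result": None}
    try:
        return {"ok": True, "error": None, "result": tools[name](**args)}
    except Exception as exc:
        # Deliberately broad: any tool failure is surfaced to the agent
        # as an observation instead of stopping the loop.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}", "result": None}


def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as handle:
        return handle.read()


tools = {"read_file": read_file}
# The path below is a made-up example that should not exist.
missing = execute_tool(tools, "read_file", {"path": "/nonexistent/hypothetical.md"})
```

Because the error is part of the observation, it can be logged in the trace and shown to the model, which can then choose a different action on the next step.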
Real World Outcome
Example run:
$ python react_agent.py --goal "Find and summarize the three largest markdown files in /docs"
Step 1: list_files -> 47 results
Step 2: get_file_sizes -> top 3 found
Step 3: read_file -> ARCHITECTURE.md
Step 4: read_file -> API_GUIDE.md
Step 5: read_file -> TUTORIAL.md
Step 6: summarize -> report created
Trace entry example:
{"step": 3, "thought": "Read the largest file", "action": "read_file", "observation": "Read 450KB", "state_diff": {"files_read": 1}}
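A JSONL trace is one JSON object per line, which makes appending and replaying straightforward. The sketch below writes the entry shown above and reads it back; the helper names `append_trace` and `load_trace` and the temporary file location are assumptions for illustration.

```python
import json
import os
import tempfile


def append_trace(path: str, entry: dict) -> None:
    """Append one step as a single JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def load_trace(path: str) -> list[dict]:
    """Read the trace back in file order for replay or auditing."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


trace_path = os.path.join(tempfile.mkdtemp(), "trace.jsonl")
append_trace(trace_path, {
    "step": 3,
    "thought": "Read the largest file",
    "action": "read_file",
    "observation": "Read 450KB",
    "state_diff": {"files_read": 1},
})
entries = load_trace(trace_path)
```

Appending a line per step means a crash mid-run still leaves every completed step on disk, which is useful when debugging a run that never reached termination.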
Architecture Overview
┌──────────────┐  thoughts   ┌───────────────┐
│  LLM Brain   │────────────▶│  Action Plan  │
└──────┬───────┘             └───────┬───────┘
       │                             │
       ▼                             ▼
┌──────────────┐             ┌───────────────┐
│ Tool Execute │────────────▶│  Observation  │
└──────┬───────┘             └───────┬───────┘
       │                             │
       ▼                             ▼
┌──────────────┐             ┌───────────────┐
│ State Update │◀────────────│ Trace Logger  │
└──────────────┘             └───────────────┘
Implementation Guide
Phase 1: Loop Skeleton (3–4h)
- Implement a fixed loop with max steps
- Hardcode the action selection for now (no LLM call yet)
- Checkpoint: loop logs steps
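A Phase 1 skeleton might look like the sketch below: a scripted policy returns a fixed sequence of actions, and a bounded loop logs each step. The `ScriptedPolicy` class and the action dict shape are assumptions for illustration. The same scripted policy can later serve as a deterministic stand-in for the LLM in tests.

```python
from collections import deque


class ScriptedPolicy:
    """Return a fixed sequence of actions, then 'finish'. No LLM involved."""

    def __init__(self, script):
        self._script = deque(script)

    def next_action(self, state: dict) -> dict:
        if self._script:
            return self._script.popleft()
        return {"tool": "finish", "args": {}}


policy = ScriptedPolicy([
    {"tool": "list_files", "args": {"directory": "/docs"}},
    {"tool": "get_file_sizes", "args": {}},
])

log = []
for step in range(1, 6):  # fixed step budget of 5
    action = policy.next_action(state={})
    log.append(f"Step {step}: {action['tool']}")
    if action["tool"] == "finish":
        break
```

No tools are executed here; the point of the checkpoint is only that the loop runs, stays within its budget, and logs each step.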
Phase 2: Tool Selection + State (4–6h)
- Generate actions from LLM
- Track action history and facts
- Checkpoint: agent completes 3-step task
Phase 3: Termination + Summaries (3–6h)
- Add loop detection and step budget
- Summarize large observations
- Checkpoint: trace file is replayable
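The three termination rules (success, step budget, loop detection) can be combined into a single check that runs once per step. This is a sketch under assumptions: `check_termination`, its parameters, and the string stop reasons are hypothetical names, and loop detection here only looks for the same action repeated consecutively.

```python
def check_termination(history: list, goal_met: bool, max_steps: int = 10,
                      repeat_limit: int = 2):
    """Return a stop reason, or None if the agent should keep going.

    history is a list of comparable action records, e.g. (tool, args) tuples.
    """
    if goal_met:
        return "success"
    if len(history) >= max_steps:
        return "max_steps"
    # Loop detection: the same action repeated repeat_limit times in a row.
    if len(history) >= repeat_limit:
        tail = history[-repeat_limit:]
        if all(item == tail[0] for item in tail):
            return "loop_detected"
    return None
```

Consecutive-repeat detection will not catch alternating cycles such as A, B, A, B; counting occurrences of each action over the whole history is one way to extend it, with the step budget as the final backstop either way.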
Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Infinite loops | The agent repeats the same action | Detect repeats in the action history and stop |
| Context overflow | The model forgets earlier steps | Summarize observations before storing them |
| Silent failures | Tool errors never appear in the trace | Log tool errors as a separate field |
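For the context overflow pitfall, the simplest mitigation is to cap the size of each observation before it enters the state. The sketch below swaps in plain head-and-tail truncation for a real LLM-based summary; `summarize_observation` and its `max_chars` parameter are hypothetical names.

```python
def summarize_observation(text: str, max_chars: int = 200) -> str:
    """Keep the head and tail of a long observation and note how much was dropped.

    The result is slightly longer than max_chars because of the marker text.
    """
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    head = text[:half]
    tail = text[len(text) - half:]  # avoids text[-0:], which would return everything
    return f"{head} ...[{omitted} chars omitted]... {tail}"
```

Recording how many characters were omitted keeps the trace honest: a reader, or the model itself, can tell that the observation was shortened and decide whether to re-read the source.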
Interview Questions They’ll Ask
- How does ReAct differ from chain-of-thought prompting?
- Why is explicit state tracking necessary?
- How do you prevent infinite loops?
Hints in Layers
- Hint 1: Implement the loop with a max step budget.
- Hint 2: Store actions_taken and block repeats.
- Hint 3: Summarize observations to preserve context.
- Hint 4: Log each step as JSONL for replay.
Learning Milestones
- Loop Works: three steps logged correctly.
- Stateful: actions depend on observations.
- Safe: termination rules prevent runaway loops.
Submission / Completion Criteria
Minimum Completion
- Working ReAct loop
- Basic trace log
Full Completion
- Termination rules
- Observation summarization
Excellence
- Replay mode
- Parallel tool calls
This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/AI_AGENTS_PROJECTS.md.