Project 2: Minimal ReAct Agent

Build a ReAct-style agent that loops through Thought -> Action -> Observation, updates state, and stops under explicit termination rules.


Quick Reference

Attribute      Value
Difficulty     Level 2: Intermediate
Time Estimate  10–16 hours
Language       Python or JavaScript
Prerequisites  Project 1, tool calling basics
Key Topics     agent loops, state, termination, trace logging

Learning Objectives

By completing this project, you will:

  1. Implement a closed-loop agent cycle (Think → Act → Observe).
  2. Maintain explicit agent state across steps.
  3. Define termination rules to stop safely.
  4. Summarize observations to avoid context overflow.
  5. Log a trace that makes reasoning auditable.

The Core Question You’re Answering

“How does an agent use feedback to adapt its next action instead of guessing once?”

This is the minimal loop that separates an agent from a pipeline. Without this loop, there is no real agentic behavior.


Concepts You Must Understand First

Concept                 Why It Matters                     Where to Learn
ReAct pattern           Interleaves reasoning + actions    ReAct paper
State vs observation    Prevents repeated actions          Agent design blogs
Termination conditions  Stops runaway loops                LangChain docs
Tool schemas            Actions must be validated          Pydantic/Zod docs
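
For example, a tool call can be validated against a schema before it is executed. A minimal sketch assuming Pydantic v2 is available; the action name and fields here are illustrative, not part of the spec:

from pydantic import BaseModel, ValidationError

class ReadFileAction(BaseModel):
    """Schema for a hypothetical read_file action proposed by the model."""
    tool: str = "read_file"
    path: str
    max_bytes: int = 500_000

raw = {"tool": "read_file", "path": "/docs/ARCHITECTURE.md"}
try:
    action = ReadFileAction.model_validate(raw)  # raises if the LLM output is malformed
except ValidationError as err:
    print("Invalid action; feed the error back to the model:", err)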

Theoretical Foundation

ReAct Loop as Control System

Goal -> Think -> Act -> Observe -> Update State -> Think -> ...

Key properties:

  • Feedback-driven: each step uses real observations.
  • Bounded: stops on success or step budget.
  • Auditable: trace shows why actions happened.
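
Written as code, the control loop is a single bounded function. In this sketch, think, act, update_state, log_trace, and is_done are callables you supply; nothing here is prescribed by the ReAct paper beyond the loop shape:

def run_agent(goal, state, think, act, update_state, log_trace, is_done, max_steps=10):
    """Feedback-driven: every iteration consumes the latest observation.
    Bounded: the step budget guarantees termination.
    Auditable: every step is handed to the trace logger."""
    for step in range(1, max_steps + 1):
        thought, action = think(goal, state)                # Think
        observation = act(action)                           # Act
        state = update_state(state, action, observation)    # Observe -> update state
        log_trace(step, thought, action, observation)       # audit trail
        if is_done(goal, state):                            # stop on success
            break
    return state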

Project Specification

What You’ll Build

A CLI agent that solves multi-step tasks such as:

“Find and summarize the three largest markdown files in /docs.”
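
The example task implies a small tool set. Below is one possible implementation of the file tools named in the example run (list_files, get_file_sizes, read_file); the summarize step would normally call the LLM, so it is left out here:

import os

def list_files(directory):
    """Return paths of all markdown files under a directory."""
    return [os.path.join(root, name)
            for root, _, names in os.walk(directory)
            for name in names if name.endswith(".md")]

def get_file_sizes(paths):
    """Map each path to its size in bytes."""
    return {p: os.path.getsize(p) for p in paths}

def read_file(path):
    """Return the full text of a file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

TOOLS = {"list_files": list_files, "get_file_sizes": get_file_sizes, "read_file": read_file}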

Functional Requirements

  1. Thought → Action → Observation loop
  2. State object with facts + action history
  3. Termination rules: success, max steps, loop detection
  4. Observation summarization
  5. JSONL trace log
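
A minimal sketch of requirement 2, the state object. The field names facts and actions_taken follow the hints later in this guide; the exact shape is up to you:

from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    facts: dict = field(default_factory=dict)           # distilled knowledge, e.g. {"top_files": [...]}
    actions_taken: list = field(default_factory=list)   # (tool, args) pairs, used for loop detection
    done: bool = False

    def record(self, tool, args, observation_summary):
        """Store what happened so the next Thought can build on it."""
        self.actions_taken.append((tool, tuple(sorted(args.items()))))
        self.facts[f"{tool}#{len(self.actions_taken)}"] = observation_summary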

Non-Functional Requirements

  • Deterministic mode for tests
  • Explicit error handling on tool failures
  • Trace replay support
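
Trace replay can be as simple as reading the JSONL log back and re-printing each step. A sketch, assuming the trace format shown in the Real World Outcome section:

import json

def replay(trace_path):
    """Re-print each recorded step so a run can be audited without re-calling tools."""
    with open(trace_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            print(f"Step {entry['step']}: {entry['action']} -> {entry['observation']}")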

Real World Outcome

Example run:

$ python react_agent.py --goal "Find the three largest markdown files in /docs"

Step 1: list_files -> 47 results
Step 2: get_file_sizes -> top 3 found
Step 3: read_file -> ARCHITECTURE.md
Step 4: read_file -> API_GUIDE.md
Step 5: read_file -> TUTORIAL.md
Step 6: summarize -> report created

Trace entry example:

{"step": 3, "thought": "Read the largest file", "action": "read_file", "observation": "Read 450KB", "state_diff": {"files_read": 1}}

Architecture Overview

┌──────────────┐   thoughts    ┌───────────────┐
│  LLM Brain   │──────────────▶│  Action Plan  │
└──────┬───────┘               └──────┬────────┘
       │                              │
       ▼                              ▼
┌──────────────┐               ┌───────────────┐
│ Tool Execute │──────────────▶│  Observation  │
└──────┬───────┘               └──────┬────────┘
       │                              │
       ▼                              ▼
┌────────────────┐           ┌─────────────────┐
│  State Update  │◀──────────│  Trace Logger   │
└────────────────┘           └─────────────────┘

Implementation Guide

Phase 1: Loop Skeleton (3–4h)

  • Implement a fixed loop with max steps
  • Hardcode action selection for now (the LLM takes over in Phase 2)
  • Checkpoint: loop logs steps
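
A sketch of the Phase 1 checkpoint: a bounded loop with a hardcoded action that just logs each step (the tool name and output are stubs):

def run_fixed_loop(goal, max_steps=5):
    """Phase 1: the loop runs, logs every step, and stops at the budget."""
    for step in range(1, max_steps + 1):
        action = "list_files"                       # hardcoded for now; the LLM chooses later
        observation = f"(stub) ran {action}"
        print(f"Step {step}: {action} -> {observation}")
    print("Stopped: step budget reached")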

Phase 2: Tool Selection + State (4–6h)

  • Generate actions from LLM
  • Track action history and facts
  • Checkpoint: agent completes 3-step task
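
A sketch of LLM-driven action selection for Phase 2. Here llm_complete stands in for whatever client you use and is assumed to return the model's raw text; the JSON reply convention is one possible choice, and state follows the AgentState shape sketched earlier:

import json

def choose_action(goal, state, llm_complete):
    """Ask the model for the next action as JSON and parse it."""
    prompt = (
        f"Goal: {goal}\n"
        f"Facts so far: {json.dumps(state.facts)}\n"
        f"Actions already taken: {state.actions_taken}\n"
        'Reply with JSON: {"thought": "...", "tool": "...", "args": {...}}'
    )
    reply = llm_complete(prompt)
    decision = json.loads(reply)   # validate against a schema in practice
    return decision["thought"], decision["tool"], decision["args"]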

Phase 3: Termination + Summaries (3–6h)

  • Add loop detection and step budget
  • Summarize large observations
  • Checkpoint: trace file is replayable
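
Two sketches for Phase 3: repeat detection over the action history, and a naive truncation-based summarizer you can later replace with an LLM summary call. Both assume the AgentState sketched earlier:

def is_repeat(state, tool, args, window=3):
    """Loop detection: flag an action already tried recently with the same arguments."""
    key = (tool, tuple(sorted(args.items())))
    return key in state.actions_taken[-window:]

def summarize_observation(observation, max_chars=800):
    """Keep long tool output from flooding the prompt; truncation is the simplest summary."""
    if len(observation) <= max_chars:
        return observation
    return observation[:max_chars] + f"... [truncated {len(observation) - max_chars} chars]"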

Common Pitfalls & Debugging

Pitfall           Symptom                        Fix
Infinite loops    Same action repeats            Detect repeats in the action history
Context overflow  Model forgets earlier steps    Summarize observations before storing them
Silent failures   Tool errors go unnoticed       Log tool errors separately in the trace

Interview Questions They’ll Ask

  1. How does ReAct differ from chain-of-thought prompting?
  2. Why is explicit state tracking necessary?
  3. How do you prevent infinite loops?

Hints in Layers

  • Hint 1: Implement the loop with a max step budget.
  • Hint 2: Store actions_taken and block repeats.
  • Hint 3: Summarize observations to preserve context.
  • Hint 4: Log each step as JSONL for replay.

Learning Milestones

  1. Loop Works: three steps logged correctly.
  2. Stateful: actions depend on observations.
  3. Safe: termination rules prevent runaway loops.

Submission / Completion Criteria

Minimum Completion

  • Working ReAct loop
  • Basic trace log

Full Completion

  • Termination rules
  • Observation summarization

Excellence

  • Replay mode
  • Parallel tool calls

This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/AI_AGENTS_PROJECTS.md.