Project 6: Tool Router

Build a production-grade AI agent system that safely bridges natural language and programmatic function calls

Quick Reference

Attribute           Value
Difficulty          Expert
Time Estimate       1-2 weeks
Language            TypeScript (Alternatives: Python)
Prerequisites       Projects 1-2, JSON Schema knowledge, API design experience
Key Topics          Tool calling, function schemas, agent loops, error recovery
Knowledge Area      Tool Use / Agent Reliability
Software/Tool       OpenAI Function Calling / Anthropic Tool Use APIs
Main Book           “Clean Code” by Robert Martin (Ch. 8: Boundaries)
Coolness Level      Level 4: “Oh wow, that’s real”
Business Potential  4. The “Open Core” Infrastructure

1. Learning Objectives

By completing this project, you will:

  1. Master Function Calling APIs: Understand how LLMs output structured tool calls instead of text
  2. Design Tool Contracts: Create JSON Schema definitions that act as type-safe interfaces
  3. Implement Agent Loops: Build the observe-reason-act pattern (ReAct)
  4. Handle Argument Validation: Validate every parameter before execution to prevent crashes
  5. Manage Error Recovery: Feed errors back to the LLM so it can self-correct
  6. Prevent Infinite Loops: Detect and stop repeated failing tool calls
  7. Build Production Safety: Add confirmation steps for destructive actions
  8. Create Audit Trails: Log every decision for debugging and compliance
  9. Optimize Token Usage: Filter irrelevant tools to reduce costs
  10. Design Multi-Step Planning: Enable agents to chain multiple tools to complete complex tasks

2. Theoretical Foundation

2.1 Core Concepts

What is Tool Calling?

Tool calling (also called function calling) is a mechanism where the LLM outputs structured function invocations instead of natural language responses.

Without Tool Calling (Text-Based):

User: Order 5 pizzas
Model: I'll help you order pizzas. You can place an order by...

Problem: You can't execute this. It's just text.

With Tool Calling (Structured):

User: Order 5 pizzas
Model: {
  "tool": "create_order",
  "arguments": {
    "item": "pizza",
    "quantity": 5,
    "address": null  // Missing!
  }
}

You: Validate, detect missing argument, send error back
Model: Let me get the address first...
Model: {
  "tool": "get_user_info",
  "arguments": {"user_id": "current_user"}
}

You: Execute get_user_info(), return result
Model: Now I have the address...
Model: {
  "tool": "create_order",
  "arguments": {
    "item": "pizza",
    "quantity": 5,
    "address": "123 Main St, Springfield, IL 62701"
  }
}

You: Execute, return success

Why This Matters:

Tool calling transforms the LLM from a chatbot into an agent that can:

  • Modify databases
  • Call external APIs
  • Execute business logic
  • Chain multiple operations
  • Handle real-world tasks

JSON Schema as Contracts

A contract defines what a function expects (parameters) and returns (output). JSON Schema provides a language-agnostic way to specify these contracts.

Example Tool Definition:

{
  "name": "create_order",
  "description": "Creates a new food delivery order. Use when user wants to order food.",
  "parameters": {
    "type": "object",
    "properties": {
      "item": {
        "type": "string",
        "description": "Food item to order",
        "enum": ["pizza", "burger", "salad", "pasta"]
      },
      "quantity": {
        "type": "integer",
        "description": "Number of items",
        "minimum": 1,
        "maximum": 100
      },
      "address": {
        "type": "string",
        "description": "Full delivery address",
        "pattern": "^.+,.+,.+,\\s*\\d{5}$"
      }
    },
    "required": ["item", "quantity", "address"]
  },
  "returns": {
    "type": "object",
    "properties": {
      "order_id": {"type": "string"},
      "status": {"type": "string"},
      "estimated_time": {"type": "string"}
    }
  }
}

Why Schemas Are Critical:

  1. Type Safety: Catch type errors before execution
  2. Documentation: The schema IS the documentation
  3. Validation: Programmatically verify arguments
  4. Versioning: Track breaking changes to interfaces
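As a rough sketch of how such a contract can be enforced before any tool runs, the checker below covers only a few JSON Schema keywords (`required`, `type`, `enum`, `minimum`/`maximum`); it is hand-rolled for illustration, and production code should use a real validator such as Ajv or Zod instead:

```typescript
type PropSchema = {
  type: string;
  enum?: string[];
  minimum?: number;
  maximum?: number;
};

type Schema = {
  type: string;
  properties: Record<string, PropSchema>;
  required: string[];
};

// Returns a list of human/LLM-readable violations; empty means valid.
function validateArgs(schema: Schema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];

  // Required fields must be present and non-null.
  for (const field of schema.required) {
    if (args[field] === undefined || args[field] === null) {
      errors.push(`missing required field: ${field}`);
    }
  }

  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties[key];
    // Unknown or null fields were already handled above; skip them here.
    if (!prop || value === null || value === undefined) continue;

    if (prop.type === "integer" && !Number.isInteger(value)) {
      errors.push(`${key}: expected integer, got ${typeof value}`);
    } else if (prop.type === "string" && typeof value !== "string") {
      errors.push(`${key}: expected string, got ${typeof value}`);
    }
    if (prop.enum && !prop.enum.includes(value as string)) {
      errors.push(`${key}: must be one of ${prop.enum.join(", ")}`);
    }
    if (typeof value === "number") {
      if (prop.minimum !== undefined && value < prop.minimum) {
        errors.push(`${key}: below minimum ${prop.minimum}`);
      }
      if (prop.maximum !== undefined && value > prop.maximum) {
        errors.push(`${key}: above maximum ${prop.maximum}`);
      }
    }
  }
  return errors;
}
```

The same schema then does double duty: it is sent to the model as the tool definition and reused server-side as the validation rule, so the two can never drift apart.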

The ReAct Pattern (Reasoning + Acting)

ReAct is the fundamental loop for tool-using agents:

LOOP until task complete OR max iterations:
  1. OBSERVE: Receive user input or tool result
  2. REASON: LLM thinks about what to do next
  3. ACT: LLM either:
     a) Calls a tool (action)
     b) Responds to user (finish)

Visual Representation:

User: "Order 5 pizzas to my house"
    ↓
┌────────────────────────────────────┐
│ OBSERVE: User query + Available    │
│          tools + Conversation      │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ REASON: Model analyzes intent      │
│  - Need to create order            │
│  - Need address (don't have it)    │
│  - Can get it from get_user_info   │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ ACT: Call get_user_info()          │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ OBSERVE: Tool result (address)     │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ REASON: Now have all data          │
│  - Can create order                │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ ACT: Call create_order()           │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ OBSERVE: Success response          │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ REASON: Task complete              │
└─────────────────┬──────────────────┘
                  ↓
┌────────────────────────────────────┐
│ ACT: Respond to user               │
│  "Order placed! ID: ord_789"       │
└────────────────────────────────────┘

Research Foundation:

The ReAct paper (“ReAct: Synergizing Reasoning and Acting in Language Models” by Yao et al., 2022) showed that interleaving reasoning traces with actions improves:

  • Success rate on complex tasks (34% → 67%)
  • Interpretability (you can see the thinking)
  • Error recovery (model can detect its mistakes)

Abstraction in Tool Design

Principle: Tools should be black boxes to the LLM. The model knows the interface (inputs/outputs) but not the implementation.

Good Tool Design:

{
  "name": "get_weather",
  "description": "Gets current weather for a city",
  "parameters": {
    "city": {"type": "string"}
  }
}

Bad Tool Design:

{
  "name": "get_weather",
  "description": "Gets weather by querying the WeatherAPI v3 database at weather.com using API key stored in env.WEATHER_KEY",
  "parameters": {
    "city": {"type": "string"}
  }
}

Why?

Exposing implementation details:

  1. Leaks security information (API keys, endpoints)
  2. Leads the model to make assumptions about side effects
  3. Creates coupling between prompt and infrastructure

Correct Abstraction Level:

Model needs to know: WHAT the tool does
Model should NOT know: HOW it's implemented

Error Handling Patterns

Errors are inevitable. The key is making them recoverable.

Types of Errors:

  • Missing Argument (e.g. address: null): send an error with a hint to call get_user_info
  • Invalid Type (e.g. quantity: "five"): ask the model to convert the value to an integer
  • Out of Range (e.g. quantity: 1000): explain the maximum limit and ask the model to adjust
  • Permission Denied (user not authorized): inform the user and suggest logging in
  • Tool Not Found (model hallucinates send_email): list available tools and suggest the closest match
  • API Failure (external service down): retry, or inform the user

Error Message Design for LLMs:

// BAD: Human-oriented message
{
  "error": "Oops! Something went wrong. Please try again."
}

// GOOD: LLM-oriented message
{
  "error_type": "validation_failed",
  "field": "address",
  "expected": "string matching pattern: street, city, state, zip",
  "received": null,
  "suggestion": "Call get_user_info() to retrieve the user's saved address, or ask the user for it.",
  "available_tools": ["get_user_info", "update_user_address"]
}

The LLM can parse structured errors and self-correct!
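A small helper keeps such errors consistent across tools. The sketch below is illustrative (the `missingFieldError` name and shape are this document's, not any library's); the point is that every field of the error is something the model can act on:

```typescript
interface LLMError {
  error_type: string;
  field?: string;
  expected?: string;
  received?: unknown;
  suggestion?: string;
  available_tools?: string[];
}

// Build a machine-parseable error the model can act on,
// rather than a human apology it can only echo back.
function missingFieldError(
  field: string,
  expected: string,
  helperTools: string[]
): LLMError {
  return {
    error_type: "validation_failed",
    field,
    expected,
    received: null,
    suggestion: `Call one of [${helperTools.join(", ")}] to obtain '${field}', or ask the user for it.`,
    available_tools: helperTools,
  };
}
```

The `suggestion` field is what turns a dead end into a recovery path: it names concrete next actions instead of just describing the failure.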

2.2 Why This Matters

Production Relevance

Problem: Unreliable AI actions cause real damage

Without proper tool calling infrastructure:

# User: "Cancel my order"
# Model outputs: "I've cancelled order #789"

# But did it actually execute?
# Which order? User has 5 active orders!
# No audit trail, no confirmation, no safety check

Solution: Type-safe, validated, audited tool execution

# User: "Cancel my order"
# Router: Detects ambiguity, asks for clarification
# User: "The pizza order from today"
# Router: Resolves to order_id="ord_789"
# Router: Validates user has permission
# Router: Calls cancel_order(order_id="ord_789")
# Router: Logs decision
# Router: Confirms to user: "Order ord_789 cancelled"

Real-World Consequences:

  • Shopify: AI assistant that can modify inventory must validate EVERY parameter
  • Stripe: Payment API calls must be idempotent and logged
  • Notion: AI that edits documents must have undo capability
  • GitHub Copilot: Code execution must be sandboxed

One bug in tool calling can:

  • Charge customers incorrectly
  • Delete production data
  • Violate privacy regulations
  • Cause financial losses

Industry Applications

1. Customer Support Agents (Intercom, Zendesk)

const supportTools = [
  {
    name: "lookup_order",
    description: "Retrieve order details by ID",
    parameters: {
      order_id: { type: "string" }
    }
  },
  {
    name: "issue_refund",
    description: "Issue refund for an order",
    parameters: {
      order_id: { type: "string" },
      amount: { type: "number" },
      reason: { type: "string" }
    },
    requires_confirmation: true  // Safety!
  }
];

// User: "I want a refund for order #123"
// Agent: Calls lookup_order("123")
// Agent: Sees order total is $50
// Agent: Calls issue_refund("123", 50, "customer request")
// System: Asks for confirmation before executing
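The `requires_confirmation` flag above can be enforced with a small gate that intercepts every call before execution. A hedged sketch (the function and type names are illustrative):

```typescript
interface GatedTool {
  name: string;
  requires_confirmation?: boolean;
}

type PendingAction = { tool: string; args: Record<string, unknown> };

// Decide whether a tool call may run immediately or must be
// held until the user explicitly approves it.
function gateToolCall(
  tool: GatedTool,
  args: Record<string, unknown>
): { status: "execute" } | { status: "confirm"; pending: PendingAction } {
  if (tool.requires_confirmation) {
    return { status: "confirm", pending: { tool: tool.name, args } };
  }
  return { status: "execute" };
}
```

The pending action is stored, shown to the user ("Refund $50 for order #123 — proceed?"), and only executed after an affirmative reply; the model never gets to skip this step.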

2. Code Execution Agents (ChatGPT Code Interpreter)

OpenAI’s Code Interpreter uses tool calling to execute Python:

tools = [
  {
    "name": "execute_python",
    "description": "Run Python code in sandboxed environment",
    "parameters": {
      "code": {"type": "string"}
    }
  }
]

# User: "Plot a graph of y=x^2"
# Model: Calls execute_python("import matplotlib...")
# System: Runs in sandbox, returns image
# Model: Shows user the graph

3. Database Query Agents (Text2SQL)

Translating natural language to database queries:

tools = [
  {
    "name": "query_database",
    "description": "Execute SELECT query on customer database",
    "parameters": {
      "sql": {"type": "string", "pattern": "^SELECT .+"},  # Only SELECT
      "limit": {"type": "integer", "maximum": 1000}
    }
  }
]

# User: "Show me customers who signed up last month"
# Model: Calls query_database(sql="SELECT * FROM customers WHERE created_at > ...", limit=100)
# System: Validates (only SELECT, no DROP/DELETE), executes, returns results
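The schema's `pattern` constraint alone is not enough, because the model writes the SQL string. One possible server-side guard is sketched below; it is a naive keyword filter for illustration only, and a real system should pair it with a SQL parser and read-only database credentials:

```typescript
// Defense-in-depth: even though the schema pattern says "^SELECT .+",
// re-check the statement server-side before touching the database.
function isSafeSelect(sql: string): boolean {
  const normalized = sql.trim().toUpperCase();
  if (!normalized.startsWith("SELECT ")) return false;
  // Reject multi-statement payloads and write keywords smuggled in.
  const forbidden = [";", "DROP ", "DELETE ", "UPDATE ", "INSERT ", "ALTER ", "TRUNCATE "];
  return !forbidden.some((kw) => normalized.includes(kw));
}
```

Never trust the model's output as the last line of defense: the database user the agent connects with should itself lack write permissions.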

2.3 Common Misconceptions

  • Misconception: “LLMs always output valid JSON.” Reality: models hallucinate fields, types, and entire tools.
  • Misconception: “Function calling is deterministic.” Reality: models make probabilistic decisions; validation is essential.
  • Misconception: “Error messages slow things down.” Reality: rich errors enable self-correction, reducing retries.
  • Misconception: “You need to show all tools to the model.” Reality: showing 100 tools degrades performance; filter to the top 5-10.
  • Misconception: “Tool calling is just parsing JSON.” Reality: it is about bridging fuzzy intent to strict interfaces.

3. Project Specification

3.1 What You Will Build

A tool routing system that:

  1. Registers tools with JSON Schema definitions
  2. Routes natural language to appropriate tool calls
  3. Validates arguments before execution
  4. Executes tools safely with error handling
  5. Manages multi-step planning through ReAct loops
  6. Handles disambiguation when intent is unclear
  7. Prevents infinite loops with circuit breakers
  8. Logs all actions for debugging and auditing
  9. Supports confirmations for dangerous operations
  10. Provides developer-friendly errors and metrics

Core Question This System Answers:

“How do I give an LLM ‘hands’ while ensuring it doesn’t break my API?”

3.2 Functional Requirements

FR1: Tool Registry Management

  • Load tool definitions from JSON files
  • Validate tool schemas at registration time
  • Support tool versioning (v1, v2)
  • Handle deprecated tools with warnings
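A minimal sketch of such a registry is shown below; the shape is illustrative, and versions resolve lexicographically here, which is adequate for simple `v1`/`v2` tags but not for full semver:

```typescript
interface RegisteredTool {
  name: string;
  version: string;       // e.g. "v1", "v2"
  deprecated?: boolean;
  definition: object;    // the JSON Schema tool definition
}

// Keyed by "name@version" so v1 and v2 can coexist;
// lookups without a version resolve to the newest registration.
class ToolRegistry {
  private tools = new Map<string, RegisteredTool>();

  register(tool: RegisteredTool): void {
    const key = `${tool.name}@${tool.version}`;
    if (this.tools.has(key)) {
      throw new Error(`duplicate registration: ${key}`);
    }
    if (tool.deprecated) {
      console.warn(`[registry] ${key} is deprecated`);
    }
    this.tools.set(key, tool);
  }

  get(name: string, version?: string): RegisteredTool | undefined {
    if (version) return this.tools.get(`${name}@${version}`);
    // Fall back to the newest version by tag order.
    const versions = [...this.tools.values()]
      .filter((t) => t.name === name)
      .sort((a, b) => a.version.localeCompare(b.version));
    return versions[versions.length - 1];
  }
}
```

Failing fast on duplicate registration catches copy-paste mistakes in tool JSON files at startup rather than at call time.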

FR2: Intent Analysis & Routing

  • Parse user queries to determine intent
  • Handle ambiguous queries (multiple matching tools)
  • Generate clarifying questions when needed
  • Support fuzzy matching for hallucinated tool names

FR3: Argument Validation

Validate all tool arguments:

  • Type checking: string, integer, boolean, arrays, objects
  • Constraint checking: min, max, pattern, enum
  • Required field verification
  • Custom validation rules (e.g., valid email, future date)

FR4: Tool Execution Engine

  • Execute tools with timeout protection
  • Handle synchronous and asynchronous operations
  • Capture return values and errors
  • Support retry logic for transient failures
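Timeout protection can be layered on any async tool by racing its promise against a timer. A minimal sketch:

```typescript
// Race the tool's promise against a timer so a hung tool
// cannot stall the whole agent loop.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`tool timed out after ${ms}ms`)),
      ms
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}
```

Note one limitation: the timed-out tool keeps running in the background; true cancellation requires threading an AbortController (or equivalent) into the tool implementation itself.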

FR5: Error Recovery Loop

  • Feed errors back to model as structured messages
  • Track error count per tool call
  • Detect error loops (same error repeated 3+ times)
  • Escalate to human when recovery fails
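Detecting an error loop reduces to counting repeated error signatures, since retrying the exact same failing call rarely helps. An illustrative sketch (the class name and default threshold are this document's choices):

```typescript
// Track error signatures per tool; trip the breaker when the
// same (tool, error_type) pair repeats too many times.
class ErrorLoopBreaker {
  private counts = new Map<string, number>();
  constructor(private maxRepeats = 3) {}

  // Returns true when the agent should stop retrying and escalate.
  record(toolName: string, errorType: string): boolean {
    const key = `${toolName}:${errorType}`;
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next >= this.maxRepeats;
  }

  // Call after any successful tool execution.
  reset(): void {
    this.counts.clear();
  }
}
```

Keying on the (tool, error type) pair rather than a global counter lets the agent keep working through unrelated errors while still catching a genuine loop.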

FR6: Multi-Step Planning (ReAct)

  • Maintain conversation context across tool calls
  • Pass tool results to model for next decision
  • Limit maximum iterations (prevent runaway agents)
  • Support parallel tool calls when possible

FR7: Safety & Confirmations

  • Flag destructive actions (delete, payment)
  • Require explicit user confirmation
  • Implement rate limiting per tool
  • Block out-of-scope tools based on user permissions
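Per-tool rate limiting can be implemented with a sliding window of call timestamps. An illustrative sketch:

```typescript
// Sliding-window limiter: allow at most maxCalls per tool
// within the most recent windowMs milliseconds.
class ToolRateLimiter {
  private calls = new Map<string, number[]>();
  constructor(private maxCalls: number, private windowMs: number) {}

  allow(toolName: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    const recent = (this.calls.get(toolName) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.maxCalls) {
      this.calls.set(toolName, recent);
      return false;
    }
    recent.push(now);
    this.calls.set(toolName, recent);
    return true;
  }
}
```

Accepting `now` as a parameter makes the limiter deterministic in tests; production callers just omit it.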

FR8: Audit & Logging

  • Log every tool call with timestamp
  • Record full arguments and results
  • Track token usage per session
  • Generate session summaries

3.3 Non-Functional Requirements

  • Reliability: 99.9% successful tool execution (production-grade agents must be trustworthy)
  • Latency: <500ms per tool call, excluding the LLM API (user-facing interactions need responsiveness)
  • Safety: 0 unauthorized actions (security is non-negotiable)
  • Debuggability: full audit trail for every session (must be able to replay and debug failures)
  • Extensibility: add new tools without code changes (business logic evolves rapidly)
  • Token Efficiency: <2000 tokens per multi-step task (control costs while maintaining quality)

3.4 Example Usage

Running the router:

$ node router.js --tools ./tools.json --query "Order 5 pizzas to my house"

[Router Init] Loading tool definitions...
[Router Init] Registered 8 tools:
  - create_order (args: item, quantity, address)
  - cancel_order (args: order_id)
  - get_order_status (args: order_id)
  - update_user_address (args: address)
  - search_menu (args: category, dietary_restrictions)
  - get_user_info (args: user_id)
  - send_notification (args: user_id, message)
  - refund_order (args: order_id, reason)

[Intent Analysis]
User query: "Order 5 pizzas to my house"
Analyzing intent...

[LLM Tool Selection]
Model chose tool: create_order
Raw arguments from model:
{
  "item": "pizza",
  "quantity": 5,
  "address": null
}

[Validation Phase 1: Schema Check]
✓ Tool 'create_order' exists
✓ Argument 'item' is string (valid)
✓ Argument 'quantity' is integer (valid)
✗ Argument 'address' is null (REQUIRED field missing)

[Error Recovery Loop - Attempt 1]
Sending error back to model:
"The 'address' field is required for create_order. You need to either:
1. Ask the user for their address, OR
2. Call get_user_info() to retrieve the saved address."

[LLM Recovery Response]
Model chose new tool: get_user_info
Arguments: { "user_id": "current_user" }

[Validation Phase 2]
✓ All arguments valid
Executing get_user_info(user_id='current_user')...

[Tool Execution]
API Response: {
  "user_id": "u_12345",
  "name": "John Doe",
  "address": "123 Main St, Apt 4B, Springfield, IL 62701",
  "phone": "+1-555-0123"
}

[LLM Re-planning]
Model received tool result. Re-attempting original goal...
Model chose tool: create_order
Arguments:
{
  "item": "pizza",
  "quantity": 5,
  "address": "123 Main St, Apt 4B, Springfield, IL 62701"
}

[Validation Phase 3]
✓ All arguments valid
✓ Address format validated (regex: street, city, state, zip)
✓ Quantity within bounds (1-100)
Executing create_order()...

[Tool Execution]
POST /api/orders
Request body: { "item": "pizza", "quantity": 5, "address": "..." }
API Response: {
  "order_id": "ord_789",
  "status": "pending",
  "estimated_time": "35 minutes",
  "total": 87.45
}

[Final Response Generation]
Sending tool result back to model for natural language response...

[Assistant Response]
"I've placed your order for 5 pizzas to be delivered to 123 Main St, Apt 4B, Springfield, IL. Your order ID is ord_789, and it should arrive in about 35 minutes. The total is $87.45."

[Audit Log]
Saved complete interaction to logs/2024-12-27_15-23-11.json
{
  "session_id": "sess_abc123",
  "user_query": "Order 5 pizzas to my house",
  "tools_called": [
    {
      "tool": "get_user_info",
      "args": {"user_id": "current_user"},
      "result": {...},
      "timestamp": "2024-12-27T15:23:11.245Z"
    },
    {
      "tool": "create_order",
      "args": {"item": "pizza", "quantity": 5, "address": "..."},
      "result": {...},
      "timestamp": "2024-12-27T15:23:12.891Z"
    }
  ],
  "errors_encountered": 1,
  "recovery_successful": true,
  "total_api_calls": 3,
  "total_tokens": 1247
}

Handling ambiguous requests:

$ node router.js --query "I want to return my order"

[Intent Analysis]
Ambiguous intent detected. Multiple tools match:
  - cancel_order (similarity: 0.82)
  - refund_order (similarity: 0.79)

[Disambiguation Required]
Model generating clarifying question...

[Assistant]
"I can help you with that. Do you want to:
1. Cancel the order (before it's delivered), or
2. Request a refund (after you received it)?"

[User Response]
"I already got it but the pizza was cold"

[Resolved Intent]
Model chose tool: refund_order
Arguments: { "order_id": null, "reason": "pizza was cold" }

[Validation Error]
✗ Missing required field: order_id

[Clarification Loop]
Model: "I need your order ID to process the refund. You can find it in your order confirmation email, or I can look up your recent orders. Would you like me to check your recent orders?"

4. Solution Architecture

4.1 High-Level Design

┌──────────────────┐
│  User Input      │
└────────┬─────────┘
         │
         ▼
┌────────────────────────────────────────┐
│  Intent Analyzer                       │
│  - Parse query                         │
│  - Detect ambiguity                    │
│  - Generate clarifications             │
└────────┬───────────────────────────────┘
         │
         ▼
┌────────────────────────────────────────┐
│  Tool Selector                         │
│  - Filter relevant tools               │
│  - Match intent to tool                │
│  - Handle hallucinations               │
└────────┬───────────────────────────────┘
         │
         ▼
┌────────────────────────────────────────┐
│  Argument Extractor (LLM)              │
│  - Call LLM with tool definitions      │
│  - Receive structured tool call        │
└────────┬───────────────────────────────┘
         │
         ▼
┌────────────────────────────────────────┐
│  Validator                             │
│  - Schema validation (types)           │
│  - Constraint validation (ranges)      │
│  - Custom business rules               │
└────────┬───────────────────────────────┘
         │
    Valid? ────No────▶ Error Handler ───┐
         │                              │
        Yes                             │
         │                              │
         ▼                              │
┌─────────────────────────────────────┐ │
│  Execution Engine                   │ │
│  - Call actual function             │ │
│  - Timeout protection               │ │
│  - Capture result/error             │ │
└────────┬────────────────────────────┘ │
         │                              │
         ▼                              │
┌─────────────────────────────────────┐ │
│  Result Processor                   │ │
│  - Format result for LLM            │ │
│  - Update conversation context      │ │
└────────┬────────────────────────────┘ │
         │                              │
         ▼                              │
┌─────────────────────────────────────┐ │
│  ReAct Controller                   │ │
│  - Check if task complete           │◀┘
│  - Loop for next action             │
│  - Prevent infinite loops           │
└────────┬────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  Response Generator                 │
│  - LLM formats natural language     │
│  - Return to user                   │
└─────────────────────────────────────┘

4.2 Key Components

  • ToolRegistry: stores and versions tool definitions (JSON files + in-memory map)
  • IntentAnalyzer: detects what the user wants to do (embedding similarity + keyword matching)
  • ToolSelector: chooses which tool(s) apply (filter by intent, rank by relevance)
  • ArgumentExtractor: gets the LLM to output a structured call (native function calling API)
  • SchemaValidator: validates against JSON Schema (use a library: Ajv, Zod, Pydantic)
  • ExecutionEngine: runs the actual function (strategy pattern per tool type)
  • ErrorHandler: formats errors for the LLM (structured error templates)
  • ReActController: manages the agent loop (state machine with max iterations)
  • AuditLogger: records all actions (JSON logs + optional DB)
  • ConfirmationGate: requires human approval (flags tools as dangerous)

4.3 Data Structures

interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema;
  returns?: JSONSchema;
  requires_confirmation?: boolean;
  dangerous?: boolean;
  rate_limit?: {
    max_calls: number;
    window_seconds: number;
  };
  version?: string;
  deprecated?: boolean;
  metadata?: Record<string, any>;
}

interface ToolCall {
  id: string;  // Unique ID for this call
  tool: string;
  arguments: Record<string, any>;
  timestamp: Date;
}

interface ToolResult {
  call_id: string;
  success: boolean;
  result?: any;
  error?: StructuredError;
  execution_time_ms: number;
  timestamp: Date;
}

interface StructuredError {
  error_type: "validation_failed" | "tool_not_found" | "permission_denied" | "execution_failed";
  message: string;
  field?: string;
  expected?: any;
  received?: any;
  suggestion?: string;
  available_tools?: string[];
}

interface AgentState {
  session_id: string;
  user_query: string;
  conversation_history: Message[];
  tools_called: ToolCall[];
  current_iteration: number;
  max_iterations: number;
  error_count: number;
  status: "running" | "waiting_for_user" | "completed" | "failed";
}

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_call_id?: string;
  tool_calls?: ToolCall[];
}

interface SessionLog {
  session_id: string;
  start_time: Date;
  end_time: Date;
  user_query: string;
  tools_called: ToolCall[];
  results: ToolResult[];
  final_response: string;
  total_tokens: number;
  total_cost: number;
  success: boolean;
}

4.4 Algorithm Overview

Main ReAct Loop Algorithm

async function runAgentLoop(
  userQuery: string,
  availableTools: ToolDefinition[],
  config: AgentConfig
): Promise<AgentResult> {
  const state: AgentState = {
    session_id: generateId(),
    user_query: userQuery,
    conversation_history: [],
    tools_called: [],
    current_iteration: 0,
    max_iterations: config.max_iterations || 10,
    error_count: 0,
    status: "running"
  };

  // Add system message with tool definitions
  state.conversation_history.push({
    role: "system",
    content: buildSystemPrompt(availableTools)
  });

  // Add user query
  state.conversation_history.push({
    role: "user",
    content: userQuery
  });

  // Main loop
  while (state.current_iteration < state.max_iterations) {
    state.current_iteration++;

    // OBSERVE & REASON: Call LLM
    const response = await callLLM(
      state.conversation_history,
      availableTools
    );

    // Check if model wants to finish
    if (response.finish_reason === "stop") {
      state.status = "completed";
      return {
        success: true,
        response: response.content,
        session_log: buildSessionLog(state)
      };
    }

    // ACT: Model wants to call tool(s)
    if (response.tool_calls && response.tool_calls.length > 0) {
      for (const toolCall of response.tool_calls) {
        // Validate tool call
        const validationResult = validateToolCall(toolCall, availableTools);

        if (!validationResult.valid) {
          // Send error back to model
          state.conversation_history.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(validationResult.error)
          });

          state.error_count++;

          // Check for error loop
          if (state.error_count > 3) {
            state.status = "failed";
            return {
              success: false,
              error: "Agent stuck in error loop",
              session_log: buildSessionLog(state)
            };
          }

          continue;  // Skip executing this call; the model sees this error next iteration
        }

        // Execute tool
        try {
          const result = await executeTool(
            toolCall.tool,
            toolCall.arguments,
            config.timeout_ms
          );

          // Record successful execution
          state.tools_called.push(toolCall);

          // Add result to conversation
          state.conversation_history.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(result)
          });

          // Reset error count on success
          state.error_count = 0;

        } catch (error) {
          // Handle execution errors
          const structuredError = formatExecutionError(error);

          state.conversation_history.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(structuredError)
          });

          state.error_count++;
        }
      }
    }

    // Safety check: prevent runaway loops
    if (state.current_iteration >= state.max_iterations) {
      state.status = "failed";
      return {
        success: false,
        error: `Max iterations (${state.max_iterations}) exceeded`,
        session_log: buildSessionLog(state)
      };
    }
  }

  // Should not reach here
  throw new Error("Unexpected loop exit");
}

Tool Validation Algorithm

function validateToolCall(
  toolCall: ToolCall,
  availableTools: ToolDefinition[]
): ValidationResult {
  // Step 1: Check if tool exists
  const toolDef = availableTools.find(t => t.name === toolCall.tool);

  if (!toolDef) {
    return {
      valid: false,
      error: {
        error_type: "tool_not_found",
        message: `Tool '${toolCall.tool}' does not exist`,
        available_tools: availableTools.map(t => t.name),
        suggestion: findClosestToolName(toolCall.tool, availableTools)
      }
    };
  }

  // Step 2: Validate against JSON Schema
  const schemaValidator = new JSONSchemaValidator(toolDef.parameters);
  const schemaResult = schemaValidator.validate(toolCall.arguments);

  if (!schemaResult.valid) {
    return {
      valid: false,
      error: {
        error_type: "validation_failed",
        message: "Arguments do not match schema",
        field: schemaResult.errors[0].field,
        expected: schemaResult.errors[0].expected,
        received: schemaResult.errors[0].received,
        suggestion: generateFixSuggestion(schemaResult.errors[0])
      }
    };
  }

  // Step 3: Custom business rule validation
  const customValidation = runCustomValidators(toolCall, toolDef);

  if (!customValidation.valid) {
    return {
      valid: false,
      error: customValidation.error
    };
  }

  // All checks passed
  return { valid: true };
}
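The `findClosestToolName` helper used in Step 1 is left abstract above; one plausible implementation computes edit distance over registered names and only suggests a match when the hallucinated name is a near miss (the threshold here is an illustrative choice):

```typescript
// Classic Levenshtein edit distance via dynamic programming.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function findClosestToolName(requested: string, names: string[]): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const name of names) {
    const d = editDistance(requested, name);
    if (d < bestDist) { bestDist = d; best = name; }
  }
  // Only suggest when the name is plausibly a typo/hallucination variant.
  return bestDist <= Math.ceil(requested.length / 2) ? best : null;
}
```

Returning null for distant names matters: suggesting `refund_order` when the model asked for `launch_missiles` would only encourage a wrong call.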

Intent Disambiguation Algorithm

async function disambiguateIntent(
  userQuery: string,
  candidateTools: ToolDefinition[]
): Promise<ToolDefinition | null> {
  if (candidateTools.length === 1) {
    return candidateTools[0];
  }

  if (candidateTools.length === 0) {
    return null;
  }

  // Multiple candidates - ask for clarification
  const options = candidateTools.map((tool, i) => ({
    number: i + 1,
    tool: tool.name,
    description: tool.description
  }));

  const clarificationPrompt = `
    Your query could match multiple actions:
    ${options.map(opt => `${opt.number}. ${opt.description}`).join('\n')}

    Which one did you mean?
  `;

  // Send to user, wait for response
  // (This is simplified - real implementation would handle async user input)
  const userChoice = await askUser(clarificationPrompt);

  // Parse user choice (number or description match)
  const selectedIndex = parseInt(userChoice) - 1;

  if (selectedIndex >= 0 && selectedIndex < candidateTools.length) {
    return candidateTools[selectedIndex];
  }

  // Try semantic matching on user's clarification
  return selectBestMatch(userChoice, candidateTools);
}
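`disambiguateIntent` leans on two helpers that are not shown: `askUser` (user I/O) and `selectBestMatch`. One lightweight, embedding-free way to implement the latter is word-overlap scoring; this is a sketch, and a production system might score with embeddings instead.

```typescript
// Sketch of the assumed selectBestMatch helper: score each candidate by how
// many distinct words the user's clarification shares with the tool's name
// and description, and return the highest scorer (null if nothing overlaps).
interface ToolDefinition {
  name: string;
  description: string;
}

function selectBestMatch(
  userText: string,
  candidates: ToolDefinition[]
): ToolDefinition | null {
  const queryWords = new Set(
    userText.toLowerCase().split(/\W+/).filter(Boolean)
  );

  let best: ToolDefinition | null = null;
  let bestScore = 0;

  for (const tool of candidates) {
    const toolWords = `${tool.name} ${tool.description}`
      .toLowerCase()
      .split(/\W+/)
      .filter(Boolean);
    const score = new Set(toolWords.filter(w => queryWords.has(w))).size;
    if (score > bestScore) {
      bestScore = score;
      best = tool;
    }
  }

  return best;
}
```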

4.5 Tool Definition Examples

Simple Tool:

{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name"
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "default": "celsius"
      }
    },
    "required": ["city"]
  },
  "returns": {
    "type": "object",
    "properties": {
      "temperature": {"type": "number"},
      "condition": {"type": "string"},
      "humidity": {"type": "number"}
    }
  }
}

Complex Tool with Validation:

{
  "name": "transfer_funds",
  "description": "Transfer money between accounts",
  "parameters": {
    "type": "object",
    "properties": {
      "from_account": {
        "type": "string",
        "pattern": "^ACC[0-9]{8}$",
        "description": "Source account ID (format: ACC12345678)"
      },
      "to_account": {
        "type": "string",
        "pattern": "^ACC[0-9]{8}$",
        "description": "Destination account ID"
      },
      "amount": {
        "type": "number",
        "minimum": 0.01,
        "maximum": 10000,
        "description": "Amount to transfer (max $10,000 per transaction)"
      },
      "memo": {
        "type": "string",
        "maxLength": 100
      }
    },
    "required": ["from_account", "to_account", "amount"]
  },
  "requires_confirmation": true,
  "dangerous": true,
  "rate_limit": {
    "max_calls": 5,
    "window_seconds": 3600
  }
}

5. Implementation Guide

Phase 1: Foundation (Days 1-3)

Step 1: Set Up Tool Registry

Create tool definition file (tools/tools.json):

{
  "version": "1.0.0",
  "tools": [
    {
      "name": "get_time",
      "description": "Get current time",
      "parameters": {
        "type": "object",
        "properties": {},
        "required": []
      }
    },
    {
      "name": "get_weather",
      "description": "Get weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string"}
        },
        "required": ["city"]
      }
    }
  ]
}

Checkpoint 1.1: Can you load and parse this JSON into TypeScript interfaces?
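One way to satisfy this checkpoint is a pair of interfaces plus a small loader. This is a sketch: the optional fields mirror the `transfer_funds` example in section 4.5, and the `loadTools` signature matches the call the CLI in Step 10 makes.

```typescript
import * as fs from "node:fs";

// Minimal registry types; the optional fields are assumptions based on the
// complex tool example earlier in this document.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;   // raw JSON Schema
  requires_confirmation?: boolean;
  dangerous?: boolean;
}

interface ToolRegistryFile {
  version: string;
  tools: ToolDefinition[];
}

function loadTools(path: string): ToolDefinition[] {
  const registry = JSON.parse(
    fs.readFileSync(path, "utf-8")
  ) as ToolRegistryFile;

  // Fail fast on a malformed registry file
  if (!Array.isArray(registry.tools)) {
    throw new Error(`${path} is missing a 'tools' array`);
  }
  return registry.tools;
}
```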

Step 2: Implement Mock Tool Execution

Don't connect to real APIs yet:

const mockTools: Record<string, (args?: any) => any> = {
  get_time: () => ({ time: new Date().toISOString() }),
  get_weather: (args: { city: string }) => ({
    city: args.city,
    temperature: 72,
    condition: "sunny"
  })
};

function executeTool(toolName: string, args: any): any {
  const fn = mockTools[toolName];
  if (!fn) {
    throw new Error(`Tool ${toolName} not found`);
  }
  return fn(args);
}

Checkpoint 1.2: Can you call executeTool("get_weather", {city: "NYC"}) and get a result?

Step 3: Integrate Native Function Calling API

For OpenAI:

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function callLLMWithTools(
  messages: Message[],
  tools: ToolDefinition[]
): Promise<any> {
  const response = await client.chat.completions.create({
    model: "gpt-4",
    messages: messages,
    tools: tools.map(tool => ({
      type: "function",
      function: {
        name: tool.name,
        description: tool.description,
        parameters: tool.parameters
      }
    }))
  });

  return response.choices[0].message;
}

Checkpoint 1.3: Send a message asking for the weather and verify the model outputs a tool call.

Phase 2: Validation & Error Handling (Days 4-6)

Step 4: Implement Schema Validation

Using Zod (TypeScript):

import { z } from "zod";

function validateWithZod(args: any, schema: any): ValidationResult {
  try {
    // Assumed helper that converts a JSON Schema object into a Zod schema
    const zodSchema = jsonSchemaToZod(schema);
    zodSchema.parse(args);
    return { valid: true };
  } catch (error) {
    if (error instanceof z.ZodError) {
      const issue = error.issues[0];
      return {
        valid: false,
        error: {
          error_type: "validation_failed",
          message: issue.message,
          field: issue.path.join('.'),
          // `expected`/`received` only exist on invalid_type issues
          expected: "expected" in issue ? String(issue.expected) : undefined,
          received: "received" in issue ? String(issue.received) : undefined
        }
      };
    }
    throw error;
  }
}

Checkpoint 2.1: Test with invalid arguments. Does validation catch type errors?

Step 5: Build Error Recovery Loop

async function runWithErrorRecovery(
  userQuery: string,
  tools: ToolDefinition[]
): Promise<string> {
  const messages: Message[] = [
    { role: "user", content: userQuery }
  ];

  for (let attempt = 0; attempt < 3; attempt++) {
    const response = await callLLMWithTools(messages, tools);

    if (response.tool_calls) {
      const toolCall = response.tool_calls[0];

      // Validate
      const validation = validateToolCall(toolCall, tools);

      if (!validation.valid) {
        // Send error back to model
        messages.push({
          role: "assistant",
          content: "",
          tool_calls: [toolCall]
        });

        messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: JSON.stringify(validation.error)
        });

        console.log(`Validation failed (attempt ${attempt + 1}): ${validation.error.message}`);
        continue;  // Let model try again
      }

      // Execute
      const result = executeTool(toolCall.function.name, JSON.parse(toolCall.function.arguments));

      return `Tool executed successfully: ${JSON.stringify(result)}`;
    }

    // No tool call, return text response
    return response.content;
  }

  throw new Error("Failed after 3 recovery attempts");
}

Checkpoint 2.2: Test with a query that requires error recovery (e.g., missing required field).

Phase 3: ReAct Loop & Multi-Step (Days 7-10)

Step 6: Implement Full ReAct Loop

async function runReActLoop(
  userQuery: string,
  tools: ToolDefinition[],
  maxIterations: number = 10
): Promise<SessionLog> {
  const sessionId = generateId();
  const messages: Message[] = [
    {
      role: "system",
      content: "You are a helpful assistant with access to tools. Use them to complete tasks."
    },
    {
      role: "user",
      content: userQuery
    }
  ];

  const toolsCalled: ToolCall[] = [];
  let iteration = 0;

  while (iteration < maxIterations) {
    iteration++;
    console.log(`\n[Iteration ${iteration}]`);

    // Call LLM
    const response = await callLLMWithTools(messages, tools);

    // Done when the model returns no further tool calls
    // (finish_reason lives on the choice, not on the message returned above)
    if (!response.tool_calls || response.tool_calls.length === 0) {
      console.log("[Agent] Task complete");
      return {
        session_id: sessionId,
        success: true,
        final_response: response.content,
        tools_called: toolsCalled,
        iterations: iteration
      };
    }

    // Process tool calls
    if (response.tool_calls) {
      // Add assistant message
      messages.push({
        role: "assistant",
        content: response.content || "",
        tool_calls: response.tool_calls
      });

      for (const toolCall of response.tool_calls) {
        console.log(`[Agent] Calling tool: ${toolCall.function.name}`);
        console.log(`[Agent] Arguments: ${toolCall.function.arguments}`);

        // Validate
        const args = JSON.parse(toolCall.function.arguments);
        const validation = validateToolCall(
          { ...toolCall, arguments: args },
          tools
        );

        if (!validation.valid) {
          console.log(`[Validation] Failed: ${validation.error.message}`);
          messages.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(validation.error)
          });
          continue;
        }

        // Execute
        try {
          const result = executeTool(toolCall.function.name, args);
          console.log(`[Execution] Success: ${JSON.stringify(result)}`);

          toolsCalled.push({
            id: toolCall.id,
            tool: toolCall.function.name,
            arguments: args,
            timestamp: new Date()
          });

          messages.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(result)
          });
        } catch (error) {
          // `error` is `unknown` in modern TypeScript; extract a safe message
          const message = error instanceof Error ? error.message : String(error);
          console.log(`[Execution] Error: ${message}`);
          messages.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify({
              error_type: "execution_failed",
              message
            })
          });
        }
      }
    }
  }

  // Max iterations reached
  return {
    session_id: sessionId,
    success: false,
    error: "Max iterations exceeded",
    tools_called: toolsCalled,
    iterations: iteration
  };
}

Checkpoint 3.1: Test with a multi-step task (e.g., โ€œGet weather for NYC and tell me what time it isโ€).

Step 7: Add Audit Logging

import * as fs from "fs";

function saveSessionLog(log: SessionLog): void {
  fs.mkdirSync("logs", { recursive: true });
  const filename = `logs/${log.session_id}_${Date.now()}.json`;

  const detailedLog = {
    ...log,
    metadata: {
      timestamp: new Date().toISOString(),
      environment: process.env.NODE_ENV,
      model: "gpt-4"
    }
  };

  fs.writeFileSync(filename, JSON.stringify(detailedLog, null, 2));
  console.log(`[Audit] Saved session log: ${filename}`);
}

Checkpoint 3.2: Verify that each session creates a detailed log file.

Phase 4: Production Features (Days 11-14)

Step 8: Implement Confirmation Gates

async function executeWithConfirmation(
  toolCall: ToolCall,
  toolDef: ToolDefinition
): Promise<any> {
  if (toolDef.requires_confirmation) {
    console.log(`\n⚠️  CONFIRMATION REQUIRED ⚠️`);
    console.log(`Tool: ${toolCall.tool}`);
    console.log(`Arguments: ${JSON.stringify(toolCall.arguments, null, 2)}`);
    console.log(`\nThis action is potentially destructive.`);

    const confirmed = await promptUser("Do you want to proceed? (yes/no): ");

    if (confirmed.toLowerCase() !== "yes") {
      throw new Error("User cancelled operation");
    }
  }

  return executeTool(toolCall.tool, toolCall.arguments);
}
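`promptUser` is assumed above; here is one minimal Node implementation using `readline/promises`. The injectable input stream is an assumption of this sketch, made so the confirmation gate can be exercised without a live terminal.

```typescript
import * as readline from "node:readline/promises";
import { Readable } from "node:stream";

// Hypothetical helper assumed by executeWithConfirmation: reads one line
// from the given stream (stdin by default) and resolves with it trimmed.
async function promptUser(
  prompt: string,
  input: Readable = process.stdin
): Promise<string> {
  const rl = readline.createInterface({ input, output: process.stdout });
  try {
    return (await rl.question(prompt)).trim();
  } finally {
    rl.close();
  }
}
```

In a test you can feed the prompt a canned answer, e.g. `promptUser("Proceed? ", Readable.from(["yes\n"]))`.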

Step 9: Add Rate Limiting

class RateLimiter {
  private callCounts: Map<string, { count: number; windowStart: number }> = new Map();

  checkLimit(toolName: string, limit: { max_calls: number; window_seconds: number }): boolean {
    const now = Date.now();
    const key = toolName;
    const entry = this.callCounts.get(key);

    if (!entry) {
      this.callCounts.set(key, { count: 1, windowStart: now });
      return true;
    }

    const windowElapsed = (now - entry.windowStart) / 1000;

    if (windowElapsed >= limit.window_seconds) {
      // Reset window
      this.callCounts.set(key, { count: 1, windowStart: now });
      return true;
    }

    if (entry.count >= limit.max_calls) {
      return false;  // Rate limit exceeded
    }

    entry.count++;
    return true;
  }
}

Step 10: Build CLI Interface

import { Command } from "commander";

const program = new Command();

program
  .name("router")
  .description("AI Agent Tool Router")
  .version("1.0.0");

program
  .command("run")
  .description("Run the agent")
  .option("-q, --query <query>", "User query")
  .option("-t, --tools <file>", "Tool definitions file", "./tools.json")
  .option("-m, --max-iter <number>", "Max iterations", "10")
  .action(async (options) => {
    const tools = loadTools(options.tools);
    const result = await runReActLoop(
      options.query,
      tools,
      parseInt(options.maxIter)
    );

    console.log("\n" + "=".repeat(60));
    console.log("FINAL RESULT");
    console.log("=".repeat(60));
    console.log(result.final_response);
    console.log(`\nTools called: ${result.tools_called.length}`);
    console.log(`Iterations: ${result.iterations}`);
  });

program.parse();

6. Testing Strategy

6.1 Unit Tests

describe("Tool Validation", () => {
  // Shared fixture: defined at describe scope so every test can reference it
  const toolDef = {
    name: "test_tool",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string" }
      },
      required: ["name"]
    }
  };

  test("should accept valid arguments", () => {
    const result = validateToolCall(
      { tool: "test_tool", arguments: { name: "Alice" } },
      [toolDef]
    );

    expect(result.valid).toBe(true);
  });

  test("should reject missing required field", () => {
    const result = validateToolCall(
      { tool: "test_tool", arguments: {} },
      [toolDef]
    );

    expect(result.valid).toBe(false);
    expect(result.error.error_type).toBe("validation_failed");
  });

  test("should reject invalid type", () => {
    const result = validateToolCall(
      { tool: "test_tool", arguments: { name: 123 } },
      [toolDef]
    );

    expect(result.valid).toBe(false);
  });
});

describe("Error Recovery", () => {
  test("should retry after validation error", async () => {
    const mockLLM = jest.fn()
      .mockResolvedValueOnce({
        // First call: invalid arguments
        tool_calls: [{
          id: "1",
          function: { name: "get_weather", arguments: "{}" }
        }]
      })
      .mockResolvedValueOnce({
        // Second call: valid arguments
        tool_calls: [{
          id: "2",
          function: { name: "get_weather", arguments: '{"city": "NYC"}' }
        }]
      });

    // runWithRecovery: a test-friendly variant of runWithErrorRecovery that
    // takes the LLM-calling function as an injected dependency for mocking
    const result = await runWithRecovery(mockLLM, tools);

    expect(mockLLM).toHaveBeenCalledTimes(2);
    expect(result.success).toBe(true);
  });
});

6.2 Integration Tests

describe("Multi-Step Tasks", () => {
  test("should complete task requiring 2 tools", async () => {
    const query = "Get weather for NYC and tell me the time";

    const result = await runReActLoop(query, tools, 10);

    expect(result.success).toBe(true);
    expect(result.tools_called.length).toBe(2);
    expect(result.tools_called.map(t => t.tool)).toContain("get_weather");
    expect(result.tools_called.map(t => t.tool)).toContain("get_time");
  });

  test("should handle disambiguation", async () => {
    // This would require mocking user input
    const query = "I want to return my order";

    // Test that system asks for clarification
    // Then processes user's clarification
    // Then executes correct tool
  });
});

6.3 Performance Tests

describe("Performance", () => {
  test("should complete simple task in <5 seconds", async () => {
    const start = Date.now();

    await runReActLoop("What's the weather?", tools, 10);

    const elapsed = Date.now() - start;
    expect(elapsed).toBeLessThan(5000);
  });

  test("should handle rate limiting", () => {
    const limiter = new RateLimiter();
    const limit = { max_calls: 3, window_seconds: 60 };

    expect(limiter.checkLimit("test_tool", limit)).toBe(true);
    expect(limiter.checkLimit("test_tool", limit)).toBe(true);
    expect(limiter.checkLimit("test_tool", limit)).toBe(true);
    expect(limiter.checkLimit("test_tool", limit)).toBe(false);  // 4th call blocked
  });
});

7. Common Pitfalls & Debugging

7.1 The Hallucinated Tool Problem

Symptom: Model tries to call tools that don't exist

// Model output
{
  "tool": "send_email",  // This tool doesn't exist!
  "arguments": {...}
}

Solution: Fuzzy matching with suggestions

function findClosestToolName(
  hallucinated: string,
  availableTools: ToolDefinition[]
): string | null {
  const scores = availableTools.map(tool => ({
    name: tool.name,
    score: levenshteinDistance(hallucinated, tool.name)
  }));

  scores.sort((a, b) => a.score - b.score);

  // If closest match is "close enough"
  if (scores[0].score <= 3) {
    return scores[0].name;
  }

  return null;
}

// In error response
{
  "error_type": "tool_not_found",
  "message": `Tool '${hallucinated}' not found. Did you mean '${closest}'?`,
  "available_tools": ["get_weather", "get_time", ...]
}
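`findClosestToolName` relies on an edit-distance helper. The classic Wagner-Fischer dynamic program is plenty fast at tool-name scale:

```typescript
// Wagner-Fischer dynamic programming: dp[i][j] is the edit distance between
// the first i characters of `a` and the first j characters of `b`.
function levenshteinDistance(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0  // base cases: distance to empty string
    )
  );

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;  // substitution cost
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost  // substitution (or match)
      );
    }
  }

  return dp[a.length][b.length];
}
```

The threshold of 3 in `findClosestToolName` is a judgment call; tighten it if your tool names are short, or normalize separators (`send-email` vs `send_email`) before comparing.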

7.2 The Infinite Loop Problem

Symptom: Agent keeps calling the same failing tool

// Loop detected
[Iteration 1] Call: create_order → Error: Missing address
[Iteration 2] Call: create_order → Error: Missing address
[Iteration 3] Call: create_order → Error: Missing address
...

Solution: Loop detection

class LoopDetector {
  private history: string[] = [];

  detect(toolCall: ToolCall): boolean {
    const signature = `${toolCall.tool}:${JSON.stringify(toolCall.arguments)}`;

    // Check last 3 calls
    const recent = this.history.slice(-3);
    const repeats = recent.filter(sig => sig === signature).length;

    this.history.push(signature);

    // If same call 3 times in a row, it's a loop
    return repeats >= 2;
  }
}

7.3 The Type Confusion Problem

Symptom: Model outputs string instead of integer

// Model output
{
  "quantity": "five"  // Should be 5
}

Solution: Smart type coercion + clear error messages

function coerceTypes(args: any, schema: JSONSchema): any {
  const coerced = { ...args };

  for (const [key, propSchema] of Object.entries(schema.properties)) {
    if (propSchema.type === "integer" && typeof coerced[key] === "string") {
      // Try to parse
      const parsed = parseInt(coerced[key]);

      if (!isNaN(parsed)) {
        coerced[key] = parsed;
        console.warn(`Coerced ${key} from string to integer`);
      } else {
        throw new ValidationError(
          `Cannot convert "${coerced[key]}" to integer. Please provide a numeric value.`
        );
      }
    }
  }

  return coerced;
}

7.4 The Context Window Explosion

Symptom: After 10 tool calls, context is too large

Solution: Summarize old tool results

function compressConversationHistory(
  messages: Message[],
  maxTokens: number
): Message[] {
  let tokenCount = estimateTokens(messages);

  if (tokenCount <= maxTokens) {
    return messages;
  }

  // Keep system message and recent messages
  const system = messages[0];
  const recent = messages.slice(-10);

  // Summarize middle messages
  const middle = messages.slice(1, -10);
  const summary = summarizeToolCalls(middle);

  return [
    system,
    { role: "system", content: `Previous actions: ${summary}` },
    ...recent
  ];
}
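`estimateTokens` and `summarizeToolCalls` are left as helpers here. A crude but serviceable estimator uses the rule of thumb of roughly 4 characters per English token; swap in a real tokenizer (e.g. tiktoken) when you need billing-accurate counts.

```typescript
// Sketch of the assumed estimateTokens helper: sum the character lengths of
// each message and divide by ~4 chars/token. Deliberately rough; it only
// needs to decide when to trigger compression, not to bill anyone.
interface Message {
  role: string;
  content: string;
}

function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce(
    (sum, m) => sum + m.role.length + m.content.length,
    0
  );
  return Math.ceil(chars / 4);
}
```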

8. Extensions

8.1 Beginner Extensions

Extension 1: Tool Usage Analytics

Track which tools are called most frequently:

class ToolAnalytics {
  private stats: Map<string, { count: number; avg_time_ms: number }> = new Map();

  record(toolName: string, executionTimeMs: number) {
    const entry = this.stats.get(toolName) || { count: 0, avg_time_ms: 0 };
    entry.count++;
    entry.avg_time_ms = (entry.avg_time_ms * (entry.count - 1) + executionTimeMs) / entry.count;
    this.stats.set(toolName, entry);
  }

  report() {
    console.log("\nTool Usage Statistics:");
    for (const [tool, stats] of this.stats.entries()) {
      console.log(`  ${tool}: ${stats.count} calls, avg ${stats.avg_time_ms.toFixed(0)}ms`);
    }
  }
}

Extension 2: Tool Filtering by Intent

Don't show the model all 100 tools; filter to the top 5:

function filterRelevantTools(
  userQuery: string,
  allTools: ToolDefinition[],
  topK: number = 5
): ToolDefinition[] {
  const queryEmbedding = generateEmbedding(userQuery);

  const scored = allTools.map(tool => ({
    tool,
    score: cosineSimilarity(
      queryEmbedding,
      generateEmbedding(tool.description)
    )
  }));

  scored.sort((a, b) => b.score - a.score);

  return scored.slice(0, topK).map(s => s.tool);
}
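This sketch assumes two helpers: `generateEmbedding` (in practice an async call to an embeddings API, with the per-tool embeddings precomputed and cached) and `cosineSimilarity`, which is a few lines over plain vectors:

```typescript
// Cosine similarity: dot product of the two vectors divided by the product
// of their magnitudes. Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;  // guard against zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```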

8.2 Intermediate Extensions

Extension 3: Parallel Tool Execution

Execute independent tools in parallel:

async function executeToolsInParallel(
  toolCalls: ToolCall[]
): Promise<ToolResult[]> {
  // Analyze dependencies
  const groups = groupByDependencies(toolCalls);

  const results: ToolResult[] = [];

  for (const group of groups) {
    // Execute all tools in this group in parallel
    const promises = group.map(tc => executeTool(tc.tool, tc.arguments));
    const groupResults = await Promise.all(promises);
    results.push(...groupResults);
  }

  return results;
}

Extension 4: Tool Versioning

Support multiple versions of the same tool:

interface VersionedTool extends ToolDefinition {
  version: string;
  deprecated?: boolean;
  migration_guide?: string;
}

function selectToolVersion(
  toolName: string,
  requestedVersion: string | "latest"
): VersionedTool | undefined {
  // `registry` is an assumed module-level store of versioned tools
  const versions = registry.getVersions(toolName);

  if (requestedVersion === "latest") {
    // Assumes versions are sorted newest-first
    return versions.filter(v => !v.deprecated)[0];
  }

  return versions.find(v => v.version === requestedVersion);
}

8.3 Advanced Extensions

Extension 5: LLM-as-a-Judge for Tool Selection

Use a second LLM call to evaluate if tool selection was correct:

async function judgeToolSelection(
  userQuery: string,
  selectedTool: string,
  allTools: ToolDefinition[]
): Promise<{ correct: boolean; reason: string }> {
  const judgePrompt = `
    User query: "${userQuery}"
    Selected tool: ${selectedTool}
    Available tools: ${allTools.map(t => `${t.name}: ${t.description}`).join('\n')}

    Is this the correct tool? Explain why or why not.
  `;

  const response = await callLLM(judgePrompt);

  // Parse response to determine if correct
  return parseJudgeResponse(response);
}

Extension 6: Automatic Tool Discovery

Automatically generate tool definitions from TypeScript types:

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";  // npm: zod-to-json-schema

// Define tool with Zod schema
const CreateOrderSchema = z.object({
  item: z.enum(["pizza", "burger", "salad"]),
  quantity: z.number().int().min(1).max(100),
  address: z.string()
});

// Automatically generate JSON Schema
function generateToolDefinition(
  name: string,
  description: string,
  schema: z.ZodSchema
): ToolDefinition {
  return {
    name,
    description,
    parameters: zodToJsonSchema(schema)
  };
}

const createOrderTool = generateToolDefinition(
  "create_order",
  "Creates a food order",
  CreateOrderSchema
);

Extension 7: Distributed Agent System

Multiple specialized agents working together:

class AgentOrchestrator {
  private agents: Map<string, Agent> = new Map([
    ["weather_agent", new WeatherAgent()],
    ["order_agent", new OrderAgent()],
    ["support_agent", new SupportAgent()]
  ]);

  async routeToAgent(userQuery: string): Promise<string> {
    // Determine which agent is best suited
    const agentName = await selectBestAgent(userQuery, this.agents);

    // Route to that agent
    const agent = this.agents.get(agentName);
    return await agent.handle(userQuery);
  }
}

9. Real-World Connections

9.1 Production Case Studies

1. Shopify Sidekick (E-commerce Assistant)

Shopify's AI assistant uses tool calling to:

  • Query product inventory
  • Modify store settings
  • Generate reports
  • Answer merchant questions

Implementation insights:

  • Every write operation requires merchant confirmation
  • Tool calls are rate-limited per merchant
  • Full audit log for compliance
  • Separate tool sets for different permission levels

2. ChatGPT Plugins

OpenAI's plugin system is tool calling at scale:

  • 1000+ plugins (tools) available
  • Dynamic tool selection based on user intent
  • Sandboxed execution
  • OAuth for user authentication

3. Replit Ghostwriter (Code Agent)

Replit's AI assistant that writes and debugs code uses tools for:

  • Creating files
  • Running tests
  • Installing packages
  • Executing code

Safety measures:

  • Code execution in isolated containers
  • Timeout limits (30s per execution)
  • Resource limits (CPU, memory)
  • User approval for file modifications

9.2 Design Patterns in Production

Pattern 1: Tool Namespacing

Organize tools by domain:

const tools = {
  "orders.create": createOrderTool,
  "orders.cancel": cancelOrderTool,
  "orders.status": getOrderStatusTool,
  "users.get": getUserTool,
  "users.update": updateUserTool
};

// Model can call: "orders.create" or "users.get"

Pattern 2: Capability-Based Access

Filter tools by user permissions:

function getAvailableTools(user: User): ToolDefinition[] {
  const allTools = loadAllTools();

  return allTools.filter(tool => {
    // Check if user has capability
    return user.capabilities.includes(tool.required_capability);
  });
}

Pattern 3: Circuit Breaker

Prevent cascading failures:

class CircuitBreaker {
  private failures = 0;
  private state: "closed" | "open" | "half-open" = "closed";

  async execute(fn: () => Promise<any>): Promise<any> {
    if (this.state === "open") {
      throw new Error("Circuit breaker is open");
    }

    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (error) {
      this.failures++;

      if (this.failures >= 3) {
        this.state = "open";
        setTimeout(() => { this.state = "half-open"; }, 60000);
      }

      throw error;
    }
  }
}

10. Resources

10.1 Books

Topic Book Chapter
Boundary Design "Clean Code" by Robert Martin Ch. 8 (Boundaries - interfacing with external systems)
Interface Contracts "The Pragmatic Programmer" by Hunt & Thomas Ch. 5 (Bend or Break - Design by Contract)
JSON Schema "Designing Data-Intensive Applications" by Kleppmann Ch. 4 (Encoding & Schema Evolution)
Error Handling "Code Complete" by McConnell Ch. 8 (Defensive Programming)
API Design "REST API Design Rulebook" by Mark Massé Ch. 2 (Identifier Design) & Ch. 6 (Request Methods)
State Machines "Clean Code" by Robert Martin Ch. 6 (Objects and Data Structures)
Validation Patterns "Refactoring" by Martin Fowler Ch. 11 (Simplifying Conditional Expressions)
Agent Architectures "AI Engineering" by Chip Huyen Ch. 6 (Agent Patterns & Tool Use)
Type Safety "Effective TypeScript" by Dan Vanderkam Items 1-10 (Understanding TypeScript's Type System)

10.2 Papers

  1. "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022)
    • Foundation for agent loops
    • https://arxiv.org/abs/2210.03629
  2. "Toolformer: Language Models Can Teach Themselves to Use Tools" (Schick et al., 2023)
    • Self-supervised tool learning
    • https://arxiv.org/abs/2302.04761
  3. "Gorilla: Large Language Model Connected with Massive APIs" (Patil et al., 2023)
    • API calling optimization
    • https://arxiv.org/abs/2305.15334

10.3 API Documentation

  • OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
  • Anthropic Tool Use: https://docs.anthropic.com/claude/docs/tool-use
  • JSON Schema Specification: https://json-schema.org/

10.4 Libraries

# TypeScript
npm install zod          # Schema validation
npm install ajv          # JSON Schema validator
npm install commander    # CLI framework

# Python
pip install pydantic     # Data validation
pip install jsonschema   # JSON Schema validation
pip install click        # CLI framework

11. Self-Assessment Checklist

Core Understanding

  • I can explain what tool calling is and why it matters
    • Test: Explain to someone the difference between text responses and tool calls
  • I understand JSON Schema and can write schemas by hand
    • Test: Write a schema for a complex nested object with constraints
  • I know the ReAct pattern
    • Test: Draw the observe-reason-act loop on a whiteboard
  • I can identify security risks in tool design
    • Test: List 5 ways tool calling can go wrong in production

Implementation Skills

  • I've implemented argument validation
    • Evidence: Validator catches all invalid types, missing fields, constraint violations
  • I've built an error recovery loop
    • Evidence: System recovers from validation errors automatically
  • I've implemented the ReAct loop
    • Evidence: Agent completes multi-step tasks end-to-end
  • I've added audit logging
    • Evidence: Every session has a complete JSON log
  • I've implemented safety features
    • Evidence: Dangerous tools require confirmation

Production Readiness

  • My system handles edge cases
    • Hallucinated tools
    • Type confusion
    • Infinite loops
    • Rate limiting
  • I have comprehensive error messages
    • Evidence: Errors include suggestions for how to fix
  • I can debug failed sessions
    • Evidence: Audit logs contain enough detail to replay
  • I've tested performance
    • Evidence: Selection + validation + execution <500ms

Growth

  • I can design tool contracts for new domains
    • Application: Design 5 tools for a different use case (e.g., email management)
  • I understand when NOT to use tool calling
    • Give examples where tool calling adds unnecessary complexity
  • I can explain this to stakeholders
    • Practice: 3-minute pitch on why tool calling improves reliability

12. Submission / Completion Criteria

Minimum Viable Completion

  • Can register tools from JSON
    • Loads tool definitions
    • Validates schema format
  • Can call LLM with tool definitions
    • Uses native function calling API
    • Receives structured tool calls
  • Validates tool arguments
    • Checks types and required fields
    • Returns structured errors
  • Executes tools
    • At least 3 mock tools working
    • Captures results

Proof: Screenshot showing validation error + recovery

Full Completion

All minimum criteria plus:

  • Full ReAct loop
    • Multi-step task completion
    • Max iteration limits
    • Success/failure reporting
  • Error recovery
    • Feeds errors back to model
    • Detects infinite loops
    • Handles at least 5 error types
  • Audit logging
    • JSON logs per session
    • Complete conversation history
    • Token usage tracking
  • CLI interface
    • Clear help text
    • Multiple commands
    • Professional output formatting

Proof: Public GitHub repository with README

Excellence

All full completion criteria plus any 3+:

  • Parallel tool execution
  • Tool versioning
  • Automatic schema generation from types
  • Distributed agent system
  • Production deployment (API endpoint)
  • Monitoring dashboard
  • Integration tests with real LLM calls

Proof: Blog post, video demo, or production URL


End of Project 6: Tool Router