Project 6: Tool Router
Build a production-grade AI agent system that safely bridges natural language and programmatic function calls
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 1-2 weeks |
| Language | TypeScript (Alternatives: Python) |
| Prerequisites | Projects 1-2, JSON Schema knowledge, API design experience |
| Key Topics | Tool calling, function schemas, agent loops, error recovery |
| Knowledge Area | Tool Use / Agent Reliability |
| Software/Tool | OpenAI Function Calling / Anthropic Tool Use APIs |
| Main Book | "Clean Code" by Robert Martin (Ch. 8: Boundaries) |
| Coolness Level | Level 4: "Oh wow, that's real" |
| Business Potential | 4. The "Open Core" Infrastructure |
1. Learning Objectives
By completing this project, you will:
- Master Function Calling APIs: Understand how LLMs output structured tool calls instead of text
- Design Tool Contracts: Create JSON Schema definitions that act as type-safe interfaces
- Implement Agent Loops: Build the observe-reason-act pattern (ReAct)
- Handle Argument Validation: Validate every parameter before execution to prevent crashes
- Manage Error Recovery: Feed errors back to the LLM so it can self-correct
- Prevent Infinite Loops: Detect and stop repeated failing tool calls
- Build Production Safety: Add confirmation steps for destructive actions
- Create Audit Trails: Log every decision for debugging and compliance
- Optimize Token Usage: Filter irrelevant tools to reduce costs
- Design Multi-Step Planning: Enable agents to chain multiple tools to complete complex tasks
2. Theoretical Foundation
2.1 Core Concepts
What is Tool Calling?
Tool calling (also called function calling) is a mechanism where the LLM outputs structured function invocations instead of natural language responses.
Without Tool Calling (Text-Based):
User: Order 5 pizzas
Model: I'll help you order pizzas. You can place an order by...
Problem: You can't execute this. It's just text.
With Tool Calling (Structured):
User: Order 5 pizzas
Model: {
"tool": "create_order",
"arguments": {
"item": "pizza",
"quantity": 5,
"address": null // Missing!
}
}
You: Validate, detect missing argument, send error back
Model: Let me get the address first...
Model: {
"tool": "get_user_info",
"arguments": {"user_id": "current_user"}
}
You: Execute get_user_info(), return result
Model: Now I have the address...
Model: {
"tool": "create_order",
"arguments": {
"item": "pizza",
"quantity": 5,
"address": "123 Main St, Springfield, IL 62701"
}
}
You: Execute, return success
Why This Matters:
Tool calling transforms the LLM from a chatbot into an agent that can:
- Modify databases
- Call external APIs
- Execute business logic
- Chain multiple operations
- Handle real-world tasks
JSON Schema as Contracts
A contract defines what a function expects (parameters) and returns (output). JSON Schema provides a language-agnostic way to specify these contracts.
Example Tool Definition:
{
"name": "create_order",
"description": "Creates a new food delivery order. Use when user wants to order food.",
"parameters": {
"type": "object",
"properties": {
"item": {
"type": "string",
"description": "Food item to order",
"enum": ["pizza", "burger", "salad", "pasta"]
},
"quantity": {
"type": "integer",
"description": "Number of items",
"minimum": 1,
"maximum": 100
},
"address": {
"type": "string",
"description": "Full delivery address",
"pattern": "^.+,.+,.+,\\s*\\d{5}$"
}
},
"required": ["item", "quantity", "address"]
},
"returns": {
"type": "object",
"properties": {
"order_id": {"type": "string"},
"status": {"type": "string"},
"estimated_time": {"type": "string"}
}
}
}
Why Schemas Are Critical:
- Type Safety: Catch type errors before execution
- Documentation: The schema IS the documentation
- Validation: Programmatically verify arguments
- Versioning: Track breaking changes to interfaces
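To make the contract idea concrete, here is a minimal sketch of schema-driven argument checking against the create_order parameters above. It is a hand-rolled checker for illustration only; a production router would use a library such as Ajv or Zod instead.

```typescript
// Hand-rolled sketch: validate arguments against a JSON-Schema-like object.
// Covers required, type, enum, minimum/maximum, and pattern checks only.
type Schema = {
  type: "object";
  properties: Record<string, any>;
  required?: string[];
};

interface CheckError { field: string; problem: string; }

function checkArguments(schema: Schema, args: Record<string, any>): CheckError[] {
  const errors: CheckError[] = [];
  // Required fields must be present and non-null
  for (const field of schema.required ?? []) {
    if (args[field] === undefined || args[field] === null) {
      errors.push({ field, problem: "required field is missing" });
    }
  }
  for (const [field, value] of Object.entries(args)) {
    const prop = schema.properties[field];
    if (!prop) { errors.push({ field, problem: "unknown field" }); continue; }
    if (value === null || value === undefined) continue; // already reported above
    if (prop.type === "string" && typeof value !== "string") {
      errors.push({ field, problem: "expected string" }); continue;
    }
    if (prop.type === "integer" && !Number.isInteger(value)) {
      errors.push({ field, problem: "expected integer" }); continue;
    }
    if (prop.enum && !prop.enum.includes(value)) {
      errors.push({ field, problem: `not one of: ${prop.enum.join(", ")}` });
    }
    if (prop.minimum !== undefined && value < prop.minimum) {
      errors.push({ field, problem: `below minimum ${prop.minimum}` });
    }
    if (prop.maximum !== undefined && value > prop.maximum) {
      errors.push({ field, problem: `above maximum ${prop.maximum}` });
    }
    if (prop.pattern && typeof value === "string" && !new RegExp(prop.pattern).test(value)) {
      errors.push({ field, problem: "does not match pattern" });
    }
  }
  return errors;
}

// The create_order parameter schema from the example above
const createOrderParams: Schema = {
  type: "object",
  properties: {
    item: { type: "string", enum: ["pizza", "burger", "salad", "pasta"] },
    quantity: { type: "integer", minimum: 1, maximum: 100 },
    address: { type: "string", pattern: "^.+,.+,.+,\\s*\\d{5}$" },
  },
  required: ["item", "quantity", "address"],
};
```

Passing `{ item: "pizza", quantity: 5, address: null }` yields a missing-field error for `address`, which is exactly the failure the router feeds back to the model in the transcript later in this document.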
The ReAct Pattern (Reasoning + Acting)
ReAct is the fundamental loop for tool-using agents:
LOOP until task complete OR max iterations:
1. OBSERVE: Receive user input or tool result
2. REASON: LLM thinks about what to do next
3. ACT: LLM either:
a) Calls a tool (action)
b) Responds to user (finish)
Visual Representation:
User: "Order 5 pizzas to my house"
        │
        ▼
┌─────────────────────────────────────┐
│ OBSERVE: User query + available     │
│ tools + conversation history        │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ REASON: Model analyzes intent       │
│  - Need to create order             │
│  - Need address (don't have it)     │
│  - Can get it from get_user_info    │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ ACT: Call get_user_info()           │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ OBSERVE: Tool result (address)      │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ REASON: Now have all data           │
│  - Can create order                 │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ ACT: Call create_order()            │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ OBSERVE: Success response           │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ REASON: Task complete               │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ ACT: Respond to user                │
│ "Order placed! ID: ord_789"         │
└─────────────────────────────────────┘
Research Foundation:
The ReAct paper ("ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al., 2022) showed that interleaving reasoning traces with actions improves:
- Success rate on complex tasks (34% → 67%)
- Interpretability (you can see the thinking)
- Error recovery (model can detect its mistakes)
Statelessness in Tool Design
Principle: Tools should be black boxes to the LLM. The model knows the interface (inputs/outputs) but not the implementation.
Good Tool Design:
{
"name": "get_weather",
"description": "Gets current weather for a city",
"parameters": {
"city": {"type": "string"}
}
}
Bad Tool Design:
{
"name": "get_weather",
"description": "Gets weather by querying the WeatherAPI v3 database at weather.com using API key stored in env.WEATHER_KEY",
"parameters": {
"city": {"type": "string"}
}
}
Why?
Exposing implementation details:
- Leaks security information (API keys, endpoints)
- Makes the model make assumptions about side effects
- Creates coupling between prompt and infrastructure
Correct Abstraction Level:
Model needs to know: WHAT the tool does
Model should NOT know: HOW it's implemented
Error Handling Patterns
Errors are inevitable. The key is making them recoverable.
Types of Errors:
| Error Type | Example | Recovery Strategy |
|---|---|---|
| Missing Argument | address: null | Send error with hint to call get_user_info |
| Invalid Type | quantity: "five" | Ask model to convert to integer |
| Out of Range | quantity: 1000 | Explain max limit, ask to adjust |
| Permission Denied | User not authorized | Inform and suggest login |
| Tool Not Found | Hallucinated send_email | List available tools, suggest closest match |
| API Failure | External service down | Retry or inform user |
Error Message Design for LLMs:
// BAD: Human-oriented message
{
"error": "Oops! Something went wrong. Please try again."
}
// GOOD: LLM-oriented message
{
"error_type": "validation_failed",
"field": "address",
"expected": "string matching pattern: street, city, state, zip",
"received": null,
"suggestion": "Call get_user_info() to retrieve the user's saved address, or ask the user for it.",
"available_tools": ["get_user_info", "update_user_address"]
}
The LLM can parse structured errors and self-correct!
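A sketch of a helper that builds such LLM-oriented errors. The field names mirror the JSON example above; the suggestion wording and the function name are illustrative assumptions, not a prescribed interface.

```typescript
// Sketch: build a structured, machine-readable error the LLM can act on,
// instead of a human-oriented "something went wrong" message.
interface StructuredError {
  error_type: string;
  field?: string;
  expected?: string;
  received?: unknown;
  suggestion?: string;
  available_tools?: string[];
}

function missingFieldError(
  field: string,
  expected: string,
  received: unknown,
  relatedTools: string[]
): StructuredError {
  return {
    error_type: "validation_failed",
    field,
    expected,
    received,
    // Point the model at a concrete recovery path, naming a tool it can call
    suggestion: `Call ${relatedTools[0]}() to retrieve the missing value, or ask the user for it.`,
    available_tools: relatedTools,
  };
}
```

The key design choice: every error names the failing field, what was expected, and at least one recovery action, so the model's next turn has something concrete to work with.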
2.2 Why This Matters
Production Relevance
Problem: Unreliable AI actions cause real damage
Without proper tool calling infrastructure:
# User: "Cancel my order"
# Model outputs: "I've cancelled order #789"
# But did it actually execute?
# Which order? User has 5 active orders!
# No audit trail, no confirmation, no safety check
Solution: Type-safe, validated, audited tool execution
# User: "Cancel my order"
# Router: Detects ambiguity, asks for clarification
# User: "The pizza order from today"
# Router: Resolves to order_id="ord_789"
# Router: Validates user has permission
# Router: Calls cancel_order(order_id="ord_789")
# Router: Logs decision
# Router: Confirms to user: "Order ord_789 cancelled"
Real-World Consequences:
- Shopify: AI assistant that can modify inventory must validate EVERY parameter
- Stripe: Payment API calls must be idempotent and logged
- Notion: AI that edits documents must have undo capability
- GitHub Copilot: Code execution must be sandboxed
One bug in tool calling can:
- Charge customers incorrectly
- Delete production data
- Violate privacy regulations
- Cause financial losses
Industry Applications
1. Customer Support Agents (Intercom, Zendesk)
const supportTools = [
{
name: "lookup_order",
description: "Retrieve order details by ID",
parameters: {
order_id: { type: "string" }
}
},
{
name: "issue_refund",
description: "Issue refund for an order",
parameters: {
order_id: { type: "string" },
amount: { type: "number" },
reason: { type: "string" }
},
requires_confirmation: true // Safety!
}
];
// User: "I want a refund for order #123"
// Agent: Calls lookup_order("123")
// Agent: Sees order total is $50
// Agent: Calls issue_refund("123", 50, "customer request")
// System: Asks for confirmation before executing
2. Code Execution Agents (ChatGPT Code Interpreter)
OpenAI's Code Interpreter uses tool calling to execute Python:
tools = [
{
"name": "execute_python",
"description": "Run Python code in sandboxed environment",
"parameters": {
"code": {"type": "string"}
}
}
]
# User: "Plot a graph of y=x^2"
# Model: Calls execute_python("import matplotlib...")
# System: Runs in sandbox, returns image
# Model: Shows user the graph
3. Database Query Agents (Text2SQL)
Translating natural language to database queries:
tools = [
{
"name": "query_database",
"description": "Execute SELECT query on customer database",
"parameters": {
"sql": {"type": "string", "pattern": "^SELECT .+"}, # Only SELECT
"limit": {"type": "integer", "maximum": 1000}
}
}
]
# User: "Show me customers who signed up last month"
# Model: Calls query_database(sql="SELECT * FROM customers WHERE created_at > ...", limit=100)
# System: Validates (only SELECT, no DROP/DELETE), executes, returns results
2.3 Common Misconceptions
| Misconception | Reality |
|---|---|
| "LLMs always output valid JSON" | Models hallucinate fields, types, and entire tools |
| "Function calling is deterministic" | Models make probabilistic decisions; validation is essential |
| "Error messages slow things down" | Rich errors enable self-correction, reducing retries |
| "You need to show all tools to the model" | Showing 100 tools degrades performance; filter to the top 5-10 |
| "Tool calling is just parsing JSON" | It's about bridging fuzzy intent to strict interfaces |
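The last two misconceptions motivate tool filtering. A minimal sketch of pre-filtering tools before they reach the prompt, using simple keyword overlap (an assumption for illustration; production systems more commonly use embedding similarity):

```typescript
// Sketch: score tools by keyword overlap with the query and keep the top k,
// so the model only sees a handful of relevant tools instead of all of them.
interface Tool { name: string; description: string; }

function tokenize(text: string): Set<string> {
  // Lowercase, split on non-word characters, drop very short tokens
  return new Set(text.toLowerCase().split(/\W+/).filter(w => w.length > 2));
}

function topTools(query: string, tools: Tool[], k = 5): Tool[] {
  const queryWords = tokenize(query);
  return tools
    .map(tool => {
      const toolWords = tokenize(tool.name + " " + tool.description);
      let overlap = 0;
      for (const w of queryWords) if (toolWords.has(w)) overlap++;
      return { tool, overlap };
    })
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map(x => x.tool);
}
```

With a registry of dozens of tools, sending only the top 5-10 cuts token cost and, per the misconception table, usually improves selection accuracy as well.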
3. Project Specification
3.1 What You Will Build
A tool routing system that:
- Registers tools with JSON Schema definitions
- Routes natural language to appropriate tool calls
- Validates arguments before execution
- Executes tools safely with error handling
- Manages multi-step planning through ReAct loops
- Handles disambiguation when intent is unclear
- Prevents infinite loops with circuit breakers
- Logs all actions for debugging and auditing
- Supports confirmations for dangerous operations
- Provides developer-friendly errors and metrics
Core Question This System Answers:
"How do I give an LLM 'hands' while ensuring it doesn't break my API?"
3.2 Functional Requirements
FR1: Tool Registry Management
- Load tool definitions from JSON files
- Validate tool schemas at registration time
- Support tool versioning (v1, v2)
- Handle deprecated tools with warnings
FR2: Intent Analysis & Routing
- Parse user queries to determine intent
- Handle ambiguous queries (multiple matching tools)
- Generate clarifying questions when needed
- Support fuzzy matching for hallucinated tool names
FR3: Argument Validation
Validate all tool arguments:
- Type checking: string, integer, boolean, arrays, objects
- Constraint checking: min, max, pattern, enum
- Required field verification
- Custom validation rules (e.g., valid email, future date)
FR4: Tool Execution Engine
- Execute tools with timeout protection
- Handle synchronous and asynchronous operations
- Capture return values and errors
- Support retry logic for transient failures
FR5: Error Recovery Loop
- Feed errors back to model as structured messages
- Track error count per tool call
- Detect error loops (same error repeated 3+ times)
- Escalate to human when recovery fails
FR6: Multi-Step Planning (ReAct)
- Maintain conversation context across tool calls
- Pass tool results to model for next decision
- Limit maximum iterations (prevent runaway agents)
- Support parallel tool calls when possible
FR7: Safety & Confirmations
- Flag destructive actions (delete, payment)
- Require explicit user confirmation
- Implement rate limiting per tool
- Block out-of-scope tools based on user permissions
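The rate-limiting requirement in FR7 can be sketched as a sliding-window limiter keyed by tool name. The class shape is an assumption; real limits would come from each tool's rate_limit definition.

```typescript
// Sketch: sliding-window rate limiter, one window of timestamps per tool.
class RateLimiter {
  private calls = new Map<string, number[]>(); // tool name -> call timestamps (ms)
  private maxCalls: number;
  private windowSeconds: number;

  constructor(maxCalls: number, windowSeconds: number) {
    this.maxCalls = maxCalls;
    this.windowSeconds = windowSeconds;
  }

  // Returns true and records the call if allowed; false if over the limit.
  tryAcquire(toolName: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowSeconds * 1000;
    // Drop timestamps that have aged out of the window
    const recent = (this.calls.get(toolName) ?? []).filter(t => t > cutoff);
    if (recent.length >= this.maxCalls) {
      this.calls.set(toolName, recent);
      return false;
    }
    recent.push(now);
    this.calls.set(toolName, recent);
    return true;
  }
}
```

A blocked call should be reported back to the model as a structured error (e.g. error_type "permission_denied" with a retry-after hint) rather than silently dropped.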
FR8: Audit & Logging
- Log every tool call with timestamp
- Record full arguments and results
- Track token usage per session
- Generate session summaries
3.3 Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Reliability | 99.9% successful tool execution | Production-grade agents must be trustworthy |
| Latency | <500ms per tool call (excluding API) | User-facing interactions need responsiveness |
| Safety | 0 unauthorized actions | Security is non-negotiable |
| Debuggability | Full audit trail for every session | Must be able to replay and debug failures |
| Extensibility | Add new tools without code changes | Business logic evolves rapidly |
| Token Efficiency | <2000 tokens per multi-step task | Control costs while maintaining quality |
3.4 Example Usage
Running the router:
$ node router.js --tools ./tools.json --query "Order 5 pizzas to my house"
[Router Init] Loading tool definitions...
[Router Init] Registered 8 tools:
- create_order (args: item, quantity, address)
- cancel_order (args: order_id)
- get_order_status (args: order_id)
- update_user_address (args: address)
- search_menu (args: category, dietary_restrictions)
- get_user_info (args: user_id)
- send_notification (args: user_id, message)
- refund_order (args: order_id, reason)
[Intent Analysis]
User query: "Order 5 pizzas to my house"
Analyzing intent...
[LLM Tool Selection]
Model chose tool: create_order
Raw arguments from model:
{
"item": "pizza",
"quantity": 5,
"address": null
}
[Validation Phase 1: Schema Check]
✓ Tool 'create_order' exists
✓ Argument 'item' is string (valid)
✓ Argument 'quantity' is integer (valid)
✗ Argument 'address' is null (REQUIRED field missing)
[Error Recovery Loop - Attempt 1]
Sending error back to model:
"The 'address' field is required for create_order. You need to either:
1. Ask the user for their address, OR
2. Call get_user_info() to retrieve the saved address."
[LLM Recovery Response]
Model chose new tool: get_user_info
Arguments: { "user_id": "current_user" }
[Validation Phase 2]
✓ All arguments valid
Executing get_user_info(user_id='current_user')...
[Tool Execution]
API Response: {
"user_id": "u_12345",
"name": "John Doe",
"address": "123 Main St, Apt 4B, Springfield, IL 62701",
"phone": "+1-555-0123"
}
[LLM Re-planning]
Model received tool result. Re-attempting original goal...
Model chose tool: create_order
Arguments:
{
"item": "pizza",
"quantity": 5,
"address": "123 Main St, Apt 4B, Springfield, IL 62701"
}
[Validation Phase 3]
✓ All arguments valid
✓ Address format validated (regex: street, city, state, zip)
✓ Quantity within bounds (1-100)
Executing create_order()...
[Tool Execution]
POST /api/orders
Request body: { "item": "pizza", "quantity": 5, "address": "..." }
API Response: {
"order_id": "ord_789",
"status": "pending",
"estimated_time": "35 minutes",
"total": 87.45
}
[Final Response Generation]
Sending tool result back to model for natural language response...
[Assistant Response]
"I've placed your order for 5 pizzas to be delivered to 123 Main St, Apt 4B, Springfield, IL. Your order ID is ord_789, and it should arrive in about 35 minutes. The total is $87.45."
[Audit Log]
Saved complete interaction to logs/2024-12-27_15-23-11.json
{
"session_id": "sess_abc123",
"user_query": "Order 5 pizzas to my house",
"tools_called": [
{
"tool": "get_user_info",
"args": {"user_id": "current_user"},
"result": {...},
"timestamp": "2024-12-27T15:23:11.245Z"
},
{
"tool": "create_order",
"args": {"item": "pizza", "quantity": 5, "address": "..."},
"result": {...},
"timestamp": "2024-12-27T15:23:12.891Z"
}
],
"errors_encountered": 1,
"recovery_successful": true,
"total_api_calls": 3,
"total_tokens": 1247
}
Handling ambiguous requests:
$ node router.js --query "I want to return my order"
[Intent Analysis]
Ambiguous intent detected. Multiple tools match:
- cancel_order (similarity: 0.82)
- refund_order (similarity: 0.79)
[Disambiguation Required]
Model generating clarifying question...
[Assistant]
"I can help you with that. Do you want to:
1. Cancel the order (before it's delivered), or
2. Request a refund (after you received it)?"
[User Response]
"I already got it but the pizza was cold"
[Resolved Intent]
Model chose tool: refund_order
Arguments: { "order_id": null, "reason": "pizza was cold" }
[Validation Error]
✗ Missing required field: order_id
[Clarification Loop]
Model: "I need your order ID to process the refund. You can find it in your order confirmation email, or I can look up your recent orders. Would you like me to check your recent orders?"
4. Solution Architecture
4.1 High-Level Design
┌──────────────────┐
│    User Input    │
└────────┬─────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│ Intent Analyzer                          │
│  - Parse query                           │
│  - Detect ambiguity                      │
│  - Generate clarifications               │
└────────┬─────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│ Tool Selector                            │
│  - Filter relevant tools                 │
│  - Match intent to tool                  │
│  - Handle hallucinations                 │
└────────┬─────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│ Argument Extractor (LLM)                 │
│  - Call LLM with tool definitions        │
│  - Receive structured tool call          │
└────────┬─────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│ Validator                                │
│  - Schema validation (types)             │
│  - Constraint validation (ranges)        │
│  - Custom business rules                 │
└────────┬─────────────────────────────────┘
         │
      Valid? ───No───► Error Handler ───┐
         │                              │
        Yes                             │
         │                              │
         ▼                              │
┌──────────────────────────────────┐    │
│ Execution Engine                 │    │
│  - Call actual function          │    │
│  - Timeout protection            │    │
│  - Capture result/error          │    │
└────────┬─────────────────────────┘    │
         │                              │
         ▼                              │
┌──────────────────────────────────┐    │
│ Result Processor                 │    │
│  - Format result for LLM         │    │
│  - Update conversation context   │    │
└────────┬─────────────────────────┘    │
         │                              │
         ▼                              │
┌──────────────────────────────────┐    │
│ ReAct Controller                 │    │
│  - Check if task complete        │◄───┘
│  - Loop for next action          │
│  - Prevent infinite loops        │
└────────┬─────────────────────────┘
         │
         ▼
┌──────────────────────────────────┐
│ Response Generator               │
│  - LLM formats natural language  │
│  - Return to user                │
└──────────────────────────────────┘
4.2 Key Components
| Component | Responsibility | Implementation Strategy |
|---|---|---|
| ToolRegistry | Store and version tool definitions | JSON files + in-memory map |
| IntentAnalyzer | Detect what user wants to do | Embedding similarity + keyword matching |
| ToolSelector | Choose which tool(s) apply | Filter by intent, rank by relevance |
| ArgumentExtractor | Get LLM to output structured call | Native function calling API |
| SchemaValidator | Validate against JSON Schema | Use library (Ajv, Zod, Pydantic) |
| ExecutionEngine | Run the actual function | Strategy pattern per tool type |
| ErrorHandler | Format errors for LLM | Structured error templates |
| ReActController | Manage agent loop | State machine with max iterations |
| AuditLogger | Record all actions | JSON logs + optional DB |
| ConfirmationGate | Require human approval | Flag tools as dangerous |
4.3 Data Structures
interface ToolDefinition {
name: string;
description: string;
parameters: JSONSchema;
returns?: JSONSchema;
requires_confirmation?: boolean;
dangerous?: boolean;
rate_limit?: {
max_calls: number;
window_seconds: number;
};
version?: string;
deprecated?: boolean;
metadata?: Record<string, any>;
}
interface ToolCall {
id: string; // Unique ID for this call
tool: string;
arguments: Record<string, any>;
timestamp: Date;
}
interface ToolResult {
call_id: string;
success: boolean;
result?: any;
error?: StructuredError;
execution_time_ms: number;
timestamp: Date;
}
interface StructuredError {
error_type: "validation_failed" | "tool_not_found" | "permission_denied" | "execution_failed";
message: string;
field?: string;
expected?: any;
received?: any;
suggestion?: string;
available_tools?: string[];
}
interface AgentState {
session_id: string;
user_query: string;
conversation_history: Message[];
tools_called: ToolCall[];
current_iteration: number;
max_iterations: number;
error_count: number;
status: "running" | "waiting_for_user" | "completed" | "failed";
}
interface Message {
role: "system" | "user" | "assistant" | "tool";
content: string;
tool_call_id?: string;
tool_calls?: ToolCall[];
}
interface SessionLog {
session_id: string;
start_time: Date;
end_time: Date;
user_query: string;
tools_called: ToolCall[];
results: ToolResult[];
final_response: string;
total_tokens: number;
total_cost: number;
success: boolean;
}
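The ToolRegistry component from section 4.2 ("JSON files + in-memory map") can be sketched around these data structures. The method names here are assumptions; the document does not prescribe an interface.

```typescript
// Sketch: in-memory tool registry with registration-time checks and
// deprecation warnings on lookup.
interface ToolDef {
  name: string;
  description: string;
  parameters: object;
  version?: string;
  deprecated?: boolean;
}

class ToolRegistry {
  private tools = new Map<string, ToolDef>();

  register(def: ToolDef): void {
    // Fail fast on malformed definitions instead of failing at call time
    if (!def.name || !def.description) {
      throw new Error("Tool definition missing name or description");
    }
    if (this.tools.has(def.name)) {
      throw new Error(`Tool '${def.name}' is already registered`);
    }
    this.tools.set(def.name, def);
  }

  get(name: string): ToolDef | undefined {
    const def = this.tools.get(name);
    if (def?.deprecated) {
      console.warn(`Tool '${name}' is deprecated`);
    }
    return def;
  }

  list(): string[] {
    return [...this.tools.keys()];
  }
}
```

Loading tools/tools.json at startup then becomes a loop of `registry.register(def)` calls, so schema problems surface at boot rather than mid-conversation.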
4.4 Algorithm Overview
Main ReAct Loop Algorithm
async function runAgentLoop(
userQuery: string,
availableTools: ToolDefinition[],
config: AgentConfig
): Promise<AgentResult> {
const state: AgentState = {
session_id: generateId(),
user_query: userQuery,
conversation_history: [],
tools_called: [],
current_iteration: 0,
max_iterations: config.max_iterations || 10,
error_count: 0,
status: "running"
};
// Add system message with tool definitions
state.conversation_history.push({
role: "system",
content: buildSystemPrompt(availableTools)
});
// Add user query
state.conversation_history.push({
role: "user",
content: userQuery
});
// Main loop
while (state.current_iteration < state.max_iterations) {
state.current_iteration++;
// OBSERVE & REASON: Call LLM
const response = await callLLM(
state.conversation_history,
availableTools
);
// Check if model wants to finish
if (response.finish_reason === "stop") {
state.status = "completed";
return {
success: true,
response: response.content,
session_log: buildSessionLog(state)
};
}
// ACT: Model wants to call tool(s)
if (response.tool_calls && response.tool_calls.length > 0) {
for (const toolCall of response.tool_calls) {
// Validate tool call
const validationResult = validateToolCall(toolCall, availableTools);
if (!validationResult.valid) {
// Send error back to model
state.conversation_history.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(validationResult.error)
});
state.error_count++;
// Check for error loop
if (state.error_count > 3) {
state.status = "failed";
return {
success: false,
error: "Agent stuck in error loop",
session_log: buildSessionLog(state)
};
}
continue; // Let model try again
}
// Execute tool
try {
const result = await executeTool(
toolCall.tool,
toolCall.arguments,
config.timeout_ms
);
// Record successful execution
state.tools_called.push(toolCall);
// Add result to conversation
state.conversation_history.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(result)
});
// Reset error count on success
state.error_count = 0;
} catch (error) {
// Handle execution errors
const structuredError = formatExecutionError(error);
state.conversation_history.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(structuredError)
});
state.error_count++;
}
}
}
}

// Loop exited without the model finishing: max iterations reached
state.status = "failed";
return {
success: false,
error: `Max iterations (${state.max_iterations}) exceeded`,
session_log: buildSessionLog(state)
};
}
Tool Validation Algorithm
function validateToolCall(
toolCall: ToolCall,
availableTools: ToolDefinition[]
): ValidationResult {
// Step 1: Check if tool exists
const toolDef = availableTools.find(t => t.name === toolCall.tool);
if (!toolDef) {
return {
valid: false,
error: {
error_type: "tool_not_found",
message: `Tool '${toolCall.tool}' does not exist`,
available_tools: availableTools.map(t => t.name),
suggestion: findClosestToolName(toolCall.tool, availableTools)
}
};
}
// Step 2: Validate against JSON Schema
const schemaValidator = new JSONSchemaValidator(toolDef.parameters);
const schemaResult = schemaValidator.validate(toolCall.arguments);
if (!schemaResult.valid) {
return {
valid: false,
error: {
error_type: "validation_failed",
message: "Arguments do not match schema",
field: schemaResult.errors[0].field,
expected: schemaResult.errors[0].expected,
received: schemaResult.errors[0].received,
suggestion: generateFixSuggestion(schemaResult.errors[0])
}
};
}
// Step 3: Custom business rule validation
const customValidation = runCustomValidators(toolCall, toolDef);
if (!customValidation.valid) {
return {
valid: false,
error: customValidation.error
};
}
// All checks passed
return { valid: true };
}
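Step 1 of the validation algorithm calls findClosestToolName to suggest a fix for hallucinated tool names. A sketch using edit distance (an assumption; the text does not specify the matching method):

```typescript
// Sketch: Levenshtein distance, then pick the nearest registered tool name.
function editDistance(a: string, b: string): number {
  // dp[i][j] = edits to turn a[0..i) into b[0..j)
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function findClosestToolName(requested: string, toolNames: string[]): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const name of toolNames) {
    const d = editDistance(requested.toLowerCase(), name.toLowerCase());
    if (d < bestDist) { bestDist = d; best = name; }
  }
  // Only suggest reasonably close names; the threshold is an arbitrary choice
  return bestDist <= Math.max(3, requested.length / 2) ? best : null;
}
```

So if the model hallucinates send_emails, the tool_not_found error can carry `suggestion: "send_email"`, which models reliably pick up on the next turn.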
Intent Disambiguation Algorithm
async function disambiguateIntent(
userQuery: string,
candidateTools: ToolDefinition[]
): Promise<ToolDefinition | null> {
if (candidateTools.length === 1) {
return candidateTools[0];
}
if (candidateTools.length === 0) {
return null;
}
// Multiple candidates - ask for clarification
const options = candidateTools.map((tool, i) => ({
number: i + 1,
tool: tool.name,
description: tool.description
}));
const clarificationPrompt = `
Your query could match multiple actions:
${options.map(opt => `${opt.number}. ${opt.description}`).join('\n')}
Which one did you mean?
`;
// Send to user, wait for response
// (This is simplified - real implementation would handle async user input)
const userChoice = await askUser(clarificationPrompt);
// Parse user choice (number or description match)
const selectedIndex = parseInt(userChoice) - 1;
if (selectedIndex >= 0 && selectedIndex < candidateTools.length) {
return candidateTools[selectedIndex];
}
// Try semantic matching on user's clarification
return selectBestMatch(userChoice, candidateTools);
}
4.5 Tool Definition Examples
Simple Tool:
{
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"default": "celsius"
}
},
"required": ["city"]
},
"returns": {
"type": "object",
"properties": {
"temperature": {"type": "number"},
"condition": {"type": "string"},
"humidity": {"type": "number"}
}
}
}
Complex Tool with Validation:
{
"name": "transfer_funds",
"description": "Transfer money between accounts",
"parameters": {
"type": "object",
"properties": {
"from_account": {
"type": "string",
"pattern": "^ACC[0-9]{8}$",
"description": "Source account ID (format: ACC12345678)"
},
"to_account": {
"type": "string",
"pattern": "^ACC[0-9]{8}$",
"description": "Destination account ID"
},
"amount": {
"type": "number",
"minimum": 0.01,
"maximum": 10000,
"description": "Amount to transfer (max $10,000 per transaction)"
},
"memo": {
"type": "string",
"maxLength": 100
}
},
"required": ["from_account", "to_account", "amount"]
},
"requires_confirmation": true,
"dangerous": true,
"rate_limit": {
"max_calls": 5,
"window_seconds": 3600
}
}
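The requires_confirmation and dangerous flags above feed the ConfirmationGate component from section 4.2. A sketch, where the confirm callback is an assumption standing in for a real UI prompt:

```typescript
// Sketch: gate execution of flagged tools behind an explicit approval step.
interface GatedTool {
  name: string;
  requires_confirmation?: boolean;
  dangerous?: boolean;
}

async function gatedExecute<T>(
  tool: GatedTool,
  run: () => Promise<T>,                       // the actual tool execution
  confirm: (toolName: string) => Promise<boolean> // UI hook: ask the human
): Promise<T> {
  if (tool.requires_confirmation || tool.dangerous) {
    const approved = await confirm(tool.name);
    if (!approved) {
      // Surface the refusal so the agent can report it instead of retrying
      throw new Error(`User declined execution of '${tool.name}'`);
    }
  }
  return run();
}
```

Unflagged tools pass straight through, so the gate adds no friction to safe, read-only operations.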
5. Implementation Guide
Phase 1: Foundation (Days 1-3)
Step 1: Set Up Tool Registry
Create tool definition file (tools/tools.json):
{
"version": "1.0.0",
"tools": [
{
"name": "get_time",
"description": "Get current time",
"parameters": {
"type": "object",
"properties": {},
"required": []
}
},
{
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"]
}
}
]
}
Checkpoint 1.1: Can you load and parse this JSON into TypeScript interfaces?
Step 2: Implement Mock Tool Execution
Donโt connect to real APIs yet:
const mockTools = {
get_time: () => ({ time: new Date().toISOString() }),
get_weather: (args: { city: string }) => ({
city: args.city,
temperature: 72,
condition: "sunny"
})
};
function executeTool(toolName: string, args: any): any {
const fn = mockTools[toolName];
if (!fn) {
throw new Error(`Tool ${toolName} not found`);
}
return fn(args);
}
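Even with mock tools, it is worth adding the timeout protection FR4 calls for. A sketch of a Promise.race wrapper you can layer over executeTool (the helper name and message format are assumptions):

```typescript
// Sketch: race a tool call against a timer so a hung tool cannot stall the loop.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer whichever branch wins, so the process can exit cleanly
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer!)) as Promise<T>;
}
```

Usage: `await withTimeout(Promise.resolve(executeTool(name, args)), 5000, name)`. A timeout rejection should be formatted as an execution_failed structured error and fed back to the model like any other tool failure.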
Checkpoint 1.2: Can you call executeTool("get_weather", {city: "NYC"}) and get a result?
Step 3: Integrate Native Function Calling API
For OpenAI:
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function callLLMWithTools(
messages: Message[],
tools: ToolDefinition[]
): Promise<any> {
const response = await client.chat.completions.create({
model: "gpt-4",
messages: messages,
tools: tools.map(tool => ({
type: "function",
function: {
name: tool.name,
description: tool.description,
parameters: tool.parameters
}
}))
});
const choice = response.choices[0];
// Return finish_reason alongside the message so callers can detect completion
return { ...choice.message, finish_reason: choice.finish_reason };
}
Checkpoint 1.3: Send a message asking for the weather and verify the model outputs a tool call.
Phase 2: Validation & Error Handling (Days 4-6)
Step 4: Implement Schema Validation
Using Zod (TypeScript):
import { z } from "zod";
function validateWithZod(args: any, schema: any): ValidationResult {
try {
const zodSchema = jsonSchemaToZod(schema); // Helper (not shown): converts JSON Schema to a Zod schema
zodSchema.parse(args);
return { valid: true };
} catch (error) {
if (error instanceof z.ZodError) {
return {
valid: false,
error: {
error_type: "validation_failed",
message: error.errors[0].message,
field: error.errors[0].path.join('.'),
// expected/received only exist on some issue types (e.g., invalid_type)
expected: (error.errors[0] as any).expected,
received: (error.errors[0] as any).received
}
};
}
throw error;
}
}
Checkpoint 2.1: Test with invalid arguments. Does validation catch type errors?
Step 5: Build Error Recovery Loop
async function runWithErrorRecovery(
userQuery: string,
tools: ToolDefinition[]
): Promise<string> {
const messages: Message[] = [
{ role: "user", content: userQuery }
];
for (let attempt = 0; attempt < 3; attempt++) {
const response = await callLLMWithTools(messages, tools);
if (response.tool_calls) {
const toolCall = response.tool_calls[0];
// Validate
const validation = validateToolCall(toolCall, tools);
if (!validation.valid) {
// Send error back to model
messages.push({
role: "assistant",
content: "",
tool_calls: [toolCall]
});
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(validation.error)
});
console.log(`Validation failed (attempt ${attempt + 1}): ${validation.error.message}`);
continue; // Let model try again
}
// Execute
const result = executeTool(toolCall.function.name, JSON.parse(toolCall.function.arguments));
return `Tool executed successfully: ${JSON.stringify(result)}`;
}
// No tool call, return text response
return response.content;
}
throw new Error("Failed after 3 recovery attempts");
}
Checkpoint 2.2: Test with a query that requires error recovery (e.g., missing required field).
Phase 3: ReAct Loop & Multi-Step (Days 7-10)
Step 6: Implement Full ReAct Loop
async function runReActLoop(
userQuery: string,
tools: ToolDefinition[],
maxIterations: number = 10
): Promise<SessionLog> {
const sessionId = generateId();
const messages: Message[] = [
{
role: "system",
content: "You are a helpful assistant with access to tools. Use them to complete tasks."
},
{
role: "user",
content: userQuery
}
];
const toolsCalled: ToolCall[] = [];
let iteration = 0;
while (iteration < maxIterations) {
iteration++;
console.log(`\n[Iteration ${iteration}]`);
// Call LLM
const response = await callLLMWithTools(messages, tools);
// Check if done
if (response.finish_reason === "stop" && !response.tool_calls) {
console.log("[Agent] Task complete");
return {
session_id: sessionId,
success: true,
final_response: response.content,
tools_called: toolsCalled,
iterations: iteration
};
}
// Process tool calls
if (response.tool_calls) {
// Add assistant message
messages.push({
role: "assistant",
content: response.content || "",
tool_calls: response.tool_calls
});
for (const toolCall of response.tool_calls) {
console.log(`[Agent] Calling tool: ${toolCall.function.name}`);
console.log(`[Agent] Arguments: ${toolCall.function.arguments}`);
// Parse arguments defensively: models occasionally emit malformed JSON
let args: any;
try {
args = JSON.parse(toolCall.function.arguments);
} catch {
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify({
error_type: "malformed_arguments",
message: "Arguments were not valid JSON"
})
});
continue;
}
// Validate
const validation = validateToolCall(
{ ...toolCall, arguments: args },
tools
);
if (!validation.valid) {
console.log(`[Validation] Failed: ${validation.error.message}`);
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(validation.error)
});
continue;
}
// Execute
try {
const result = executeTool(toolCall.function.name, args);
console.log(`[Execution] Success: ${JSON.stringify(result)}`);
toolsCalled.push({
id: toolCall.id,
tool: toolCall.function.name,
arguments: args,
timestamp: new Date()
});
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(result)
});
} catch (error) {
console.log(`[Execution] Error: ${error.message}`);
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify({
error_type: "execution_failed",
message: error.message
})
});
}
}
}
}
// Max iterations reached
return {
session_id: sessionId,
success: false,
error: "Max iterations exceeded",
tools_called: toolsCalled,
iterations: iteration
};
}
Checkpoint 3.1: Test with a multi-step task (e.g., "Get weather for NYC and tell me what time it is").
Step 7: Add Audit Logging
function saveSessionLog(log: SessionLog): void {
fs.mkdirSync("logs", { recursive: true }); // ensure the directory exists
const filename = `logs/${log.session_id}_${Date.now()}.json`;
const detailedLog = {
...log,
metadata: {
timestamp: new Date().toISOString(),
environment: process.env.NODE_ENV,
model: "gpt-4"
}
};
fs.writeFileSync(filename, JSON.stringify(detailedLog, null, 2));
console.log(`[Audit] Saved session log: ${filename}`);
}
Checkpoint 3.2: Verify that each session creates a detailed log file.
Phase 4: Production Features (Days 11-14)
Step 8: Implement Confirmation Gates
async function executeWithConfirmation(
toolCall: ToolCall,
toolDef: ToolDefinition
): Promise<any> {
if (toolDef.requires_confirmation) {
console.log(`\n⚠️  CONFIRMATION REQUIRED ⚠️`);
console.log(`Tool: ${toolCall.tool}`);
console.log(`Arguments: ${JSON.stringify(toolCall.arguments, null, 2)}`);
console.log(`\nThis action is potentially destructive.`);
const confirmed = await promptUser("Do you want to proceed? (yes/no): ");
if (confirmed.toLowerCase() !== "yes") {
throw new Error("User cancelled operation");
}
}
return executeTool(toolCall.tool, toolCall.arguments);
}
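Step 8 calls a `promptUser` helper that is not defined above. One way to sketch it is with Node's built-in `readline/promises`; the optional `ask` parameter is my addition (not part of the original design) so the confirmation gate can be tested without touching real stdin:

```typescript
import { createInterface } from "node:readline/promises";
import { stdin, stdout } from "node:process";

// Reads one line from the terminal. Tests can pass a stub `ask`
// function instead of interacting with real stdin.
async function promptUser(
  question: string,
  ask?: (q: string) => Promise<string>
): Promise<string> {
  if (ask) return (await ask(question)).trim();
  const rl = createInterface({ input: stdin, output: stdout });
  try {
    return (await rl.question(question)).trim();
  } finally {
    rl.close();
  }
}
```

Trimming the answer means stray whitespace around "yes" does not silently cancel a confirmed operation.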
Step 9: Add Rate Limiting
class RateLimiter {
private callCounts: Map<string, { count: number; windowStart: number }> = new Map();
checkLimit(toolName: string, limit: { max_calls: number; window_seconds: number }): boolean {
const now = Date.now();
const key = toolName;
const entry = this.callCounts.get(key);
if (!entry) {
this.callCounts.set(key, { count: 1, windowStart: now });
return true;
}
const windowElapsed = (now - entry.windowStart) / 1000;
if (windowElapsed >= limit.window_seconds) {
// Reset window
this.callCounts.set(key, { count: 1, windowStart: now });
return true;
}
if (entry.count >= limit.max_calls) {
return false; // Rate limit exceeded
}
entry.count++;
return true;
}
}
Step 10: Build CLI Interface
import { Command } from "commander";
const program = new Command();
program
.name("router")
.description("AI Agent Tool Router")
.version("1.0.0");
program
.command("run")
.description("Run the agent")
.requiredOption("-q, --query <query>", "User query")
.option("-t, --tools <file>", "Tool definitions file", "./tools.json")
.option("-m, --max-iter <number>", "Max iterations", "10")
.action(async (options) => {
const tools = loadTools(options.tools);
const result = await runReActLoop(
options.query,
tools,
parseInt(options.maxIter, 10)
);
console.log("\n" + "=".repeat(60));
console.log("FINAL RESULT");
console.log("=".repeat(60));
console.log(result.final_response);
console.log(`\nTools called: ${result.tools_called.length}`);
console.log(`Iterations: ${result.iterations}`);
});
program.parse();
6. Testing Strategy
6.1 Unit Tests
describe("Tool Validation", () => {
// Shared definition: the later tests reference toolDef too,
// so it must live at describe scope, not inside the first test
const toolDef = {
name: "test_tool",
parameters: {
type: "object",
properties: {
name: { type: "string" }
},
required: ["name"]
}
};
test("should accept valid arguments", () => {
const result = validateToolCall(
{ tool: "test_tool", arguments: { name: "Alice" } },
[toolDef]
);
expect(result.valid).toBe(true);
});
test("should reject missing required field", () => {
const result = validateToolCall(
{ tool: "test_tool", arguments: {} },
[toolDef]
);
expect(result.valid).toBe(false);
expect(result.error.error_type).toBe("validation_failed");
});
test("should reject invalid type", () => {
const result = validateToolCall(
{ tool: "test_tool", arguments: { name: 123 } },
[toolDef]
);
expect(result.valid).toBe(false);
});
});
describe("Error Recovery", () => {
test("should retry after validation error", async () => {
const mockLLM = jest.fn()
.mockResolvedValueOnce({
// First call: invalid arguments
tool_calls: [{
id: "1",
function: { name: "get_weather", arguments: "{}" }
}]
})
.mockResolvedValueOnce({
// Second call: valid arguments
tool_calls: [{
id: "2",
function: { name: "get_weather", arguments: '{"city": "NYC"}' }
}]
});
// Assumes runWithErrorRecovery is refactored to accept an injected LLM client
const result = await runWithErrorRecovery(mockLLM, tools);
expect(mockLLM).toHaveBeenCalledTimes(2);
expect(result.success).toBe(true);
});
});
6.2 Integration Tests
describe("Multi-Step Tasks", () => {
test("should complete task requiring 2 tools", async () => {
const query = "Get weather for NYC and tell me the time";
const result = await runReActLoop(query, tools, 10);
expect(result.success).toBe(true);
expect(result.tools_called.length).toBe(2);
expect(result.tools_called.map(t => t.tool)).toContain("get_weather");
expect(result.tools_called.map(t => t.tool)).toContain("get_time");
});
test("should handle disambiguation", async () => {
// This would require mocking user input
const query = "I want to return my order";
// Test that system asks for clarification
// Then processes user's clarification
// Then executes correct tool
});
});
6.3 Performance Tests
describe("Performance", () => {
test("should complete simple task in <5 seconds", async () => {
const start = Date.now();
await runReActLoop("What's the weather?", tools, 10);
const elapsed = Date.now() - start;
expect(elapsed).toBeLessThan(5000);
});
test("should handle rate limiting", () => {
const limiter = new RateLimiter();
const limit = { max_calls: 3, window_seconds: 60 };
expect(limiter.checkLimit("test_tool", limit)).toBe(true);
expect(limiter.checkLimit("test_tool", limit)).toBe(true);
expect(limiter.checkLimit("test_tool", limit)).toBe(true);
expect(limiter.checkLimit("test_tool", limit)).toBe(false); // 4th call blocked
});
});
7. Common Pitfalls & Debugging
7.1 The Hallucinated Tool Problem
Symptom: Model tries to call tools that don't exist
// Model output
{
"tool": "send_email", // This tool doesn't exist!
"arguments": {...}
}
Solution: Fuzzy matching with suggestions
function findClosestToolName(
hallucinated: string,
availableTools: ToolDefinition[]
): string | null {
const scores = availableTools.map(tool => ({
name: tool.name,
score: levenshteinDistance(hallucinated, tool.name)
}));
scores.sort((a, b) => a.score - b.score);
// If closest match is "close enough"
if (scores[0].score <= 3) {
return scores[0].name;
}
return null;
}
// In error response
{
"error_type": "tool_not_found",
"message": `Tool '${hallucinated}' not found. Did you mean '${closest}'?`,
"available_tools": ["get_weather", "get_time", ...]
}
7.2 The Infinite Loop Problem
Symptom: Agent keeps calling the same failing tool
// Loop detected
[Iteration 1] Call: create_order → Error: Missing address
[Iteration 2] Call: create_order → Error: Missing address
[Iteration 3] Call: create_order → Error: Missing address
...
Solution: Loop detection
class LoopDetector {
private history: string[] = [];
detect(toolCall: ToolCall): boolean {
const signature = `${toolCall.tool}:${JSON.stringify(toolCall.arguments)}`;
// Count occurrences of this signature among the last 3 calls
const recent = this.history.slice(-3);
const repeats = recent.filter(sig => sig === signature).length;
this.history.push(signature);
// Two prior occurrences plus this call = 3 identical calls; treat as a loop
return repeats >= 2;
}
}
7.3 The Type Confusion Problem
Symptom: Model outputs string instead of integer
// Model output
{
"quantity": "five" // Should be 5
}
Solution: Smart type coercion + clear error messages
function coerceTypes(args: any, schema: JSONSchema): any {
const coerced = { ...args };
for (const [key, propSchema] of Object.entries(schema.properties)) {
if (propSchema.type === "integer" && typeof coerced[key] === "string") {
// Try to parse
const parsed = parseInt(coerced[key], 10);
if (!isNaN(parsed)) {
coerced[key] = parsed;
console.warn(`Coerced ${key} from string to integer`);
} else {
throw new ValidationError(
`Cannot convert "${coerced[key]}" to integer. Please provide a numeric value.`
);
}
}
}
return coerced;
}
7.4 The Context Window Explosion
Symptom: After 10 tool calls, context is too large
Solution: Summarize old tool results
function compressConversationHistory(
messages: Message[],
maxTokens: number
): Message[] {
let tokenCount = estimateTokens(messages);
if (tokenCount <= maxTokens) {
return messages;
}
// Keep system message and recent messages
const system = messages[0];
const recent = messages.slice(-10);
// Summarize middle messages
const middle = messages.slice(1, -10);
const summary = summarizeToolCalls(middle);
return [
system,
{ role: "system", content: `Previous actions: ${summary}` },
...recent
];
}
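`compressConversationHistory` depends on an `estimateTokens` helper. A common rough heuristic for English text is about 4 characters per token; the sketch below uses that ratio (for accurate counts, swap in a real tokenizer such as tiktoken):

```typescript
// Rough token estimator based on the ~4 characters/token heuristic.
// Good enough for deciding when to compress; not for billing.
interface Message {
  role: string;
  content: string;
}

function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce(
    (sum, m) => sum + m.role.length + m.content.length,
    0
  );
  return Math.ceil(chars / 4);
}
```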
8. Extensions
8.1 Beginner Extensions
Extension 1: Tool Usage Analytics
Track which tools are called most frequently:
class ToolAnalytics {
private stats: Map<string, { count: number; avg_time_ms: number }> = new Map();
record(toolName: string, executionTimeMs: number) {
const entry = this.stats.get(toolName) || { count: 0, avg_time_ms: 0 };
entry.count++;
entry.avg_time_ms = (entry.avg_time_ms * (entry.count - 1) + executionTimeMs) / entry.count;
this.stats.set(toolName, entry);
}
report() {
console.log("\nTool Usage Statistics:");
for (const [tool, stats] of this.stats.entries()) {
console.log(` ${tool}: ${stats.count} calls, avg ${stats.avg_time_ms.toFixed(0)}ms`);
}
}
}
Extension 2: Tool Filtering by Intent
Don't show all 100 tools to the model; filter to the top 5:
async function filterRelevantTools(
userQuery: string,
allTools: ToolDefinition[],
topK: number = 5
): Promise<ToolDefinition[]> {
// Embedding calls go over the network, so this function is async.
// In production, precompute and cache the tool-description embeddings.
const queryEmbedding = await generateEmbedding(userQuery);
const scored = await Promise.all(
allTools.map(async tool => ({
tool,
score: cosineSimilarity(
queryEmbedding,
await generateEmbedding(tool.description)
)
}))
);
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, topK).map(s => s.tool);
}
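The filtering above assumes a `cosineSimilarity` helper over embedding vectors. The standard definition is the dot product divided by the product of the vector norms:

```typescript
// Cosine similarity over plain number[] embeddings.
// Returns a value in [-1, 1]; 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```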
8.2 Intermediate Extensions
Extension 3: Parallel Tool Execution
Execute independent tools in parallel:
async function executeToolsInParallel(
toolCalls: ToolCall[]
): Promise<ToolResult[]> {
// Analyze dependencies
const groups = groupByDependencies(toolCalls);
const results: ToolResult[] = [];
for (const group of groups) {
// Execute all tools in this group in parallel
const promises = group.map(tc => executeTool(tc.tool, tc.arguments));
const groupResults = await Promise.all(promises);
results.push(...groupResults);
}
return results;
}
Extension 4: Tool Versioning
Support multiple versions of the same tool:
interface VersionedTool extends ToolDefinition {
version: string;
deprecated?: boolean;
migration_guide?: string;
}
function selectToolVersion(
toolName: string,
requestedVersion: string | "latest"
): VersionedTool {
const versions = registry.getVersions(toolName); // assumed sorted newest-first
if (requestedVersion === "latest") {
return versions.filter(v => !v.deprecated)[0];
}
const match = versions.find(v => v.version === requestedVersion);
if (!match) {
throw new Error(`Unknown version '${requestedVersion}' of tool '${toolName}'`);
}
return match;
}
8.3 Advanced Extensions
Extension 5: LLM-as-a-Judge for Tool Selection
Use a second LLM call to evaluate if tool selection was correct:
async function judgeToolSelection(
userQuery: string,
selectedTool: string,
allTools: ToolDefinition[]
): Promise<{ correct: boolean; reason: string }> {
const judgePrompt = `
User query: "${userQuery}"
Selected tool: ${selectedTool}
Available tools: ${allTools.map(t => `${t.name}: ${t.description}`).join('\n')}
Is this the correct tool? Explain why or why not.
`;
const response = await callLLM(judgePrompt);
// Parse response to determine if correct
return parseJudgeResponse(response);
}
Extension 6: Automatic Tool Discovery
Automatically generate tool definitions from TypeScript types:
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema"; // npm: zod-to-json-schema
// Define tool with Zod schema
const CreateOrderSchema = z.object({
item: z.enum(["pizza", "burger", "salad"]),
quantity: z.number().int().min(1).max(100),
address: z.string()
});
// Automatically generate JSON Schema
function generateToolDefinition(
name: string,
description: string,
schema: z.ZodSchema
): ToolDefinition {
return {
name,
description,
parameters: zodToJsonSchema(schema)
};
}
const createOrderTool = generateToolDefinition(
"create_order",
"Creates a food order",
CreateOrderSchema
);
Extension 7: Distributed Agent System
Multiple specialized agents working together:
class AgentOrchestrator {
private agents: Map<string, Agent> = new Map([
["weather_agent", new WeatherAgent()],
["order_agent", new OrderAgent()],
["support_agent", new SupportAgent()]
]);
async routeToAgent(userQuery: string): Promise<string> {
// Determine which agent is best suited
const agentName = await selectBestAgent(userQuery, this.agents);
// Route to that agent
const agent = this.agents.get(agentName);
return await agent.handle(userQuery);
}
}
9. Real-World Connections
9.1 Production Case Studies
1. Shopify Sidekick (E-commerce Assistant)
Shopify's AI assistant uses tool calling to:
- Query product inventory
- Modify store settings
- Generate reports
- Answer merchant questions
Implementation insights:
- Every write operation requires merchant confirmation
- Tool calls are rate-limited per merchant
- Full audit log for compliance
- Separate tool sets for different permission levels
2. ChatGPT Plugins
OpenAI's plugin system is tool calling at scale:
- 1000+ plugins (tools) available
- Dynamic tool selection based on user intent
- Sandboxed execution
- OAuth for user authentication
3. Replit GhostWriter (Code Agent)
Replit's AI that writes and debugs code uses tools for:
- Creating files
- Running tests
- Installing packages
- Executing code
Safety measures:
- Code execution in isolated containers
- Timeout limits (30s per execution)
- Resource limits (CPU, memory)
- User approval for file modifications
9.2 Design Patterns in Production
Pattern 1: Tool Namespacing
Organize tools by domain:
const tools = {
"orders.create": createOrderTool,
"orders.cancel": cancelOrderTool,
"orders.status": getOrderStatusTool,
"users.get": getUserTool,
"users.update": updateUserTool
};
// Model can call: "orders.create" or "users.get"
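A small generic helper (hypothetical, matching the map shape above) can slice the registry down to one namespace before handing tools to the model:

```typescript
// Keeps only the tools whose registry key starts with the given
// namespace prefix, e.g. toolsInNamespace(tools, "orders").
function toolsInNamespace<T>(
  tools: Record<string, T>,
  namespace: string
): T[] {
  return Object.entries(tools)
    .filter(([name]) => name.startsWith(namespace + "."))
    .map(([, def]) => def);
}
```

Combined with capability checks (Pattern 2), this keeps the tool list the model sees both small and permission-scoped.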
Pattern 2: Capability-Based Access
Filter tools by user permissions:
function getAvailableTools(user: User): ToolDefinition[] {
const allTools = loadAllTools();
return allTools.filter(tool => {
// Check if user has capability
return user.capabilities.includes(tool.required_capability);
});
}
Pattern 3: Circuit Breaker
Prevent cascading failures:
class CircuitBreaker {
private failures = 0;
private state: "closed" | "open" | "half-open" = "closed";
async execute(fn: () => Promise<any>): Promise<any> {
if (this.state === "open") {
throw new Error("Circuit breaker is open");
}
try {
const result = await fn();
this.failures = 0;
this.state = "closed";
return result;
} catch (error) {
this.failures++;
if (this.failures >= 3) {
this.state = "open";
setTimeout(() => { this.state = "half-open"; }, 60000);
}
throw error;
}
}
}
10. Resources
10.1 Books
| Topic | Book | Chapter |
|---|---|---|
| Boundary Design | "Clean Code" by Robert Martin | Ch. 8 (Boundaries - interfacing with external systems) |
| Interface Contracts | "The Pragmatic Programmer" by Hunt & Thomas | Ch. 5 (Bend or Break - Design by Contract) |
| JSON Schema | "Designing Data-Intensive Applications" by Kleppmann | Ch. 4 (Encoding & Schema Evolution) |
| Error Handling | "Code Complete" by McConnell | Ch. 8 (Defensive Programming) |
| API Design | "REST API Design Rulebook" by Mark Massé | Ch. 2 (Identifier Design) & Ch. 6 (Request Methods) |
| State Machines | "Clean Code" by Robert Martin | Ch. 6 (Objects and Data Structures) |
| Validation Patterns | "Refactoring" by Martin Fowler | Ch. 11 (Simplifying Conditional Expressions) |
| Agent Architectures | "AI Engineering" by Chip Huyen | Ch. 6 (Agent Patterns & Tool Use) |
| Type Safety | "Effective TypeScript" by Dan Vanderkam | Items 1-10 (Understanding TypeScript's Type System) |
10.2 Papers
- "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022)
- Foundation for agent loops
- https://arxiv.org/abs/2210.03629
- "Toolformer: Language Models Can Teach Themselves to Use Tools" (Schick et al., 2023)
- Self-supervised tool learning
- https://arxiv.org/abs/2302.04761
- "Gorilla: Large Language Model Connected with Massive APIs" (Patil et al., 2023)
- API calling optimization
- https://arxiv.org/abs/2305.15334
10.3 API Documentation
- OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- Anthropic Tool Use: https://docs.anthropic.com/claude/docs/tool-use
- JSON Schema Specification: https://json-schema.org/
10.4 Libraries
# TypeScript
npm install zod # Schema validation
npm install ajv # JSON Schema validator
npm install commander # CLI framework
# Python
pip install pydantic # Data validation
pip install jsonschema # JSON Schema validation
pip install click # CLI framework
11. Self-Assessment Checklist
Core Understanding
- I can explain what tool calling is and why it matters
- Test: Explain to someone the difference between text responses and tool calls
- I understand JSON Schema and can write schemas by hand
- Test: Write a schema for a complex nested object with constraints
- I know the ReAct pattern
- Test: Draw the observe-reason-act loop on a whiteboard
- I can identify security risks in tool design
- Test: List 5 ways tool calling can go wrong in production
Implementation Skills
- I've implemented argument validation
- Evidence: Validator catches all invalid types, missing fields, constraint violations
- I've built an error recovery loop
- Evidence: System recovers from validation errors automatically
- I've implemented the ReAct loop
- Evidence: Agent completes multi-step tasks end-to-end
- I've added audit logging
- Evidence: Every session has a complete JSON log
- I've implemented safety features
- Evidence: Dangerous tools require confirmation
Production Readiness
- My system handles edge cases
- Hallucinated tools
- Type confusion
- Infinite loops
- Rate limiting
- I have comprehensive error messages
- Evidence: Errors include suggestions for how to fix
- I can debug failed sessions
- Evidence: Audit logs contain enough detail to replay
- I've tested performance
- Evidence: Selection + validation + execution <500ms
Growth
- I can design tool contracts for new domains
- Application: Design 5 tools for a different use case (e.g., email management)
- I understand when NOT to use tool calling
- Give examples where tool calling adds unnecessary complexity
- I can explain this to stakeholders
- Practice: 3-minute pitch on why tool calling improves reliability
12. Submission / Completion Criteria
Minimum Viable Completion
- Can register tools from JSON
- Loads tool definitions
- Validates schema format
- Can call LLM with tool definitions
- Uses native function calling API
- Receives structured tool calls
- Validates tool arguments
- Checks types and required fields
- Returns structured errors
- Executes tools
- At least 3 mock tools working
- Captures results
Proof: Screenshot showing validation error + recovery
Full Completion
All minimum criteria plus:
- Full ReAct loop
- Multi-step task completion
- Max iteration limits
- Success/failure reporting
- Error recovery
- Feeds errors back to model
- Detects infinite loops
- Handles at least 5 error types
- Audit logging
- JSON logs per session
- Complete conversation history
- Token usage tracking
- CLI interface
- Clear help text
- Multiple commands
- Professional output formatting
Proof: Public GitHub repository with README
Excellence
All full completion criteria plus any 3+:
- Parallel tool execution
- Tool versioning
- Automatic schema generation from types
- Distributed agent system
- Production deployment (API endpoint)
- Monitoring dashboard
- Integration tests with real LLM calls
Proof: Blog post, video demo, or production URL
End of Project 6: Tool Router