Project 1: Tool Caller Baseline (Non-Agent)

Build a deterministic, single-shot CLI assistant that calls tools with strict schemas, logs tool vs model failures, and produces a reproducible JSON report.


Quick Reference

Attribute       Value
Difficulty      Level 1: Intro
Time Estimate   6–10 hours (weekend)
Language        Python or JavaScript
Prerequisites   CLI basics, JSON, simple file I/O
Key Topics      tool schemas, validation, determinism, error boundaries

Learning Objectives

By completing this project, you will:

  1. Define tool contracts with strict input/output schemas.
  2. Separate tool failures from model failures in logs and reports.
  3. Guarantee deterministic output for the same inputs.
  4. Build a minimal tool registry and execution pipeline.
  5. Produce machine-verifiable reports that downstream systems can trust.

The Core Question You’re Answering

“What can structured tool calling accomplish without any agent loop, and where does it break?”

This project establishes a baseline. Without planning, memory, or iteration, the system is predictable. That predictability is your control group for all agentic behavior later.


Concepts You Must Understand First

Concept                  Why It Matters                      Where to Learn
JSON schema validation   Tool I/O must be verifiable         Pydantic/Zod docs
Deterministic execution  Debugging requires repeatability    Any testing guide
Error boundaries         Tool failures vs model failures     Systems design basics
CLI argument parsing     Reproducible inputs                 argparse / yargs
Structured outputs       Enables strict parsing              LLM function calling guides

Theoretical Foundation

Single-Shot Tool Calling as a Pipeline

A non-agent tool caller is a straight-line pipeline:

User Input -> Tool Call -> Tool Output -> JSON Report

There is no feedback loop. That means:

  • Pros: deterministic, testable, easy to trace
  • Cons: cannot adapt to errors, cannot plan, cannot recover
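
To make the shape concrete, here is a minimal sketch in Python; the function names (call_tool, run_once) are illustrative placeholders, not part of the spec:

import json

def call_tool(name: str, payload: dict) -> dict:
    # Stand-in dispatch; Phase 1 replaces this with a schema-validated registry.
    return {"tool": name, "echo": payload}

def run_once(user_input: dict) -> str:
    """One pass through the pipeline: input -> tool call -> output -> report."""
    tool_output = call_tool("parse_log", user_input)
    report = {"status": "success", "input": user_input, "result": tool_output}
    # sort_keys keeps the serialized report stable across runs
    return json.dumps(report, sort_keys=True, indent=2)

if __name__ == "__main__":
    print(run_once({"file_path": "logs/server.log"}))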

Error Boundaries

You must distinguish:

  • Tool errors (file not found, bad input)
  • Model errors (invalid JSON, missing fields)

Blending these will make debugging impossible once you scale.
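
One low-effort way to keep the boundary explicit is two exception types plus a single classifier that fills the error_type field used in the report. This is a sketch; the class names are assumptions:

class ToolError(Exception):
    """Raised when a tool itself fails (missing file, bad input, timeout)."""

class ModelError(Exception):
    """Raised when the model's response is unusable (invalid JSON, missing fields)."""

def classify(exc: Exception) -> dict:
    # Every failure is forced into exactly one bucket before it reaches the report.
    if isinstance(exc, ToolError):
        return {"status": "error", "error_type": "tool_error", "message": str(exc)}
    if isinstance(exc, ModelError):
        return {"status": "error", "error_type": "model_error", "message": str(exc)}
    return {"status": "error", "error_type": "unknown_error", "message": str(exc)}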


Project Specification

What You’ll Build

A CLI tool that runs a fixed tool chain (e.g., parse logs -> compute stats) and outputs a strict JSON report.

Functional Requirements

  1. Tool registry with input/output schemas
  2. Validation on tool calls and tool outputs
  3. Deterministic execution (temperature 0, no randomness)
  4. JSON report with metrics and tool logs
  5. Distinct error codes for tool vs model failures

Non-Functional Requirements

  • Reproducible outputs
  • Clear audit logs
  • Safe defaults (no dynamic code execution)

Real World Outcome

When you run the tool, you get deterministic, auditable output:

$ python tool_caller.py analyze --file logs/server.log

Calling tool: parse_log
Tool input: {"file_path": "logs/server.log"}
Tool output received (382 bytes)

Calling tool: summarize_stats
Tool input: {"events": 1523}
Tool output received (128 bytes)

Analysis complete.

Output file analysis_report.json:

{
  "status": "success",
  "input_file": "logs/server.log",
  "summary": {
    "total_lines": 1523,
    "errors": 47,
    "warnings": 132
  },
  "tool_calls": [
    {"name": "parse_log", "duration_ms": 145, "status": "ok"},
    {"name": "summarize_stats", "duration_ms": 23, "status": "ok"}
  ]
}

If a tool fails, the report is explicit:

{
  "status": "error",
  "error_type": "tool_error",
  "tool": "parse_log",
  "message": "File not found: logs/missing.log"
}

Architecture Overview

┌───────────────┐   validate   ┌─────────────────┐
│ CLI Interface │──────────────▶│ Tool Registry   │
└───────┬───────┘               └───────┬─────────┘
        │                               │
        ▼                               ▼
┌───────────────┐               ┌─────────────────┐
│ Tool Executor │──────────────▶│ Tool Implement. │
└───────┬───────┘               └───────┬─────────┘
        │                               │
        ▼                               ▼
┌─────────────────┐           ┌────────────────────┐
│ Report Builder  │◀──────────│ Error Boundary     │
└─────────────────┘           └────────────────────┘
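
One way to map those boxes to files, if you want a starting layout (module names are suggestions, not requirements):

tool_caller/
  cli.py        CLI interface: argument parsing, entry point
  registry.py   Tool registry: schemas + tool lookup
  executor.py   Tool executor: fixed-order execution, timing, logging
  errors.py     Error boundary: ToolError / ModelError, classification
  report.py     Report builder: report schema, deterministic serialization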

Implementation Guide

Phase 1: Tool Registry + Schemas (2–3h)

  • Define tool schemas with Pydantic/Zod
  • Validate input before execution
  • Checkpoint: invalid input fails fast
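
A minimal Phase 1 sketch using Pydantic v2; the tool and field names mirror the example report above but are otherwise assumptions:

from pydantic import BaseModel

class ParseLogInput(BaseModel):
    file_path: str

class ParseLogOutput(BaseModel):
    total_lines: int
    errors: int
    warnings: int

def parse_log(inp: ParseLogInput) -> dict:
    total = errors = warnings = 0
    with open(inp.file_path) as fh:          # missing file -> tool error, not model error
        for line in fh:
            total += 1
            errors += "ERROR" in line
            warnings += "WARNING" in line
    return {"total_lines": total, "errors": errors, "warnings": warnings}

# Registry: tool name -> (input schema, output schema, implementation)
TOOL_REGISTRY = {
    "parse_log": (ParseLogInput, ParseLogOutput, parse_log),
}

def validated_call(name: str, raw_input: dict) -> dict:
    in_model, out_model, fn = TOOL_REGISTRY[name]
    inp = in_model(**raw_input)              # ValidationError here = fail fast on bad input
    raw_out = fn(inp)
    return out_model(**raw_out).model_dump() # validate the tool's output as well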

Phase 2: Tool Executor + Logging (2–3h)

  • Execute tools in a fixed order
  • Log tool inputs, outputs, timings
  • Checkpoint: trace log is complete
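
A Phase 2 sketch of a fixed-order executor that records the trace. It assumes each tool is reached through a dict-in, dict-out callable (like validated_call above) and that each tool's output is a valid input for the next tool in the chain:

import json
import time

def execute_chain(chain: list[str], call, initial_input: dict):
    """Run tools in a fixed order; return (final_output, per-call trace)."""
    trace = []
    payload = initial_input
    for name in chain:
        print(f"Calling tool: {name}")
        print(f"Tool input: {json.dumps(payload, sort_keys=True)}")
        start = time.perf_counter()
        payload = call(name, payload)                       # dict in, dict out
        duration_ms = int((time.perf_counter() - start) * 1000)
        print(f"Tool output received ({len(json.dumps(payload))} bytes)")
        trace.append({"name": name, "duration_ms": duration_ms, "status": "ok"})
    return payload, trace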

Phase 3: Report Builder (2–4h)

  • Build deterministic JSON output
  • Add error classification
  • Checkpoint: report validates against schema
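
A Phase 3 sketch that validates the report against its own schema before writing, again assuming Pydantic v2; the field names follow the example report above:

import json
from pydantic import BaseModel

class ToolCallRecord(BaseModel):
    name: str
    duration_ms: int
    status: str

class Report(BaseModel):
    status: str
    input_file: str
    summary: dict
    tool_calls: list[ToolCallRecord]

def write_report(path: str, input_file: str, summary: dict, trace: list[dict]) -> None:
    report = Report(status="success", input_file=input_file,
                    summary=summary, tool_calls=trace)        # raises on schema drift
    with open(path, "w") as fh:
        # sort_keys + fixed indent keeps the file byte-identical for identical inputs
        json.dump(report.model_dump(), fh, sort_keys=True, indent=2)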

Common Pitfalls & Debugging

Pitfall              Symptom                      Fix
Mixed error types    All failures look identical  Enforce an error_type field
Non-determinism      Outputs differ run to run    Fix seeds; set temperature to 0
Silent schema drift  Missing fields in JSON       Validate outputs strictly
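
For the non-determinism pitfall, most variation comes from the model call and from serialization. A short checklist in code; the model_request dict is a placeholder, not any specific SDK's parameters:

import json
import random

random.seed(0)                      # pin any randomness you control yourself

model_request = {
    "temperature": 0,               # no sampling variation
    "seed": 0,                      # only if your provider accepts a seed parameter
}

def stable_dumps(obj: dict) -> str:
    # Sorted keys + fixed separators make repeated runs byte-for-byte comparable.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))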

Interview Questions They’ll Ask

  1. Why must tool outputs be validated if tool inputs are already valid?
  2. How do you distinguish model errors from tool errors?
  3. What makes deterministic outputs critical for debugging?

Hints in Layers

  • Hint 1: Start with a single tool and enforce strict input schema validation.
  • Hint 2: Add a tool registry so you can test tools in isolation.
  • Hint 3: Create a JSON report schema and validate it before saving.
  • Hint 4: Add structured error types so failures are unambiguous.

Learning Milestones

  1. Baseline Working: one tool call produces valid JSON.
  2. Observable: logs show tool inputs/outputs clearly.
  3. Reliable: outputs are deterministic and validated.

Submission / Completion Criteria

Minimum Completion

  • Fixed tool chain
  • Schema-validated inputs/outputs

Full Completion

  • JSON report with logs
  • Error classification

Excellence

  • Replay mode for stored runs
  • Metrics export (CSV/JSONL)
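
For the metrics export, JSONL is the lowest-friction format because each run appends independent lines. A sketch; the file name and record shape are assumptions:

import json

def append_run_metrics(trace: list[dict], path: str = "runs.jsonl") -> None:
    """Append one line per tool call so later runs can be aggregated or replayed."""
    with open(path, "a") as fh:
        for record in trace:
            fh.write(json.dumps(record, sort_keys=True) + "\n")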

This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/AI_AGENTS_PROJECTS.md.