Project 1: Tool Caller Baseline (Non-Agent)
Build a deterministic, single-shot CLI assistant that calls tools through strict schemas, separates tool failures from model failures in its logs, and produces a reproducible JSON report.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Intro |
| Time Estimate | 6–10 hours (weekend) |
| Language | Python or JavaScript |
| Prerequisites | CLI basics, JSON, simple file I/O |
| Key Topics | tool schemas, validation, determinism, error boundaries |
Learning Objectives
By completing this project, you will:
- Define tool contracts with strict input/output schemas.
- Separate tool failures from model failures in logs and reports.
- Guarantee deterministic output for the same inputs.
- Build a minimal tool registry and execution pipeline.
- Produce machine-verifiable reports that downstream systems can trust.
The Core Question You’re Answering
“What can structured tool calling accomplish without any agent loop, and where does it break?”
This project establishes a baseline. Without planning, memory, or iteration, the system is predictable. That predictability is your control group for all agentic behavior later.
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| JSON schema validation | Tool I/O must be verifiable | Pydantic/Zod docs |
| Deterministic execution | Debugging requires repeatability | Any testing guide |
| Error boundaries | Tool failures vs model failures | Systems design basics |
| CLI argument parsing | Reproducible inputs | argparse / yargs |
| Structured outputs | Enables strict parsing | LLM function calling guides |
Theoretical Foundation
Single-Shot Tool Calling as a Pipeline
A non-agent tool caller is a straight-line pipeline:
User Input -> Tool Call -> Tool Output -> JSON Report
There is no feedback loop. That means:
- Pros: deterministic, testable, easy to trace
- Cons: cannot adapt to errors, cannot plan, cannot recover
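In code, the whole system reduces to one function with no control flow around it. A minimal sketch of that shape (the `parse_log` and `summarize_stats` names mirror the worked example later in this guide; here they are passed in as plain callables so nothing else is assumed):

```python
# A minimal sketch of the straight-line shape: each tool runs exactly once,
# in a fixed order, with no retry or branching.
import json

def run_once(file_path: str, parse_log, summarize_stats) -> str:
    parsed = parse_log({"file_path": file_path})   # tool call 1
    summary = summarize_stats(parsed)              # tool call 2
    report = {"status": "success", "input_file": file_path, "summary": summary}
    return json.dumps(report, indent=2, sort_keys=True)  # sorted keys keep output stable
```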
Error Boundaries
You must distinguish:
- Tool errors (file not found, bad input)
- Model errors (invalid JSON, missing fields)
Blending these will make debugging impossible once you scale.
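One cheap way to enforce the boundary is to give each bucket its own exception type and map it onto the `error_type` field used in the error report later in this guide. A sketch; the class names are illustrative:

```python
# Every failure is forced into one of two buckets before it reaches the report.
class ToolError(Exception):
    """The tool itself failed: missing file, bad input, timeout."""

class ModelError(Exception):
    """The model's output was unusable: invalid JSON, missing required fields."""

def classify(exc: Exception) -> dict:
    if isinstance(exc, ToolError):
        kind = "tool_error"
    elif isinstance(exc, ModelError):
        kind = "model_error"
    else:
        kind = "unknown_error"
    return {"status": "error", "error_type": kind, "message": str(exc)}
```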
Project Specification
What You’ll Build
A CLI tool that runs a fixed tool chain (e.g., parse logs -> compute stats) and outputs a strict JSON report.
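One possible CLI skeleton, using argparse; the `analyze` subcommand and `--file` flag mirror the example invocation shown under Real World Outcome, while the `--out` flag and its default report name are illustrative additions:

```python
# Sketch of the CLI entry point; reproducible runs start with explicit arguments.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tool_caller")
    sub = parser.add_subparsers(dest="command", required=True)
    analyze = sub.add_parser("analyze", help="run the fixed tool chain on a log file")
    analyze.add_argument("--file", required=True, help="path to the input log file")
    analyze.add_argument("--out", default="analysis_report.json", help="where to write the JSON report")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    # run_pipeline(args.file, args.out) would be wired in here
```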
Functional Requirements
- Tool registry with input/output schemas
- Validation on tool calls and tool outputs
- Deterministic execution (temperature 0, no randomness)
- JSON report with metrics and tool logs
- Distinct error codes for tool vs model failures
Non-Functional Requirements
- Reproducible outputs
- Clear audit logs
- Safe defaults (no dynamic code execution)
Real World Outcome
When you run the tool, you get deterministic, auditable output:
```
$ python tool_caller.py analyze --file logs/server.log
Calling tool: parse_log
Tool input: {"file_path": "logs/server.log"}
Tool output received (382 bytes)
Calling tool: summarize_stats
Tool input: {"events": 1523}
Tool output received (128 bytes)
Analysis complete.
```
Output file analysis_report.json:
```json
{
  "status": "success",
  "input_file": "logs/server.log",
  "summary": {
    "total_lines": 1523,
    "errors": 47,
    "warnings": 132
  },
  "tool_calls": [
    {"name": "parse_log", "duration_ms": 145, "status": "ok"},
    {"name": "summarize_stats", "duration_ms": 23, "status": "ok"}
  ]
}
```
If a tool fails, the report is explicit:
```json
{
  "status": "error",
  "error_type": "tool_error",
  "tool": "parse_log",
  "message": "File not found: logs/missing.log"
}
```
Architecture Overview
```
┌───────────────┐   validate    ┌─────────────────┐
│ CLI Interface │──────────────▶│ Tool Registry   │
└───────┬───────┘               └───────┬─────────┘
        │                               │
        ▼                               ▼
┌───────────────┐               ┌─────────────────┐
│ Tool Executor │──────────────▶│ Tool Implement. │
└───────┬───────┘               └───────┬─────────┘
        │                               │
        ▼                               ▼
┌─────────────────┐           ┌────────────────────┐
│ Report Builder  │◀──────────│ Error Boundary     │
└─────────────────┘           └────────────────────┘
```
Implementation Guide
Phase 1: Tool Registry + Schemas (2–3h)
- Define tool schemas with Pydantic/Zod (see the sketch below)
- Validate input before execution
- Checkpoint: invalid input fails fast
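A minimal sketch of such a registry, assuming Pydantic v2; `parse_log` mirrors the worked example, the field names are illustrative, the `ERROR`/`WARNING` line matching is an assumption about the log format, and `ToolError` mirrors the error-boundary sketch above:

```python
# Registry entry = input schema + output schema + implementation.
from pydantic import BaseModel, ValidationError

class ToolError(Exception):
    """Tool-side failure (as opposed to a model-side failure)."""

class ParseLogInput(BaseModel):
    file_path: str

class ParseLogOutput(BaseModel):
    total_lines: int
    errors: int
    warnings: int

def _parse_log(args: ParseLogInput) -> dict:
    try:
        lines = open(args.file_path, encoding="utf-8").read().splitlines()
    except FileNotFoundError:
        raise ToolError(f"File not found: {args.file_path}")
    return {
        "total_lines": len(lines),
        "errors": sum("ERROR" in line for line in lines),
        "warnings": sum("WARNING" in line for line in lines),
    }

REGISTRY = {
    "parse_log": {"input": ParseLogInput, "output": ParseLogOutput, "fn": _parse_log},
}

def call_tool(name: str, raw_args: dict) -> dict:
    spec = REGISTRY[name]
    try:
        args = spec["input"](**raw_args)          # checkpoint: invalid input fails fast
    except ValidationError as e:
        raise ToolError(f"Invalid input for {name}: {e}") from e
    result = spec["fn"](args)
    return spec["output"](**result).model_dump()  # outputs are validated as strictly as inputs
```

Keeping schemas next to implementations in the registry is also what lets you test each tool in isolation (Hint 2).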
Phase 2: Tool Executor + Logging (2–3h)
- Execute tools in a fixed order
- Log tool inputs, outputs, and timings (see the sketch below)
- Checkpoint: trace log is complete
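A sketch of the executor; `call_tool` is assumed to be the validated dispatch function from the Phase 1 sketch and is passed in explicitly so the executor stays tool-agnostic:

```python
# Run (tool_name, args) pairs in a fixed order, recording one trace entry per call.
import time

def execute_chain(chain: list[tuple[str, dict]], call_tool) -> list[dict]:
    trace = []
    for name, args in chain:
        start = time.perf_counter()
        entry = {"name": name, "input": args}
        try:
            entry["output"] = call_tool(name, args)
            entry["status"] = "ok"
        except Exception as exc:
            entry["status"] = "error"
            entry["message"] = str(exc)
        entry["duration_ms"] = round((time.perf_counter() - start) * 1000)
        trace.append(entry)
        if entry["status"] == "error":
            break  # non-agent baseline: stop at the first failure, no recovery
    return trace
```

Stopping at the first failure is deliberate: the baseline's value is that its behavior is trivially traceable, not that it is resilient.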
Phase 3: Report Builder (2–4h)
- Build deterministic JSON output (see the sketch below)
- Add error classification
- Checkpoint: report validates against schema
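A sketch of a report builder that validates its own output before writing it, again assuming Pydantic v2; the field names follow the example report under Real World Outcome:

```python
# The report has a schema too; writing an invalid report should be impossible.
import json
from pydantic import BaseModel

class ToolCallRecord(BaseModel):
    name: str
    duration_ms: int
    status: str

class Report(BaseModel):
    status: str
    input_file: str
    summary: dict
    tool_calls: list[ToolCallRecord]

def write_report(path: str, data: dict) -> None:
    report = Report(**data)  # raises ValidationError on silent schema drift
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report.model_dump(), f, indent=2, sort_keys=True)
```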
Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Mixed error types | all failures look identical | enforce error_type field |
| Non-determinism | outputs differ run-to-run | fix seeds + temp=0 |
| Silent schema drift | missing fields in JSON | validate outputs strictly |
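If you later add an LLM call, pin its temperature to 0 as the requirements state; for the local pipeline itself, the remaining sources of run-to-run variation can be pinned in one place. A sketch:

```python
# Collect every source of variation so identical inputs give identical bytes.
import json
import random

random.seed(0)  # any randomness inside tools becomes repeatable

def dump_stable(obj: dict) -> str:
    # Sorted keys make serialization order-independent. Timing fields such as
    # duration_ms still vary between runs; exclude them when diffing reports.
    return json.dumps(obj, indent=2, sort_keys=True)
```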
Interview Questions They’ll Ask
- Why must tool outputs be validated if tool inputs are already valid?
- How do you distinguish model errors from tool errors?
- What makes deterministic outputs critical for debugging?
Hints in Layers
- Hint 1: Start with a single tool and enforce strict input schema validation.
- Hint 2: Add a tool registry so you can test tools in isolation.
- Hint 3: Create a JSON report schema and validate it before saving.
- Hint 4: Add structured error types so failures are unambiguous.
Learning Milestones
- Baseline Working: one tool call produces valid JSON.
- Observable: logs show tool inputs/outputs clearly.
- Reliable: outputs are deterministic and validated.
Submission / Completion Criteria
Minimum Completion
- Fixed tool chain
- Schema-validated inputs/outputs
Full Completion
- JSON report with logs
- Error classification
Excellence
- Replay mode for stored runs
- Metrics export (CSV/JSONL)
This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/AI_AGENTS_PROJECTS.md.