Project 6: The “Swiss Army” Personal Assistant (Tool-Use Agent)

Build a multi-tool assistant that routes user intent to the right tool(s), chains results across tools, and synthesizes one coherent answer.

Quick Reference

Attribute      Value
Difficulty     Level 3: Advanced
Time Estimate  25–35 hours
Language       Python (Alternatives: Rust, Go)
Prerequisites  Function calling basics, JSON Schema, robust error handling, basic async/concurrency
Key Topics     tool routing, tool schema design, multi-step orchestration, memory pruning, reflexion/self-correction

1. Learning Objectives

By completing this project, you will:

  1. Design tool schemas/descriptions that reliably trigger correct tool selection.
  2. Implement a routing strategy that blends LLM selection with deterministic heuristics.
  3. Orchestrate multi-tool plans where outputs feed into subsequent tool calls.
  4. Add safety policies for sensitive tools (approvals, least privilege, redaction).
  5. Build error recovery (retry, alternate tool, ask user) without infinite loops.
  6. Implement memory strategies (window + summarization) to control context cost.

2. Theoretical Foundation

2.1 Core Concepts

  • Tool routing: You’re building a classifier that maps “intent” → “tool(s) + args”. The LLM can do this, but it needs clear tool interfaces and guardrails.
  • Tool schemas (JSON Schema): Tool names and parameter descriptions are part of the “program” you run inside the model. Ambiguous schemas cause tool misuse.
  • ReAct loops: Multi-step tasks require alternating between reasoning and actions with observations.
  • Reflexion/self-correction: When a tool fails, the agent should read the error and try a corrected call or ask a targeted question.
  • Policy + least privilege: Some tools are “safe” (calculator); some are “dangerous” (smart home, payments). Your system should treat them differently.
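
To make “part of the program” concrete, here is a minimal tool definition in an OpenAI-style function-calling format (a sketch; the exact envelope and field names vary by provider). Note that the description states both when to use the tool and when not to; that negative guidance is what steers the router away from misuse.

# Illustrative tool definition (OpenAI-style; adapt to your provider).
WEATHER_TOOL = {
    "name": "weather",
    "description": (
        "Get the CURRENT weather for a city. Use for questions about rain, "
        "temperature, or conditions right now. Do NOT use for forecasts or "
        "historical data."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'London'"},
        },
        "required": ["city"],
    },
}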

2.2 Why This Matters

This is the core architecture of practical assistants: a central brain that can operate multiple “apps” (tools) through one natural language interface. Once you have reliable routing, you can add more capabilities without rewriting the assistant.

2.3 Common Misconceptions

  • “Just add more tools.” Adding tools increases confusion unless schemas and routing stay disciplined.
  • “One mega-tool is simpler.” It’s simpler to implement, but harder to control and test; smaller tools are easier to validate.
  • “The model will handle errors.” It won’t unless you explicitly feed it tool errors and constrain retries.

3. Project Specification

3.1 What You Will Build

A CLI or web chat assistant that supports a registry of tools such as:

  • calculator
  • weather (mock or real API)
  • web_search (stub or real)
  • crypto_price (stub or real)
  • unit_converter
  • reminders (local store)
  • smart_home (mock API)

It must:

  • Select correct tool(s) for mixed requests (“price BTC and compute X and check weather”)
  • Execute tool calls in the right order
  • Combine intermediate results into a final response
  • Show an optional reasoning trace (for debugging and learning)

3.2 Functional Requirements

  1. Tool registry: tools are registered with name, description, schema, handler.
  2. Router: given a user message, choose tool calls (possibly multiple).
  3. Planner: when multiple tools are needed, generate a plan (sequence).
  4. Execution: execute tool calls with timeouts and per-tool error handling.
  5. Policy gates: require confirmation for sensitive tools (smart_home, reminders writes).
  6. Memory: keep only the last K turns + a running summary (a minimal sketch follows this list).
  7. Observability: log tool calls, latency, errors, and token usage per turn.
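
A minimal sketch of requirement 6, where summarize stands in for an LLM summarization call (a placeholder, not a real API):

from collections import deque

class Memory:
    """Windowed memory: last K turns verbatim, older turns folded into a summary."""

    def __init__(self, k: int = 8):
        self.turns: deque[dict] = deque(maxlen=k)  # last K turns only
        self.summary: str = ""                     # compressed older history

    def add_turn(self, turn: dict, summarize) -> None:
        if len(self.turns) == self.turns.maxlen:
            evicted = self.turns[0]                # oldest turn is about to fall out
            self.summary = summarize(self.summary, evicted)
        self.turns.append(turn)

    def context(self) -> list[dict]:
        header = [{"role": "system", "content": f"Summary so far: {self.summary}"}]
        return header + list(self.turns)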

3.3 Non-Functional Requirements

  • Predictability: stable routing for common intents (math should never trigger web_search).
  • Safety: clear “read-only vs write” separation; no writes without explicit user approval.
  • Extensibility: adding a new tool should not require rewriting the agent.
  • Performance: run independent tool calls concurrently where safe.

3.4 Example Usage / Output

python swiss_army_assistant.py

Example session:

User: Search the price of BTC, then tell me how many I can buy with $100k, and is it raining in London?

Tool: crypto_price(symbol="BTC") -> {"price_usd": 65000}
Tool: calculator(expression="100000 / 65000") -> {"result": 1.538}
Tool: weather(city="London") -> {"condition": "Rain"}

Assistant: BTC is ~$65,000, so $100k buys ~1.54 BTC. London: rain.

4. Solution Architecture

4.1 High-Level Design

┌─────────────┐   msg   ┌──────────────────┐   tool calls   ┌────────────────┐
│ Chat UI     │────────▶│ Agent Loop       │───────────────▶│ Tool Registry  │
└─────────────┘         │ (plan + route)   │◀───────────────│ + Executors    │
                        └────────┬─────────┘    results     └────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Response Builder │
                        └──────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Memory + Logs    │
                        └──────────────────┘

4.2 Key Components

Component      Responsibility              Key Decisions
Tool registry  store tool definitions      keep schemas strict and minimal
Router         choose which tools to call  hybrid: heuristics + LLM function calling
Planner        order and chain calls       sequence by dependencies; allow concurrency
Policy engine  approve/deny tool calls     tool-level permissions + confirmation
Memory         keep state manageable       window + summary + per-tool artifacts

4.3 Data Structures

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class ToolCall:
    tool_name: str                  # must match a registered tool name
    arguments: dict[str, Any]       # validated against the tool's JSON Schema

@dataclass(frozen=True)
class ToolResult:
    tool_name: str
    ok: bool                        # False on timeout, exception, or bad args
    output: dict[str, Any] | None   # present when ok is True
    error: str | None               # human-readable error, fed back to the model
    latency_ms: int

4.4 Algorithm Overview

Key Algorithm: routing + execution

  1. Pre-route using heuristics (math → calculator, unit conversion → converter).
  2. If ambiguous or multi-intent, ask the LLM to propose tool calls from the available schemas.
  3. Validate proposed calls (args schema, permission, confirmation).
  4. Execute tool calls (sequential for dependencies; concurrent for independent calls).
  5. If a call fails, attempt one bounded repair step; otherwise ask user.
  6. Synthesize final response from tool outputs.
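
A compact sketch of this loop, reusing the dataclasses from section 4.3. Treat pre_route, order_by_dependencies, ask_user, and the llm.* methods as placeholders for your heuristic router, dependency sort, escalation path, and model calls; none of them are a real library API.

def run_turn(message: str, registry, llm, max_repairs: int = 1) -> str:
    # Steps 1-2: heuristics first; ask the LLM only if heuristics abstain.
    calls = pre_route(message) or llm.propose_calls(message, registry.schemas())
    results: list[ToolResult] = []
    for call in order_by_dependencies(calls):        # step 4 (sequential case)
        registry.validate(call)                      # step 3: schema + policy
        result = registry.execute(call)
        repairs = 0
        while not result.ok and repairs < max_repairs:   # step 5: bounded repair
            fixed = llm.repair(call, result.error)
            if fixed is None:
                return ask_user(call, result.error)      # escalate, never loop
            result = registry.execute(fixed)
            repairs += 1
        results.append(result)
    return llm.synthesize(message, results)          # step 6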

Complexity Analysis:

  • Time: O(number_of_tool_calls); dominated by network latency
  • Space: O(conversation_window + tool_artifacts)

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate
pip install pydantic python-dotenv rich  # plus your LLM provider's SDK, if you call a real model

5.2 Project Structure

swiss-army-assistant/
├── src/
│   ├── cli.py
│   ├── agent.py
│   ├── memory.py
│   ├── policy.py
│   ├── tools/
│   │   ├── calculator.py
│   │   ├── weather.py
│   │   ├── web_search.py
│   │   └── ...
│   └── telemetry.py
└── data/
    └── assistant_logs.sqlite

5.3 Implementation Phases

Phase 1: Tool registry + 3 tools (6–9h)

Goals:

  • A working assistant with calculator + one API tool + one local tool.

Tasks:

  1. Implement registry, schemas, and a unified execute_tool(call) function.
  2. Add deterministic routing for obvious cases (calculator).
  3. Add a basic CLI loop.
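
A Phase 1 sketch of the registry and unified executor, reusing ToolCall/ToolResult from section 4.3 (register and execute_tool are this guide's suggested names, not a library API):

import time
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}

def register(name: str, description: str, schema: dict, handler: Callable) -> None:
    TOOLS[name] = {"description": description, "schema": schema, "handler": handler}

def execute_tool(call: ToolCall) -> ToolResult:
    start = time.monotonic()
    try:
        output = TOOLS[call.tool_name]["handler"](**call.arguments)
        ok, err = True, None
    except Exception as exc:        # error string is later fed back to the model
        output, ok, err = None, False, str(exc)
    latency = int((time.monotonic() - start) * 1000)
    return ToolResult(call.tool_name, ok, output, err, latency)

register(
    "calculator",
    "Evaluate an arithmetic expression.",
    {"type": "object", "properties": {"expression": {"type": "string"}},
     "required": ["expression"]},
    # Demo only: prefer a real expression parser over eval in practice.
    lambda expression: {"result": eval(expression, {"__builtins__": {}})},
)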

Checkpoint: Mixed requests (“calc + weather”) correctly call multiple tools.

Phase 2: Planner + safety policies (8–12h)

Goals:

  • Reliable multi-step orchestration and safe write operations.

Tasks:

  1. Add a bounded planner loop (max steps).
  2. Add policy gates and explicit confirmations.
  3. Add audit logging and failure recovery.
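
A sketch of the policy gate, assuming write-capable tools are listed explicitly (the tool names and confirmation prompt are illustrative):

# Explicit read/write separation: only listed tools may cause side effects.
WRITE_TOOLS = {"smart_home", "reminders"}

def approved(call: ToolCall, confirm) -> bool:
    if call.tool_name not in WRITE_TOOLS:
        return True                                   # read-only: always allowed
    answer = confirm(f"Allow {call.tool_name} with {call.arguments}? [y/N] ")
    return answer.strip().lower() == "y"

# Gate every call before execution and log the decision either way:
#   if approved(call, input): result = execute_tool(call)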

Checkpoint: “Turn off the lights” requires confirmation and logs the action.

Phase 3: Memory + concurrency + polish (8–14h)

Goals:

  • Make it feel “assistant-like” in longer sessions.

Tasks:

  1. Add memory pruning and summarization.
  2. Add concurrent execution for independent tool calls.
  3. Add observability: per-tool latency, per-turn cost summary.
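
A sketch of concurrent execution for independent calls, assuming synchronous handlers (asyncio.to_thread keeps it simple; the 10-second timeout is an arbitrary example):

import asyncio

async def execute_concurrently(calls: list[ToolCall]) -> list[ToolResult]:
    async def one(call: ToolCall) -> ToolResult:
        # Run the blocking handler off the event loop, with a per-call timeout.
        return await asyncio.wait_for(asyncio.to_thread(execute_tool, call),
                                      timeout=10.0)
    raw = await asyncio.gather(*(one(c) for c in calls), return_exceptions=True)
    # Map timeouts/exceptions into failed ToolResults instead of raising.
    return [r if isinstance(r, ToolResult)
            else ToolResult(c.tool_name, False, None, repr(r), 0)
            for c, r in zip(calls, raw)]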

Checkpoint: A session of 30+ turns stays stable in cost and behavior.

5.4 Key Implementation Decisions

Decision     Options                         Recommendation  Rationale
Routing      LLM-only vs hybrid              hybrid          determinism for obvious cases
Tool design  one big tool vs many small      small tools     testable and controllable
Safety       implicit vs explicit approvals  explicit        avoid accidental side effects

6. Testing Strategy

6.1 Test Categories

Category    Purpose         Examples
Unit        routing/policy  calculator always wins for arithmetic
Tool tests  tool handlers   schema validation, timeout handling
Scenario    end-to-end      “BTC price + math + weather”

6.2 Critical Test Cases

  1. Routing correctness: “What’s 15% tip on 87.50?” must use calculator, not web_search.
  2. Safety: smart_home tool never executes without an explicit confirmation token.
  3. Recovery: tool failure produces a bounded retry or a clarifying question, not a loop.
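
The first two cases translate directly into tests; route and approved refer to the router entry point and the policy gate sketched earlier (hypothetical names):

def test_math_routes_to_calculator():
    calls = route("What's a 15% tip on 87.50?")
    assert [c.tool_name for c in calls] == ["calculator"]

def test_smart_home_requires_confirmation():
    call = ToolCall("smart_home", {"action": "lights_off"})
    # A declined confirmation must block execution.
    assert not approved(call, confirm=lambda _prompt: "n")

For case 3, stub a registry whose execute always fails and assert the agent escalates with a clarifying question after max_repairs attempts rather than retrying forever.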

7. Common Pitfalls & Debugging

Pitfall                  Symptom                      Solution
Vague tool descriptions  wrong tool chosen            tighten schema + add examples
Tool overlap             model picks search for math  add heuristic pre-router + policy
Infinite repair loops    agent keeps retrying         max retries + user escalation
Memory bloat             costs explode                windowed memory + summarization

Debugging strategies:

  • Log “why this tool” (router decision) and the final plan.
  • Record tool call traces for replay.
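
A sketch of a per-call trace record (the project layout suggests SQLite for logs; a JSONL append is shown here for brevity, and the field names are suggestions):

import json
import time

def log_tool_call(call: ToolCall, result: ToolResult, reason: str,
                  path: str = "data/assistant_trace.jsonl") -> None:
    record = {
        "ts": time.time(),
        "tool": call.tool_name,
        "arguments": call.arguments,
        "why_this_tool": reason,            # the router's stated justification
        "ok": result.ok,
        "latency_ms": result.latency_ms,
        "error": result.error,
    }
    with open(path, "a") as f:              # append-only trace, replayable later
        f.write(json.dumps(record) + "\n")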

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add unit_converter and timezone_converter.
  • Add “tool help” command listing available tools and examples.

8.2 Intermediate Extensions

  • Tool plug-ins loaded from a folder (dynamic discovery).
  • Add a “router eval set” (20 prompts with expected tool calls).

8.3 Advanced Extensions

  • Add multi-agent delegation (ties into Project 8).
  • Add policy + permissions per user identity and tool category.

9. Real-World Connections

9.1 Industry Applications

  • Customer support agents that route to CRM/search/ticket tools.
  • IT assistants that run scripts and inspect systems (with approvals).
  • Personal automation assistants (calendar, email, tasks).

9.2 Interview Relevance

  • Tool schema design, routing strategies, agent safety, and observability.

10. Resources

10.1 Essential Reading

  • Building AI Agents (Packt) — tool selection + ReAct loops (Ch. 2, 4)
  • LLM Engineer’s Handbook (Paul Iusztin, Maxime Labonne) — prompt design and evals (Ch. 3, 8)

10.2 Tools & Documentation

  • LangChain/LangGraph docs (agent + tool patterns)
  • Provider docs for function calling / tools
  • Previous: Project 5 (web research) — multi-step tool loops and evidence discipline
  • Next: Project 7 (codebase concierge) — domain-specific tools and deeper safety constraints

11. Self-Assessment Checklist

  • I can explain why tool descriptions are part of the “program”.
  • I can add a new tool without changing the core agent loop.
  • I can show a tool-call trace and debug a misroute.
  • I have a bounded retry strategy that avoids runaway loops.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • At least 5 tools (mix of local + API + mock)
  • Reliable routing for simple and mixed requests
  • Tool-call logging and safe defaults

Full Completion:

  • Safety policies with explicit confirmations for write tools
  • Memory pruning and basic recovery behaviors
  • Parallel execution for independent tools

Excellence (Going Above & Beyond):

  • Plug-in tool ecosystem + router eval suite with measurable improvements

This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.