Project 6: The “Swiss Army” Personal Assistant (Tool-Use Agent)

Build a multi-tool assistant that routes user intent to the right tool(s), chains results across tools, and synthesizes one coherent answer.

Quick Reference

Attribute      Value
Difficulty     Level 3: Advanced
Time Estimate  25–35 hours
Language       Python (Alternatives: Rust, Go)
Prerequisites  Function calling basics, JSON Schema, robust error handling, basic async/concurrency
Key Topics     tool routing, tool schema design, multi-step orchestration, memory pruning, reflexion/self-correction

1. Learning Objectives

By completing this project, you will:

  1. Design tool schemas/descriptions that reliably trigger correct tool selection.
  2. Implement a routing strategy that blends LLM selection with deterministic heuristics.
  3. Orchestrate multi-tool plans where outputs feed into subsequent tool calls.
  4. Add safety policies for sensitive tools (approvals, least privilege, redaction).
  5. Build error recovery (retry, alternate tool, ask user) without infinite loops.
  6. Implement memory strategies (window + summarization) to control context cost.

2. Theoretical Foundation

2.1 Core Concepts

  • Tool routing: You’re building a classifier that maps “intent” → “tool(s) + args”. The LLM can do this, but it needs clear tool interfaces and guardrails.
  • Tool schemas (JSON Schema): Tool names and parameter descriptions are part of the “program” you run inside the model. Ambiguous schemas cause tool misuse.
  • ReAct loops: Multi-step tasks require alternating between reasoning and actions with observations.
  • Reflexion/self-correction: When a tool fails, the agent should read the error and try a corrected call or ask a targeted question.
  • Policy + least privilege: Some tools are “safe” (calculator); some are “dangerous” (smart home, payments). Your system should treat them differently.
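
To make “part of the program” concrete, here is a minimal tool definition in an OpenAI-style function-calling format (a sketch; the exact envelope and field names vary by provider). Note that the description states both when to use the tool and when not to; that negative guidance is what steers the router away from misuse.

# Illustrative tool definition (OpenAI-style; adapt to your provider).
WEATHER_TOOL = {
    "name": "weather",
    "description": (
        "Get the CURRENT weather for a city. Use for questions about rain, "
        "temperature, or conditions right now. Do NOT use for forecasts or "
        "historical data."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'London'"},
        },
        "required": ["city"],
    },
}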

2.2 Why This Matters

This is the core architecture of practical assistants: a central brain that can operate multiple “apps” (tools) through one natural language interface. Once you have reliable routing, you can add more capabilities without rewriting the assistant.

2.3 Common Misconceptions

  • “Just add more tools.” Adding tools increases confusion unless schemas and routing stay disciplined.
  • “One mega-tool is simpler.” It’s simpler to implement, but harder to control and test; smaller tools are easier to validate.
  • “The model will handle errors.” It won’t unless you explicitly feed it tool errors and constrain retries.

3. Project Specification

3.1 What You Will Build

A CLI or web chat assistant that supports a registry of tools such as:

  • calculator
  • weather (mock or real API)
  • web_search (stub or real)
  • crypto_price (stub or real)
  • unit_converter
  • reminders (local store)
  • smart_home (mock API)

It must:

  • Select correct tool(s) for mixed requests (“price BTC and compute X and check weather”)
  • Execute tool calls in the right order
  • Combine intermediate results into a final response
  • Show an optional reasoning trace (for debugging and learning)

3.2 Functional Requirements

  1. Tool registry: tools are registered with name, description, schema, handler.
  2. Router: given a user message, choose tool calls (possibly multiple).
  3. Planner: when multiple tools are needed, generate a plan (sequence).
  4. Execution: execute tool calls with timeouts and per-tool error handling.
  5. Policy gates: require confirmation for sensitive tools (smart_home, reminders writes).
  6. Memory: keep only the last K turns + a running summary (a minimal sketch follows this list).
  7. Observability: log tool calls, latency, errors, and token usage per turn.
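
A minimal sketch of requirement 6, where summarize stands in for an LLM summarization call (a placeholder, not a real API):

from collections import deque

class Memory:
    """Windowed memory: last K turns verbatim, older turns folded into a summary."""

    def __init__(self, k: int = 8):
        self.turns: deque[dict] = deque(maxlen=k)  # last K turns only
        self.summary: str = ""                     # compressed older history

    def add_turn(self, turn: dict, summarize) -> None:
        if len(self.turns) == self.turns.maxlen:
            evicted = self.turns[0]                # oldest turn is about to fall out
            self.summary = summarize(self.summary, evicted)
        self.turns.append(turn)

    def context(self) -> list[dict]:
        header = [{"role": "system", "content": f"Summary so far: {self.summary}"}]
        return header + list(self.turns)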

3.3 Non-Functional Requirements

  • Predictability: stable routing for common intents (math should never trigger web_search).
  • Safety: clear “read-only vs write” separation; no writes without explicit user approval.
  • Extensibility: adding a new tool should not require rewriting the agent.
  • Performance: run independent tool calls concurrently where safe.

3.4 Example Usage / Output

python swiss_army_assistant.py

Example session:

User: Search the price of BTC, then tell me how many I can buy with $100k, and is it raining in London?

Tool: crypto_price(symbol="BTC") -> {"price_usd": 65000}
Tool: calculator(expression="100000 / 65000") -> {"result": 1.538}
Tool: weather(city="London") -> {"condition": "Rain"}

Assistant: BTC is ~$65,000, so $100k buys ~1.54 BTC. London: rain.

4. Solution Architecture

4.1 High-Level Design

┌─────────────┐   msg   ┌──────────────────┐   tool calls   ┌────────────────┐
│ Chat UI     │────────▶│ Agent Loop       │───────────────▶│ Tool Registry  │
└─────────────┘         │ (plan + route)   │◀───────────────│ + Executors    │
                        └────────┬─────────┘    results     └────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Response Builder │
                        └──────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Memory + Logs    │
                        └──────────────────┘

4.2 Key Components

Component      Responsibility              Key Decisions
Tool registry  store tool definitions      keep schemas strict and minimal
Router         choose which tools to call  hybrid: heuristics + LLM function calling
Planner        order and chain calls       sequence by dependencies; allow concurrency
Policy engine  approve/deny tool calls     tool-level permissions + confirmation
Memory         keep state manageable       window + summary + per-tool artifacts

4.3 Data Structures

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class ToolCall:
    tool_name: str                  # must match a registered tool name
    arguments: dict[str, Any]       # validated against the tool's JSON Schema

@dataclass(frozen=True)
class ToolResult:
    tool_name: str
    ok: bool                        # False on timeout, exception, or bad args
    output: dict[str, Any] | None   # present when ok is True
    error: str | None               # human-readable error, fed back to the model
    latency_ms: int

4.4 Algorithm Overview

Key Algorithm: routing + execution

  1. Pre-route using heuristics (math → calculator, unit conversion → converter).
  2. If ambiguous or multi-intent, ask the LLM to propose tool calls from the available schemas.
  3. Validate proposed calls (args schema, permission, confirmation).
  4. Execute tool calls (sequential for dependencies; concurrent for independent calls).
  5. If a call fails, attempt one bounded repair step; otherwise ask user.
  6. Synthesize final response from tool outputs.
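
A compact sketch of this loop, reusing the dataclasses from section 4.3. Treat pre_route, order_by_dependencies, ask_user, and the llm.* methods as placeholders for your heuristic router, dependency sort, escalation path, and model calls; none of them are a real library API.

def run_turn(message: str, registry, llm, max_repairs: int = 1) -> str:
    # Steps 1-2: heuristics first; ask the LLM only if heuristics abstain.
    calls = pre_route(message) or llm.propose_calls(message, registry.schemas())
    results: list[ToolResult] = []
    for call in order_by_dependencies(calls):        # step 4 (sequential case)
        registry.validate(call)                      # step 3: schema + policy
        result = registry.execute(call)
        repairs = 0
        while not result.ok and repairs < max_repairs:   # step 5: bounded repair
            fixed = llm.repair(call, result.error)
            if fixed is None:
                return ask_user(call, result.error)      # escalate, never loop
            result = registry.execute(fixed)
            repairs += 1
        results.append(result)
    return llm.synthesize(message, results)          # step 6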

Complexity Analysis:

  • Time: O(number_of_tool_calls); dominated by network latency
  • Space: O(conversation_window + tool_artifacts)

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate
pip install pydantic python-dotenv rich  # plus your LLM provider's SDK, if you call a real model

5.2 Project Structure

swiss-army-assistant/
├── src/
│   ├── cli.py
│   ├── agent.py
│   ├── memory.py
│   ├── policy.py
│   ├── tools/
│   │   ├── calculator.py
│   │   ├── weather.py
│   │   ├── web_search.py
│   │   └── ...
│   └── telemetry.py
└── data/
    └── assistant_logs.sqlite

5.3 Implementation Phases

Phase 1: Tool registry + 3 tools (6–9h)

Goals:

  • A working assistant with calculator + one API tool + one local tool.

Tasks:

  1. Implement registry, schemas, and a unified execute_tool(call) function.
  2. Add deterministic routing for obvious cases (calculator).
  3. Add a basic CLI loop.
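
A Phase 1 sketch of the registry and unified executor, reusing ToolCall/ToolResult from section 4.3 (register and execute_tool are this guide's suggested names, not a library API):

import time
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}

def register(name: str, description: str, schema: dict, handler: Callable) -> None:
    TOOLS[name] = {"description": description, "schema": schema, "handler": handler}

def execute_tool(call: ToolCall) -> ToolResult:
    start = time.monotonic()
    try:
        output = TOOLS[call.tool_name]["handler"](**call.arguments)
        ok, err = True, None
    except Exception as exc:        # error string is later fed back to the model
        output, ok, err = None, False, str(exc)
    latency = int((time.monotonic() - start) * 1000)
    return ToolResult(call.tool_name, ok, output, err, latency)

register(
    "calculator",
    "Evaluate an arithmetic expression.",
    {"type": "object", "properties": {"expression": {"type": "string"}},
     "required": ["expression"]},
    # Demo only: prefer a real expression parser over eval in practice.
    lambda expression: {"result": eval(expression, {"__builtins__": {}})},
)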

Checkpoint: Mixed requests (“calc + weather”) correctly call multiple tools.

Phase 2: Planner + safety policies (8–12h)

Goals:

  • Reliable multi-step orchestration and safe write operations.

Tasks:

  1. Add a bounded planner loop (max steps).
  2. Add policy gates and explicit confirmations.
  3. Add audit logging and failure recovery.
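
A sketch of the policy gate, assuming write-capable tools are listed explicitly (the tool names and confirmation prompt are illustrative):

# Explicit read/write separation: only listed tools may cause side effects.
WRITE_TOOLS = {"smart_home", "reminders"}

def approved(call: ToolCall, confirm) -> bool:
    if call.tool_name not in WRITE_TOOLS:
        return True                                   # read-only: always allowed
    answer = confirm(f"Allow {call.tool_name} with {call.arguments}? [y/N] ")
    return answer.strip().lower() == "y"

# Gate every call before execution and log the decision either way:
#   if approved(call, input): result = execute_tool(call)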

Checkpoint: “Turn off the lights” requires confirmation and logs the action.

Phase 3: Memory + concurrency + polish (8–14h)

Goals:

  • Make it feel “assistant-like” in longer sessions.

Tasks:

  1. Add memory pruning and summarization.
  2. Add concurrent execution for independent tool calls.
  3. Add observability: per-tool latency, per-turn cost summary.
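
A sketch of concurrent execution for independent calls, assuming synchronous handlers (asyncio.to_thread keeps it simple; the 10-second timeout is an arbitrary example):

import asyncio

async def execute_concurrently(calls: list[ToolCall]) -> list[ToolResult]:
    async def one(call: ToolCall) -> ToolResult:
        # Run the blocking handler off the event loop, with a per-call timeout.
        return await asyncio.wait_for(asyncio.to_thread(execute_tool, call),
                                      timeout=10.0)
    raw = await asyncio.gather(*(one(c) for c in calls), return_exceptions=True)
    # Map timeouts/exceptions into failed ToolResults instead of raising.
    return [r if isinstance(r, ToolResult)
            else ToolResult(c.tool_name, False, None, repr(r), 0)
            for c, r in zip(calls, raw)]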

Checkpoint: A session of 30+ turns stays stable in cost and behavior.

5.4 Key Implementation Decisions

Decision     Options                         Recommendation  Rationale
Routing      LLM-only vs hybrid              hybrid          determinism for obvious cases
Tool design  one big tool vs many small      small tools     testable and controllable
Safety       implicit vs explicit approvals  explicit        avoid accidental side effects

6. Testing Strategy

6.1 Test Categories

Category    Purpose         Examples
Unit        routing/policy  calculator always wins for arithmetic
Tool tests  tool handlers   schema validation, timeout handling
Scenario    end-to-end      “BTC price + math + weather”

6.2 Critical Test Cases

  1. Routing correctness: “What’s 15% tip on 87.50?” must use calculator, not web_search.
  2. Safety: smart_home tool never executes without an explicit confirmation token.
  3. Recovery: tool failure produces a bounded retry or a clarifying question, not a loop.
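
The first two cases translate directly into tests; route and approved refer to the router entry point and the policy gate sketched earlier (hypothetical names):

def test_math_routes_to_calculator():
    calls = route("What's a 15% tip on 87.50?")
    assert [c.tool_name for c in calls] == ["calculator"]

def test_smart_home_requires_confirmation():
    call = ToolCall("smart_home", {"action": "lights_off"})
    # A declined confirmation must block execution.
    assert not approved(call, confirm=lambda _prompt: "n")

For case 3, stub a registry whose execute always fails and assert the agent escalates with a clarifying question after max_repairs attempts rather than retrying forever.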

7. Common Pitfalls & Debugging

Pitfall                  Symptom                      Solution
Vague tool descriptions  wrong tool chosen            tighten schema + add examples
Tool overlap             model picks search for math  add heuristic pre-router + policy
Infinite repair loops    agent keeps retrying         max retries + user escalation
Memory bloat             costs explode                windowed memory + summarization

Debugging strategies:

  • Log “why this tool” (router decision) and the final plan.
  • Record tool call traces for replay.
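
A sketch of a per-call trace record (the project layout suggests SQLite for logs; a JSONL append is shown here for brevity, and the field names are suggestions):

import json
import time

def log_tool_call(call: ToolCall, result: ToolResult, reason: str,
                  path: str = "data/assistant_trace.jsonl") -> None:
    record = {
        "ts": time.time(),
        "tool": call.tool_name,
        "arguments": call.arguments,
        "why_this_tool": reason,            # the router's stated justification
        "ok": result.ok,
        "latency_ms": result.latency_ms,
        "error": result.error,
    }
    with open(path, "a") as f:              # append-only trace, replayable later
        f.write(json.dumps(record) + "\n")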

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add unit_converter and timezone_converter.
  • Add “tool help” command listing available tools and examples.

8.2 Intermediate Extensions

  • Tool plug-ins loaded from a folder (dynamic discovery).
  • Add a “router eval set” (20 prompts with expected tool calls).

8.3 Advanced Extensions

  • Add multi-agent delegation (ties into Project 8).
  • Add policy + permissions per user identity and tool category.

9. Real-World Connections

9.1 Industry Applications

  • Customer support agents that route to CRM/search/ticket tools.
  • IT assistants that run scripts and inspect systems (with approvals).
  • Personal automation assistants (calendar, email, tasks).

9.2 Interview Relevance

  • Tool schema design, routing strategies, agent safety, and observability.

10. Resources

10.1 Essential Reading

  • Building AI Agents (Packt) — tool selection + ReAct loops (Ch. 2, 4)
  • LLM Engineer’s Handbook (Paul Iusztin, Maxime Labonne) — prompt design and evals (Ch. 3, 8)

10.2 Tools & Documentation

  • LangChain/LangGraph docs (agent + tool patterns)
  • Provider docs for function calling / tools
  • Previous: Project 5 (web research) — multi-step tool loops and evidence discipline
  • Next: Project 7 (codebase concierge) — domain-specific tools and deeper safety constraints

11. Self-Assessment Checklist

  • I can explain why tool descriptions are part of the “program”.
  • I can add a new tool without changing the core agent loop.
  • I can show a tool-call trace and debug a misroute.
  • I have a bounded retry strategy that avoids runaway loops.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • At least 5 tools (mix of local + API + mock)
  • Reliable routing for simple and mixed requests
  • Tool-call logging and safe defaults

Full Completion:

  • Safety policies with explicit confirmations for write tools
  • Memory pruning and basic recovery behaviors
  • Parallel execution for independent tools

Excellence (Going Above & Beyond):

  • Plug-in tool ecosystem + router eval suite with measurable improvements

This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.