Project 6: The “Swiss Army” Personal Assistant (Tool-Use Agent)
Build a multi-tool assistant that routes user intent to the right tool(s), chains results across tools, and synthesizes one coherent answer.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 25–35 hours |
| Language | Python (Alternatives: Rust, Go) |
| Prerequisites | Function calling basics, JSON schema, robust error handling, basic async/concurrency |
| Key Topics | tool routing, tool schema design, multi-step orchestration, memory pruning, reflexion/self-correction |
1. Learning Objectives
By completing this project, you will:
- Design tool schemas/descriptions that reliably trigger correct tool selection.
- Implement a routing strategy that blends LLM selection with deterministic heuristics.
- Orchestrate multi-tool plans where outputs feed into subsequent tool calls.
- Add safety policies for sensitive tools (approvals, least privilege, redaction).
- Build error recovery (retry, alternate tool, ask user) without infinite loops.
- Implement memory strategies (window + summarization) to control context cost.
2. Theoretical Foundation
2.1 Core Concepts
- Tool routing: You’re building a classifier that maps “intent” → “tool(s) + args”. The LLM can do this, but it needs clear tool interfaces and guardrails.
- Tool schemas (JSON Schema): Tool names and parameter descriptions are part of the “program” you run inside the model. Ambiguous schemas cause tool misuse (see the schema sketch after this list).
- ReAct loops: Multi-step tasks require alternating between reasoning and actions with observations.
- Reflexion/self-correction: When a tool fails, the agent should read the error and try a corrected call or ask a targeted question.
- Policy + least privilege: Some tools are “safe” (calculator); some are “dangerous” (smart home, payments). Your system should treat them differently.
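To make the schema point concrete, here is a minimal tool definition in an OpenAI-style function-calling format. The outer wrapper keys vary by provider, so treat the exact shape as an assumption; the part that matters is how the name and descriptions steer tool selection.

```python
# Sketch of a tool definition (OpenAI-style function-calling format; the
# "type"/"function" wrapper is a provider-specific assumption).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "weather",
        "description": (
            "Get current weather for a single city. Use only for weather "
            "questions; never use this for general web search."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London'.",
                }
            },
            "required": ["city"],
        },
    },
}
```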
2.2 Why This Matters
This is the core architecture of practical assistants: a central brain that can operate multiple “apps” (tools) through one natural language interface. Once you have reliable routing, you can add more capabilities without rewriting the assistant.
2.3 Common Misconceptions
- “Just add more tools.” Adding tools increases confusion unless schemas and routing are disciplined.
- “One mega-tool is simpler.” It’s simpler to implement, but harder to control and test; smaller tools are easier to validate.
- “The model will handle errors.” It won’t unless you explicitly feed it tool errors and constrain retries.
3. Project Specification
3.1 What You Will Build
A CLI or web chat assistant that supports a registry of tools such as:
- calculator
- weather (mock or real API)
- web_search (stub or real)
- crypto_price (stub or real)
- unit_converter
- reminders (local store)
- smart_home (mock API)
It must:
- Select correct tool(s) for mixed requests (“price BTC and compute X and check weather”)
- Execute tool calls in the right order
- Combine intermediate results into a final response
- Show an optional reasoning trace (for debugging and learning)
3.2 Functional Requirements
- Tool registry: tools are registered with `name`, `description`, `schema`, and `handler`.
- Router: given a user message, choose tool calls (possibly multiple).
- Planner: when multiple tools are needed, generate a plan (sequence).
- Execution: execute tool calls with timeouts and per-tool error handling.
- Policy gates: require confirmation for sensitive tools (smart_home, reminders writes).
- Memory: keep only the last K turns + a running summary.
- Observability: log tool calls, latency, errors, and token usage per turn.
3.3 Non-Functional Requirements
- Predictability: stable routing for common intents (math should never trigger web_search).
- Safety: clear “read-only vs write” separation; no writes without explicit user approval.
- Extensibility: adding a new tool should not require rewriting the agent.
- Performance: run independent tool calls concurrently where safe.
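To make “concurrently where safe” concrete, here is a minimal asyncio sketch; `execute_tool` is a stand-in for a real per-tool executor, not a fixed API.

```python
import asyncio

async def execute_tool(call: dict) -> dict:
    # Stand-in for a real per-tool executor (network call, schema checks).
    await asyncio.sleep(0.1)  # simulated network latency
    return {"tool": call["tool_name"], "ok": True}

async def run_independent(calls: list[dict], timeout_s: float = 10.0) -> list:
    # Only calls with no data dependencies take this path; a batch timeout
    # keeps one slow tool from stalling the whole turn.
    batch = asyncio.gather(*(execute_tool(c) for c in calls), return_exceptions=True)
    return await asyncio.wait_for(batch, timeout=timeout_s)

if __name__ == "__main__":
    calls = [{"tool_name": "weather"}, {"tool_name": "crypto_price"}]
    print(asyncio.run(run_independent(calls)))
```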
3.4 Example Usage / Output
Run the assistant:

```bash
python swiss_army_assistant.py
```

Example session:

```text
User: Search the price of BTC, then tell me how many I can buy with $100k, and is it raining in London?
Tool: crypto_price(symbol="BTC") -> {"price_usd": 65000}
Tool: calculator(expression="100000 / 65000") -> {"result": 1.538}
Tool: weather(city="London") -> {"condition": "Rain"}
Assistant: BTC is ~$65,000, so $100k buys ~1.54 BTC. London: rain.
```
4. Solution Architecture
4.1 High-Level Design
```text
┌──────────────┐   msg    ┌──────────────────┐  tool calls   ┌───────────────┐
│   Chat UI    │─────────▶│    Agent Loop    │──────────────▶│ Tool Registry │
└──────────────┘          │   (plan+route)   │◀──────────────│  + Executors  │
                          └────────┬─────────┘    results    └───────────────┘
                                   │
                                   ▼
                          ┌──────────────────┐
                          │ Response Builder │
                          └──────────────────┘
                                   │
                                   ▼
                          ┌──────────────────┐
                          │  Memory + Logs   │
                          └──────────────────┘
```
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Tool registry | store tool definitions | keep schemas strict and minimal |
| Router | choose which tools to call | hybrid: heuristics + LLM function calling |
| Planner | order and chain calls | sequence by dependencies; allow concurrency |
| Policy engine | approve/deny tool calls | tool-level permissions + confirmation |
| Memory | keep state manageable | window + summary + per-tool artifacts |
4.3 Data Structures
```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]

@dataclass(frozen=True)
class ToolResult:
    tool_name: str
    ok: bool
    output: dict[str, Any] | None
    error: str | None
    latency_ms: int
```
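On top of these types, the registry can start as a plain dict keyed by tool name. A sketch continuing from the dataclasses above (function names are illustrative, not a fixed API):

```python
import time
from typing import Callable

# name -> (schema, handler); kept module-level for brevity in this sketch.
_REGISTRY: dict[str, tuple[dict, Callable[..., dict]]] = {}

def register_tool(name: str, schema: dict, handler: Callable[..., dict]) -> None:
    _REGISTRY[name] = (schema, handler)

def execute_tool(call: ToolCall) -> ToolResult:
    # Unified execution path: every tool gets the same timing and error wrap.
    start = time.monotonic()
    try:
        _, handler = _REGISTRY[call.tool_name]
        output = handler(**call.arguments)
        ok, error = True, None
    except Exception as exc:  # surface the error to the agent instead of crashing
        output, ok, error = None, False, str(exc)
    latency_ms = int((time.monotonic() - start) * 1000)
    return ToolResult(call.tool_name, ok, output, error, latency_ms)

# Usage:
register_tool("echo", {"type": "object"}, lambda text: {"text": text})
print(execute_tool(ToolCall("echo", {"text": "hi"})))
```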
4.4 Algorithm Overview
Key Algorithm: routing + execution
- Pre-route using heuristics (math → calculator, unit conversion → converter).
- If ambiguous or multi-intent, ask the LLM to propose tool calls from the available schemas.
- Validate proposed calls (args schema, permission, confirmation).
- Execute tool calls (sequential for dependencies; concurrent for independent calls).
- If a call fails, attempt one bounded repair step; otherwise ask user.
- Synthesize final response from tool outputs.
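Here is a self-contained sketch of the execute-and-repair part of this loop; the repair step is a stub standing in for an LLM correction, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Result:
    ok: bool
    output: Any = None
    error: str | None = None

def execute(tool: str, args: dict) -> Result:
    # Tiny stand-in executor so the loop below actually runs.
    if tool == "calculator":
        try:
            return Result(True, eval(args["expression"], {"__builtins__": {}}))
        except Exception as exc:
            return Result(False, error=str(exc))
    return Result(False, error=f"unknown tool: {tool}")

def repair(tool: str, args: dict, error: str) -> tuple[str, dict]:
    # Stub: the real agent would show `error` to the LLM and get corrected
    # arguments back. Here we just strip non-expression characters.
    cleaned = "".join(ch for ch in args.get("expression", "") if ch in "0123456789.+-*/() ")
    return tool, {"expression": cleaned}

MAX_REPAIRS = 1  # the bound that prevents infinite repair loops

def run_call(tool: str, args: dict) -> Result:
    result = execute(tool, args)
    for _ in range(MAX_REPAIRS):
        if result.ok:
            break
        tool, args = repair(tool, args, result.error)
        result = execute(tool, args)
    if not result.ok:
        # Escalate instead of looping: ask the user a targeted question.
        result.error += " (giving up; asking the user to clarify)"
    return result

print(run_call("calculator", {"expression": "100000 / 65000 USD"}))  # fails once, repaired, then ok
```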
Complexity Analysis:
- Time: O(number_of_tool_calls), with wall-clock time dominated by network latency
- Space: O(conversation_window + tool_artifacts)
5. Implementation Guide
5.1 Development Environment Setup
```bash
python -m venv .venv
source .venv/bin/activate
pip install pydantic python-dotenv rich
```
5.2 Project Structure
```text
swiss-army-assistant/
├── src/
│   ├── cli.py
│   ├── agent.py
│   ├── memory.py
│   ├── policy.py
│   ├── tools/
│   │   ├── calculator.py
│   │   ├── weather.py
│   │   ├── web_search.py
│   │   └── ...
│   └── telemetry.py
└── data/
    └── assistant_logs.sqlite
```
5.3 Implementation Phases
Phase 1: Tool registry + 3 tools (6–9h)
Goals:
- A working assistant with calculator + one API tool + one local tool.
Tasks:
- Implement registry, schemas, and a unified `execute_tool(call)` function.
- Add deterministic routing for obvious cases (calculator).
- Add a basic CLI loop.
Checkpoint: Mixed requests (“calc + weather”) correctly call multiple tools.
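One way to satisfy the deterministic-routing task in this phase is a tiny regex pre-router that handles the obvious cases and defers everything else to the LLM. A sketch (patterns are illustrative, not exhaustive):

```python
import re

ARITHMETIC = re.compile(r"[\d\s.+\-*/()%^]+")

def pre_route(message: str) -> tuple[str, dict] | None:
    """Deterministic routing for obvious intents; None means 'ask the LLM'."""
    text = message.strip().rstrip("?").strip()
    if ARITHMETIC.fullmatch(text):
        return ("calculator", {"expression": text})
    if m := re.search(r"\bweather in ([A-Za-z ]+)", message, re.IGNORECASE):
        return ("weather", {"city": m.group(1).strip()})
    return None

print(pre_route("12 * (3 + 4)"))          # ('calculator', {'expression': '12 * (3 + 4)'})
print(pre_route("weather in London?"))    # ('weather', {'city': 'London'})
print(pre_route("remind me to stretch"))  # None -> fall through to LLM routing
```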
Phase 2: Planner + safety policies (8–12h)
Goals:
- Reliable multi-step orchestration and safe write operations.
Tasks:
- Add a bounded planner loop (max steps).
- Add policy gates and explicit confirmations.
- Add audit logging and failure recovery.
Checkpoint: “Turn off lights” requires confirmation and logs the action.
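A minimal sketch of the policy gate: write-capable tools are gated behind an explicit confirmation the agent collects from the user first. Tool names and the exception-based mechanism are illustrative choices, not requirements.

```python
# Tools with side effects; everything else is treated as read-only.
WRITE_TOOLS = {"smart_home", "reminders_write"}

class NeedsApproval(Exception):
    pass

def check_policy(tool_name: str, user_confirmed: bool) -> None:
    if tool_name in WRITE_TOOLS and not user_confirmed:
        raise NeedsApproval(f"'{tool_name}' has side effects; confirm first.")

# Usage: the agent catches NeedsApproval, asks the user, then retries.
try:
    check_policy("smart_home", user_confirmed=False)
except NeedsApproval as exc:
    print(f"Blocked: {exc}")
check_policy("smart_home", user_confirmed=True)  # passes after approval
check_policy("weather", user_confirmed=False)    # read-only, always allowed
```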
Phase 3: Memory + concurrency + polish (8–14h)
Goals:
- Make it feel “assistant-like” in longer sessions.
Tasks:
- Add memory pruning and summarization.
- Add concurrent execution for independent tool calls.
- Add observability: per-tool latency, per-turn cost summary.
Checkpoint: a session of 30+ turns stays stable in cost and behavior.
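A sketch of the window-plus-summary memory from this phase; `summarize` is a stub standing in for an LLM summarization call, and the numbers are illustrative.

```python
WINDOW_TURNS = 8  # keep the last K turns verbatim

def summarize(old_summary: str, evicted: list[str]) -> str:
    # Stub: a real implementation would ask the LLM to fold the evicted
    # turns into the running summary. Truncation stands in for that here.
    return (old_summary + " " + " ".join(evicted)).strip()[-500:]

class Memory:
    def __init__(self) -> None:
        self.summary = ""
        self.turns: list[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > WINDOW_TURNS:
            evicted, self.turns = self.turns[:-WINDOW_TURNS], self.turns[-WINDOW_TURNS:]
            self.summary = summarize(self.summary, evicted)

    def context(self) -> str:
        # What actually goes into the prompt: summary + recent turns.
        return f"Summary: {self.summary}\nRecent:\n" + "\n".join(self.turns)

mem = Memory()
for i in range(12):
    mem.add(f"turn {i}")
print(mem.context())  # summary covers turns 0-3; the window holds turns 4-11
```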
5.4 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Routing | LLM-only vs hybrid | hybrid | determinism for obvious cases |
| Tool design | one big tool vs many small | small tools | testable and controllable |
| Safety | implicit vs explicit approvals | explicit | avoid accidental side effects |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | routing/policy | calculator always wins for arithmetic |
| Tool tests | tool handlers | schema validation, timeout handling |
| Scenario | end-to-end | “BTC price + math + weather” |
6.2 Critical Test Cases
- Routing correctness: “What’s 15% tip on 87.50?” must use calculator, not web_search (see the test sketch after this list).
- Safety: smart_home tool never executes without an explicit confirmation token.
- Recovery: tool failure produces a bounded retry or a clarifying question, not a loop.
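The routing-correctness case can be pinned down as a table-driven test. This sketch assumes a `route` function in `src/agent.py` that returns the primary tool name for a message; adapt the import to your actual router.

```python
# tests/test_routing.py (sketch; `route` is an assumed helper, not a given API)
import pytest

from src.agent import route

@pytest.mark.parametrize(
    ("message", "expected_tool"),
    [
        ("What's 15% tip on 87.50?", "calculator"),
        ("Is it raining in London?", "weather"),
        ("Price of BTC right now", "crypto_price"),
    ],
)
def test_routing_prefers_specific_tools(message, expected_tool):
    chosen = route(message)
    assert chosen == expected_tool
    assert chosen != "web_search"  # search must never win these intents
```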
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Vague tool descriptions | wrong tool chosen | tighten schema + add examples |
| Tool overlap | model picks search for math | add heuristic pre-router + policy |
| Infinite repair loops | agent keeps retrying | max retries + user escalation |
| Memory bloat | costs explode | windowed memory + summarization |
Debugging strategies:
- Log “why this tool” (router decision) and the final plan.
- Record tool call traces for replay.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add `unit_converter` and `timezone_converter`.
- Add a "tool help" command listing available tools and examples.
8.2 Intermediate Extensions
- Tool plug-ins loaded from a folder (dynamic discovery).
- Add a “router eval set” (20 prompts with expected tool calls).
8.3 Advanced Extensions
- Add multi-agent delegation (ties into Project 8).
- Add policy + permissions per user identity and tool category.
9. Real-World Connections
9.1 Industry Applications
- Customer support agents that route to CRM/search/ticket tools.
- IT assistants that run scripts and inspect systems (with approvals).
- Personal automation assistants (calendar, email, tasks).
9.2 Interview Relevance
- Tool schema design, routing strategies, agent safety, and observability.
10. Resources
10.1 Essential Reading
- Building AI Agents (Packt) — tool selection + ReAct loops (Ch. 2, 4)
- The LLM Engineering Handbook (Paul Iusztin) — prompt design and evals (Ch. 3, 8)
10.2 Tools & Documentation
- LangChain/LangGraph docs (agent + tool patterns)
- Provider docs for function calling / tools
10.3 Related Projects in This Series
- Previous: Project 5 (web research) — multi-step tool loops and evidence discipline
- Next: Project 7 (codebase concierge) — domain-specific tools and deeper safety constraints
11. Self-Assessment Checklist
- I can explain why tool descriptions are part of the “program”.
- I can add a new tool without changing the core agent loop.
- I can show a tool-call trace and debug a misroute.
- I have a bounded retry strategy that avoids runaway loops.
12. Submission / Completion Criteria
Minimum Viable Completion:
- At least 5 tools (mix of local + API + mock)
- Reliable routing for simple and mixed requests
- Tool-call logging and safe defaults
Full Completion:
- Safety policies with explicit confirmations for write tools
- Memory pruning and basic recovery behaviors
- Parallel execution for independent tools
Excellence (Going Above & Beyond):
- Plug-in tool ecosystem + router eval suite with measurable improvements
This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.