Source Code: src/gaia/agents/base/tools.py

What You'll Learn:

- How the tool system works under the hood
- How LLMs "see" and choose which tools to use
- How to write tools that LLMs understand correctly
- Best practices for parameters, return values, and error handling
- Common pitfalls and how to avoid them
What Are Tools?
LLMs are brilliant at language but can't interact with the real world. Tools are the bridge:

| Without Tools | With Tools |
|---|---|
| "I can't check the weather" | Calls `get_weather()` → "It's 72°F and sunny" |
| "I can't read files" | Calls `read_file()` → Shows file contents |
| "I can't search your database" | Calls `search_db()` → Returns matching records |
The Tool Contract: What the LLM Actually Sees
Here's the key insight: the LLM never sees your Python code. It only sees a "contract" describing the tool. When you decorate a function with `@tool`, the decorator stores an entry in the `_TOOL_REGISTRY` dictionary (source: src/gaia/agents/base/tools.py:40-77).
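As a sketch, a registry entry might hold something like the following. The exact structure is an assumption; the fields shown are the ones this guide says are captured (name, docstring as description, and a {type, required} map per parameter):

```python
from typing import Any

# Hypothetical illustration of a _TOOL_REGISTRY entry, based on the
# fields described in this guide. The LLM sees only this contract,
# never the function body.
example_entry: dict[str, Any] = {
    "name": "get_weather",
    "description": (
        "Get the current weather for a city.\n\n"
        "Args:\n    city: City name, e.g. 'Austin'."
    ),
    "parameters": {
        "city": {"type": "str", "required": True},
        "units": {"type": "str", "required": False},
    },
}

print(example_entry["parameters"]["city"])
```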
What's actually captured: the function name, the full docstring (as description), and a {type, required} entry per parameter inferred from type hints (only str, int, float, bool, tuple, and dict are recognised; other annotations fall back to "unknown"). Per-argument descriptions from the Args: docstring block and parameter default values are not parsed into the registry today, so write concise, self-contained docstrings to give the LLM all the context it needs.

How LLMs Choose Tools
When a user asks a question, the LLM goes through a decision process: it reads each tool's contract, matches the user's intent against the tool descriptions, and picks the closest fit (or none). This is why clear docstrings matter! If your description doesn't match user intent, the LLM won't choose your tool.

Building Effective Tools
Step 1: Start Simple
Begin with the most basic tool structure:

- A clear function name (`calculate`) matches what it does
- The type hint (`str`) tells the LLM what to pass
- The return type (`float`) sets expectations
- The docstring explains when to use it ("when the user asks to calculate")
- The Args section describes the expected format
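A minimal sketch of such a tool (the `@tool` decorator comes from the GAIA source above; a no-op stand-in is used here so the example runs on its own, and the `calculate` signature is an assumption based on the checklist):

```python
# Stand-in for the real @tool decorator from gaia.agents.base.tools.
def tool(func=None, **kwargs):
    return func if func is not None else (lambda f: f)

@tool
def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression when the user asks to calculate something.

    Args:
        expression: An arithmetic expression such as "2 + 3 * 4".
    """
    # eval() keeps the sketch short; a real tool should use a safe parser.
    return float(eval(expression))

print(calculate("2 + 3 * 4"))  # 14.0
```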
Step 2: Add Parameters with Defaults
Make tools flexible with optional parameters:

Step 3: Handle Complex Types

For structured data, use type hints to guide the LLM:

- `List[str]` → the LLM needs to pass a list of strings
- `Optional[List[str]]` → can be omitted or set to null
- The Args descriptions show example formats
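Steps 2 and 3 can be sketched in a single tool. The `search_notes` name, its parameters, and the in-memory data are all illustrative assumptions:

```python
from typing import List, Optional

def tool(func=None, **kwargs):  # stand-in for the real @tool decorator
    return func if func is not None else (lambda f: f)

@tool
def search_notes(
    query: str,
    limit: int = 10,                    # optional: has a default
    tags: Optional[List[str]] = None,   # optional: can be omitted or null
) -> dict:
    """Search saved notes when the user asks to find a note.

    Args:
        query: Free-text search string, e.g. "meeting agenda".
        limit: Maximum number of results to return (default 10).
        tags: Optional list of tags to filter by, e.g. ["work", "urgent"].
    """
    notes = [
        {"id": 1, "text": "meeting agenda for Monday", "tags": ["work"]},
        {"id": 2, "text": "grocery list", "tags": ["home"]},
    ]
    hits = [n for n in notes if query in n["text"]]
    if tags:
        hits = [n for n in hits if set(tags) & set(n["tags"])]
    return {"status": "success", "results": hits[:limit], "count": len(hits)}

print(search_notes("meeting", tags=["work"])["count"])  # 1
```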
Return Value Patterns
What you return matters: the LLM uses it to form responses.

Pattern 1: Success with Data
{"status": "success", "user": {"id": 123, "name": "Alice", ...}}
LLM responds: “Alice ([email protected]) has the Admin role.”
Pattern 2: Error with Guidance
{"status": "error", "error": "No user found...", "suggestion": "..."}
LLM responds: “I couldn’t find a user with ID 999. Would you like to search by name instead?”
Pattern 3: Partial Results
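Following the shape of Patterns 1 and 2, a partial result might look like this (field values are illustrative):

```json
{
  "status": "partial",
  "message": "Showing 10 of 25 matches",
  "data": {"results": ["..."], "total": 25, "returned": 10},
  "instruction": "Summarize these results and offer to narrow the search"
}
```

LLM responds: "I found 25 matches; here are the first 10. Want me to narrow the search?"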
Standard Response Format
All GAIA tools should return responses in a standardized format that helps the LLM understand the result and how to use it. This is especially important for tools that return structured JSON data.

The GAIA Response Pattern
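A sketch of the pattern, using only the fields documented in the Field Reference below (the `check_disk_space` tool and its values are hypothetical):

```python
def check_disk_space() -> dict:
    """Report free disk space when the user asks about storage."""
    free_gb = 245  # a real tool would call shutil.disk_usage("/")
    return {
        "status": "success",
        "message": f"{free_gb} GB of disk space is free",
        "data": {"free_gb": free_gb},
        "instruction": "Summarize the free space in plain language; do not echo the raw JSON.",
    }

resp = check_disk_space()
print(resp["message"])  # 245 GB of disk space is free
```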
Why This Format Matters
When tools return raw JSON without context, the LLM may:

- Echo the JSON directly instead of summarizing it
- Return structured data in its answer instead of human-readable text
- Misunderstand how to interpret the data

The `instruction` field explicitly tells the LLM what to do with the data.
Field Reference
| Field | Required | Type | Description |
|---|---|---|---|
| `status` | Yes | string | `"success"`, `"error"`, or `"partial"` |
| `message` | Recommended | string | Brief human-readable summary of what happened |
| `data` | When applicable | object | Structured data returned by the tool |
| `instruction` | Recommended | string | Guidance for the LLM on how to interpret/use the data |
| `error` | On error | string | Error description when status is `"error"` |
| `suggestion` | Optional | string | Recovery hint on errors |
Complete Example

With a well-formed response, the LLM answers naturally:

"Your system is running well. CPU usage is at 23%, memory is using 8.2 GB of 16 GB (51%), and you have 245 GB of free disk space."

Instead of echoing the JSON:

{"cpu_percent": 23, "memory_gb_used": 8.2, ...}
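As a sketch, a hypothetical `get_system_stats` tool producing the summary quoted above might return:

```python
def get_system_stats() -> dict:
    """Report CPU, memory, and disk usage when the user asks how the system is doing."""
    # Static values for illustration; a real tool would query psutil or similar.
    stats = {"cpu_percent": 23, "memory_gb_used": 8.2,
             "memory_gb_total": 16, "disk_free_gb": 245}
    return {
        "status": "success",
        "message": "System is healthy: CPU 23%, memory 8.2/16 GB, 245 GB disk free",
        "data": stats,
        "instruction": "Give the user a plain-language summary of these metrics.",
    }

print(get_system_stats()["message"])
```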
MCP Tools: When using MCPClientMixin, external MCP tool responses are automatically wrapped in this GAIA format with status, message, data, and instruction fields.

The Power of Good Docstrings
Your docstring teaches the LLM when and how to use your tool. Compare a vague docstring with a clear one: in the vague version, the `q` parameter name is unclear, there is no guidance on when to use the tool, and it might conflict with other search tools.

Docstring Anatomy
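A sketch of a well-structured tool docstring (the `search_codebase` tool is hypothetical), with comments annotating what each part tells the LLM:

```python
def tool(func=None, **kwargs):  # stand-in for the real @tool decorator
    return func if func is not None else (lambda f: f)

@tool
def search_codebase(query: str, file_type: str = "py") -> dict:
    """Search source files in the local repository for a text pattern.

    Use this tool when the user asks to find code, functions, or classes
    in the project. Do NOT use it for web searches.

    Args:
        query: Text or symbol name to look for, e.g. "parse_config".
        file_type: File extension to restrict the search to (default "py").
    """
    return {"status": "success", "data": {"matches": []},
            "message": f"No matches for {query!r} in *.{file_type} files"}

# Anatomy: a summary line (what it does), a "Use this tool when..."
# sentence (when to pick it), an explicit negative ("Do NOT use...")
# to avoid conflicts with similar tools, and an Args section with
# concrete example values.
print(search_codebase("parse_config")["message"])
```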
Error Handling: Return, Don’t Raise
Why?
When a tool raises an exception:

- The agent's reasoning loop may crash
- The LLM doesn't get useful error information
- The user sees a technical error instead of helpful guidance
The Pattern
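A minimal sketch of the pattern (the `read_file` tool is hypothetical): catch the exception and return structured error information instead of raising.

```python
def read_file(path: str) -> dict:
    """Read a text file when the user asks to see a file's contents."""
    try:
        with open(path, encoding="utf-8") as f:
            return {"status": "success", "data": {"content": f.read()}}
    except FileNotFoundError:
        # Return the error as data, so the agent loop survives and the
        # LLM can respond helpfully instead of surfacing a traceback.
        return {
            "status": "error",
            "error": f"No file found at {path}",
            "suggestion": "Check the path or list the directory first.",
        }

print(read_file("/no/such/file.txt")["status"])  # error
```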
Common Pitfalls and Solutions
Pitfall 1: LLM calls wrong tool

Symptom: User asks to search code, but the LLM calls search_web.

Cause: Tool descriptions are too similar or vague.

Solution: Add explicit differentiation to each tool's docstring.
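For example, two otherwise-similar search tools can be disambiguated with explicit scope statements (a sketch; the tool bodies are placeholders):

```python
def tool(func=None, **kwargs):  # stand-in for the real @tool decorator
    return func if func is not None else (lambda f: f)

@tool
def search_code(query: str) -> dict:
    """Search the LOCAL source tree for code. Use this when the user asks
    about files, functions, or classes in this project. Do NOT use it for
    general web questions."""
    return {"status": "success", "data": {"scope": "local code"}}

@tool
def search_web(query: str) -> dict:
    """Search the public web. Use this for general knowledge questions.
    Do NOT use it when the user asks about code in this repository."""
    return {"status": "success", "data": {"scope": "web"}}

print(search_code("AgentBase")["data"]["scope"])  # local code
```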
Pitfall 2: LLM passes wrong parameter type
Symptom: Tool expects an integer, receives a string like "5".

Cause: Missing or unclear type hints.

Solution: Always use type hints and validate inputs.
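A sketch of defensive coercion and validation (the `set_volume` tool is hypothetical):

```python
def set_volume(level: int) -> dict:
    """Set the speaker volume when the user asks to change it.

    Args:
        level: Volume from 0 to 100.
    """
    # LLMs sometimes send "5" instead of 5; coerce defensively.
    try:
        level = int(level)
    except (TypeError, ValueError):
        return {"status": "error",
                "error": f"level must be an integer, got {level!r}"}
    if not 0 <= level <= 100:
        return {"status": "error", "error": "level must be between 0 and 100"}
    return {"status": "success", "message": f"Volume set to {level}"}

print(set_volume("5")["message"])  # Volume set to 5
```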
Pitfall 3: Tool returns too much data
Symptom: Agent becomes slow or gives inconsistent answers.

Cause: Tool returns massive amounts of data that overflow the context window.

Solution: Limit and summarize output.
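A sketch of truncating output while telling the LLM that more exists (the `list_log_lines` tool and its data are illustrative):

```python
def list_log_lines(limit: int = 20) -> dict:
    """Return recent log lines when the user asks what happened.

    Args:
        limit: Maximum number of lines to return (default 20).
    """
    lines = [f"event {i}" for i in range(1000)]  # stand-in for a real log
    truncated = lines[:limit]
    return {
        "status": "success",
        "data": {"lines": truncated},
        "message": f"Showing {len(truncated)} of {len(lines)} lines",
        "instruction": "Summarize these lines; mention that more are available.",
    }

print(list_log_lines()["message"])  # Showing 20 of 1000 lines
```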
Pitfall 4: Tool not being discovered
Symptom: LLM says "I don't have a tool for that" when you do.

Cause: Tool is defined but not registered with the agent.

Solution: Make sure the tool is registered inside _register_tools().
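A sketch of the registration hook (the `Agent` base class here is a minimal stand-in so the example runs on its own; the real hook name `_register_tools` comes from this guide):

```python
class Agent:
    """Minimal stand-in for the real agent base class."""
    def __init__(self):
        self.tools = {}
        self._register_tools()

    def _register_tools(self):
        pass  # subclasses register their tools here

    def register(self, func):
        self.tools[func.__name__] = func

class MyAgent(Agent):
    def _register_tools(self):
        # Tools defined or registered OUTSIDE this method are never discovered.
        def get_time() -> str:
            """Return the current time when the user asks for it."""
            return "12:00"
        self.register(get_time)

print("get_time" in MyAgent().tools)  # True
```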
_register_tools():Pitfall 5: Tool with side effects runs unexpectedly
Pitfall 5: Tool with side effects runs unexpectedly
Symptom: Emails sent, files deleted, or data modified when the user was just asking a question.

Cause: Destructive tools need safeguards.

Solution: Add confirmation or dry-run modes so the LLM previews the action first and asks the user to confirm.
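A sketch of a confirm-gated destructive tool (the `delete_file` tool is hypothetical, and the actual deletion is stubbed out):

```python
def delete_file(path: str, confirm: bool = False) -> dict:
    """Delete a file ONLY after the user has explicitly confirmed.

    Args:
        path: File to delete.
        confirm: Must be True to actually delete; False previews the action.
    """
    if not confirm:
        # Dry-run: the LLM shows this preview and asks the user first.
        return {
            "status": "success",
            "message": f"Would delete {path}. Call again with confirm=True to proceed.",
            "instruction": "Ask the user to confirm before retrying with confirm=True.",
        }
    # os.remove(path) would go here in a real tool.
    return {"status": "success", "message": f"Deleted {path}"}

print(delete_file("/tmp/report.txt")["message"])
```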
Practice Challenge
Build a Database Query Tool
Create a tool that: (1) accepts a natural language query about users, (2) translates it to a database operation, (3) returns structured results, and (4) handles errors gracefully.

Requirements: a clear docstring with usage examples, type hints on all parameters, graceful error handling, and reasonable result limits.
Hints
Use a dict to simulate database records. Include examples in the docstring to guide the LLM. Return both data and metadata (count, any filtering applied).
Solution
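One possible solution, sketched with an in-memory "database" (the `query_users` name, the records, and the valid roles are all illustrative assumptions):

```python
from typing import Optional

USERS = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "viewer", "active": True},
    {"id": 3, "name": "Carol", "role": "viewer", "active": False},
]

def query_users(role: Optional[str] = None, active: Optional[bool] = None,
                limit: int = 10) -> dict:
    """Look up users when the user asks about accounts, roles, or activity.

    Examples: "show me all admins" -> role="admin";
    "which accounts are inactive?" -> active=False.

    Args:
        role: Filter by role ("admin" or "viewer"); omit for all roles.
        active: Filter by active status; omit for both.
        limit: Maximum records to return (default 10).
    """
    if role is not None and role not in {"admin", "viewer"}:
        return {"status": "error", "error": f"Unknown role {role!r}",
                "suggestion": "Valid roles are 'admin' and 'viewer'."}
    rows = [u for u in USERS
            if (role is None or u["role"] == role)
            and (active is None or u["active"] == active)]
    return {"status": "success",
            "data": {"users": rows[:limit], "total": len(rows),
                     "filters": {"role": role, "active": active}},
            "message": f"Found {len(rows)} matching user(s)",
            "instruction": "Summarize the matching users in plain language."}

print(query_users(role="viewer", active=True)["data"]["total"])  # 1
```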
Why this solution works:

- Clear docstring with examples: LLM knows exactly how to map user requests to parameters
- Optional parameters: LLM can use any combination of filters
- Input validation: Invalid values return helpful error messages
- Graceful limits: Prevents returning too much data
- Rich metadata: LLM knows total count, filters used, and has suggestions
- Error handling: Catches unexpected errors with helpful messages
Deep Dive: Tool Schema Generation
How @tool Generates the Tool Entry
When you decorate a function with `@tool`, Python introspection extracts a minimal schema (see src/gaia/agents/base/tools.py:19-88 for the authoritative implementation). Notable implementation details:

- The full docstring is stored once as description. There is no per-argument docstring parsing today; the LLM sees the whole docstring together with a {type, required} map.
- Default values are not stored on the parameter entries. required is a boolean derived from whether the parameter has a default.
- Generic annotations like `list[str]` or `Optional[int]` are not specialised; only the bare primitives (str, int, float, bool, tuple, dict) map to a known JSON type, and anything else is reported as "unknown".
- `@tool(atomic=True)` marks a tool as non-decomposable, so the planner will not try to split it further.
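A minimal sketch of this kind of introspection-based extraction (not the actual GAIA implementation, but consistent with the behavior described above):

```python
import inspect

# Only bare primitives map to a known type; everything else is "unknown".
_KNOWN = {str: "str", int: "int", float: "float", bool: "bool",
          tuple: "tuple", dict: "dict"}

def extract_schema(func) -> dict:
    params = {}
    for name, p in inspect.signature(func).parameters.items():
        params[name] = {
            "type": _KNOWN.get(p.annotation, "unknown"),
            # required reflects the absence of a default value
            "required": p.default is inspect.Parameter.empty,
        }
    return {"name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": params}

def demo(city: str, retries: int = 3, tags: list[str] = ()) -> dict:
    """Example tool for schema extraction."""
    return {}

schema = extract_schema(demo)
# city is a required str, retries an optional int, and the generic
# annotation list[str] falls back to "unknown".
print(schema["parameters"])
```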
Key Takeaways
LLM Sees the Contract
Function name, type hints, and docstring are all the LLM knows. Write them for the LLM, not just humans.
Docstrings Drive Selection
“Use this tool when…” phrases directly influence when the LLM chooses your tool.
Return Errors as Data
Never raise exceptions. Return structured error info so the LLM can respond helpfully.
Limit Output Size
Large returns overflow context. Truncate, summarize, and indicate when there’s more.
Next Steps
Agent System
Understand how agents use tools in the reasoning loop
Tool Mixins
Use pre-built tool collections for common tasks
Best Practices
Advanced patterns for production tools