Source Code: src/gaia/chat/sdk.py | src/gaia/rag/sdk.py
- Time to complete: 15-20 minutes
- What you’ll learn: Agent intelligence, advanced patterns, production deployment, and performance optimization
Understanding the Agent’s Intelligence
How the Agent “Thinks”
When you ask a question, the agent goes through a reasoning loop. Here’s a real example:

User: “Find the oil manual on my drive and tell me the vision statement”

Agent’s Internal Reasoning (this is what the LLM decides) starts from three inputs; a schematic sketch of the loop follows the list:
- System prompt (defines agent behavior)
- Tool schemas (available functions and their parameters)
- User input (the query to process)
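The sketch below is a schematic of that decision cycle, not the GAIA source; the `llm` callable, the decision dictionary shape, and the `tools` mapping are assumptions made for illustration.

```python
def reasoning_loop(llm, tools, system_prompt, user_query, max_steps=5):
    """Schematic agent loop. `llm` is a hypothetical callable that returns either
    {"type": "tool_call", "tool": name, "arguments": {...}} or
    {"type": "final_answer", "content": text}; `tools` maps tool names to callables."""
    messages = [
        {"role": "system", "content": system_prompt},  # defines agent behavior
        {"role": "user", "content": user_query},       # the query to process
    ]
    for _ in range(max_steps):
        decision = llm(messages)  # the LLM also sees the tool schemas
        if decision["type"] == "final_answer":
            return decision["content"]
        # The model chose a tool: execute it and feed the result back into the loop.
        result = tools[decision["tool"]](**decision["arguments"])
        messages.append({"role": "tool", "content": str(result)})
    return "Stopped after max_steps without a final answer."
```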
Search Key Generation (Smart Retrieval)
The agent generates multiple variations of your query to improve retrieval.

User asks: “What is the vision of the oil & gas regulator?”

Agent generates multiple search keys:
- Original: “What is the vision of the oil & gas regulator?”
- Keywords: “vision oil gas regulator”
- Reformulated: “oil gas regulator vision definition”
- Alternate: “oil gas regulator vision explanation”
Why this helps (a sketch of the multi-key merge follows this list):
- Increases recall by matching different phrasings
- Compensates for keyword mismatches
- Trade-off: More compute (multiple searches) for better coverage
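A minimal sketch of how multiple search keys could be generated and merged; `vector_search`, the stop-word list, and the chunk dictionary fields are illustrative assumptions, not the GAIA implementation.

```python
import re

STOPWORDS = {"what", "is", "the", "of", "a", "an", "and"}

def generate_search_keys(query: str) -> list[str]:
    """Produce several phrasings of the same question to improve recall."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    keywords = " ".join(w for w in words if w not in STOPWORDS)
    return [query, keywords, f"{keywords} definition", f"{keywords} explanation"]

def multi_key_search(query: str, vector_search, top_k: int = 5) -> list[dict]:
    """Run one vector search per key, deduplicate by chunk id, keep the best scores."""
    seen, merged = set(), []
    for key in generate_search_keys(query):
        for chunk in vector_search(key, top_k=top_k):  # chunk: {"id": ..., "score": ..., "text": ...}
            if chunk["id"] not in seen:
                seen.add(chunk["id"])
                merged.append(chunk)
    return sorted(merged, key=lambda c: c["score"], reverse=True)[:top_k]
```

For the example query above, `generate_search_keys` yields the original phrasing plus “vision oil gas regulator” and its reformulations, matching the keys listed earlier.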
Advanced Patterns
Pattern 1: Multi-Document Synthesis
- Example
- Execution
- Implementation
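A hedged sketch of this pattern in code. The import path is inferred from the source listing; `index_file` and `chat` are hypothetical method names, so consult src/gaia/chat/sdk.py for the real interface.

```python
from gaia.chat.sdk import ChatAgent  # import path inferred from the source listing; methods below are assumptions

agent = ChatAgent()

# Several documents land in the same vector index...
for path in ["reports/q1_report.pdf", "reports/q2_report.pdf", "reports/q3_report.pdf"]:
    agent.index_file(path)  # hypothetical indexing method

# ...so one question can be answered by synthesizing chunks from all of them.
print(agent.chat("Compare the revenue trends across the three quarterly reports."))
```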
Pattern 2: Contextual Follow-ups
- Example
- How It Works
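Another hedged sketch, continuing with the same hypothetical `agent`: the second question only resolves because the session history carries the referent forward.

```python
agent.chat("What does the oil manual say about safety inspections?")

# "they" is resolved from conversation history, not from the new query text alone.
print(agent.chat("How often must they be carried out?"))
```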
Pattern 3: Progressive Discovery
- Example
- Pattern
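Progressive discovery chains file search, indexing, and retrieval across turns. The dialogue below is illustrative, still using the hypothetical `agent` from above.

```python
agent.chat("What PDF manuals are on my D: drive?")                    # file search tool
agent.chat("Index the compliance handbook you just found.")           # RAG indexing tool
print(agent.chat("Summarize its chapter on enforcement actions."))    # retrieval + synthesis
```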
Deployment Options
As a Web API
- Server
- Client
- Deploy
api_server.py
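A minimal FastAPI sketch of a server along these lines; this is illustrative, not the tutorial’s actual api_server.py, and the agent call is left as a hypothetical placeholder.

```python
# api_server.py -- illustrative sketch; wire in the real ChatAgent from src/gaia/chat/sdk.py.
from fastapi import FastAPI
from pydantic import BaseModel

# from gaia.chat.sdk import ChatAgent   # hypothetical wiring
# agent = ChatAgent()

app = FastAPI(title="Document Agent API")

class Query(BaseModel):
    question: str
    session_id: str | None = None

@app.post("/chat")
def chat(query: Query) -> dict:
    # answer = agent.chat(query.question, session_id=query.session_id)  # hypothetical call
    answer = "TODO: call the agent here"
    return {"answer": answer}

# Run with: uvicorn api_server:app --host 0.0.0.0 --port 8000
```

A client can then POST JSON, for example: `curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"question": "What is the vision statement?"}'`.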
As a CLI Tool
- Implementation
- Package Config
- Usage
cli.py
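Similarly, a hedged sketch of a CLI wrapper (not the tutorial’s actual cli.py); the agent call is again a placeholder.

```python
# cli.py -- illustrative sketch of a command-line wrapper around the agent.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="Ask questions about your indexed documents")
    parser.add_argument("question", help="Question to ask the agent")
    parser.add_argument("--session", default="default", help="Session name for persistent context")
    args = parser.parse_args()

    # answer = ChatAgent(session=args.session).chat(args.question)  # hypothetical call
    print(f"[session: {args.session}] {args.question}")

if __name__ == "__main__":
    main()
```

The package config would typically expose this as a console script (for example, a `[project.scripts]` entry in pyproject.toml pointing at `cli:main`), so usage becomes a single command.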
Troubleshooting
No documents indexed
Symptom: Agent responds with “No documents indexed”

Common causes:
- Missing RAG dependencies (uv pip install -e ".[rag]")
- Incorrect file path
- File not readable
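Debugging: the checks below are a hedged starting point; the document path is hypothetical, and the exact packages behind the ".[rag]" extra may differ.

```python
import importlib.util
from pathlib import Path

doc = Path("D:/documents/oil_manual.pdf")          # hypothetical path -- use your own
print("file exists:", doc.is_file())

for pkg in ("faiss", "watchdog"):                  # representative RAG/monitoring dependencies
    print(f"{pkg} installed:", importlib.util.find_spec(pkg) is not None)
```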
Poor retrieval quality
Symptom: Agent returns irrelevant information

Tuning options: adjust chunk_size, max_chunks, or use_llm_chunking (see the performance profiles below).

Check retrieval scores:
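One way to do that, sketched with a toy FAISS index; real chunk embeddings would come from the RAG pipeline, and the dimension here is illustrative.

```python
import numpy as np
import faiss

dim = 384                                                  # embedding dimension (model-dependent)
index = faiss.IndexFlatIP(dim)                             # inner-product index over normalized vectors
embeddings = np.random.rand(100, dim).astype("float32")    # stand-in for real chunk embeddings
faiss.normalize_L2(embeddings)
index.add(embeddings)

query_vec = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vec)
scores, ids = index.search(query_vec, k=5)
print("top-5 chunk ids:", ids[0])
print("top-5 scores:", scores[0])   # uniformly low scores point at chunking/embedding, not generation
```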
File watching not working
Symptom: New files are not auto-indexed

Solution: verify that the monitored directory path is correct and that the watchdog dependency is installed; a quick standalone check is shown below.
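A standalone watchdog check (the directory path is illustrative; the agent’s own watcher adds debouncing on top of this):

```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class PrintNewFiles(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            print("new file detected:", event.src_path)

observer = Observer()
observer.schedule(PrintNewFiles(), path="./watched_docs", recursive=True)  # hypothetical directory
observer.start()
try:
    time.sleep(30)       # drop a file into ./watched_docs during this window
finally:
    observer.stop()
    observer.join()
```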
Slow indexing performance
Symptom: Indexing takes excessive time

Optimizations: reduce chunk_size, disable LLM-based chunking, and index only the documents you need (see Best Practices below).

Benchmark: typical performance on an AI PC with Ryzen AI (a simple timing harness for your own hardware is sketched after the list):
- Small PDF (10 pages): ~2-3 seconds
- Medium PDF (50 pages): ~8-12 seconds
- Large PDF (200 pages): ~30-45 seconds
- Embedding generation: ~50-100 chunks/second on NPU
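The timing harness below is illustrative; replace the hypothetical `index_document` stub with the real indexing call.

```python
import time
from pathlib import Path

def index_document(path: Path) -> None:
    """Hypothetical stand-in -- replace with the real indexing call."""
    ...

for pdf in sorted(Path("./docs").glob("*.pdf")):    # hypothetical document folder
    start = time.perf_counter()
    index_document(pdf)
    print(f"{pdf.name}: {time.perf_counter() - start:.1f}s")
```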
Best Practices
Start Small
Index a few representative documents first, then scale. Full drive indexing is rarely necessary.
Tune for Document Type
Technical docs need larger chunks (600-800 tokens). FAQs work well with smaller chunks (300-400 tokens).
Use Sessions
Session persistence avoids re-indexing on every restart. Critical for large document sets.
Monitor Selectively
Watch only actively changing directories. Static archives don’t benefit from monitoring.
Security
Set allowed_paths in production. Prevents path traversal and unauthorized access.
Incremental Development
Build step-by-step. Test each component before combining them.
Performance Tuning
- Indexing Speed
- Memory Management
- Query Optimization
| Profile | chunk_size | max_chunks | use_llm_chunking | Use Case | AI PC Performance |
|---|---|---|---|---|---|
| Fast | 300 | 3 | False | Development, quick FAQs | ~2-3s per PDF (NPU) |
| Balanced | 500 | 5 | False | Production default | ~5-7s per PDF (NPU) |
| Accurate | 800 | 7 | True | Complex technical queries | ~10-15s per PDF (NPU+LLM) |
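The profiles map directly onto configuration values. A hedged sketch of selecting one follows; the exact constructor or config object in src/gaia/rag/sdk.py may differ.

```python
PROFILES = {
    "fast":     {"chunk_size": 300, "max_chunks": 3, "use_llm_chunking": False},
    "balanced": {"chunk_size": 500, "max_chunks": 5, "use_llm_chunking": False},
    "accurate": {"chunk_size": 800, "max_chunks": 7, "use_llm_chunking": True},
}

config = PROFILES["balanced"]
# agent = ChatAgent(**config)   # hypothetical: pass the chosen profile into the agent / RAG setup
print(config)
```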
What You’ve Learned
Agent Architecture
The reasoning loop pattern: query → tool selection → execution → synthesis
Tool System
Using the @tool decorator to register Python functions as LLM-invocable capabilities
RAG Integration
Vector embeddings, FAISS indexing, and semantic retrieval
Mixin Pattern
Composing agent functionality via multiple inheritance
File Monitoring
Watchdog-based reactive indexing with debouncing
Session Management
JSON serialization for state persistence
Security
Path validation, symlink detection, and access control
Performance
Tuning chunk size, retrieval count, and chunking strategies
Next Steps
Extend with Voice
Add Whisper (ASR) and Kokoro (TTS) for voice interaction. See: Voice-Enabled Assistant (coming soon)
Build Code Agent
Create an agent with code generation and debugging capabilities. See: Code Generation Agent (coming soon)
Multi-Agent Orchestration
Route queries to specialized agents based on intent. See: Multi-Agent System (coming soon)
Package for Distribution
Package as a Windows installer or Electron app. See: Deployment Guide
Source Code Reference
ChatAgent
Main agent implementation with session management and file monitoring
RAGToolsMixin
Document indexing and query tools
FileToolsMixin
Directory monitoring implementation
FileSearchToolsMixin
Multi-phase file discovery across drives