```python
def _register_tools(self):
    self.register_rag_tools()
    self.register_file_search_tools()
    self.register_file_tools()
    # All mixin tools now available to the LLM
```
What you have: An agent using GAIA’s built-in mixins. This reduces code duplication and provides tested tool implementations.
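If it helps to see the mechanics, the mixin pattern itself is plain cooperative class composition. Here is a stripped-down sketch (class and tool names are illustrative, not GAIA's actual API): each mixin contributes a registration method, and the agent's `_register_tools()` calls them all.

```python
# Illustrative sketch of the mixin pattern -- not GAIA's real classes.
class RagToolsMixin:
    def register_rag_tools(self):
        # Each mixin adds its tools to the shared registry
        self.tools["query_documents"] = lambda q: f"searching: {q}"

class FileToolsMixin:
    def register_file_tools(self):
        self.tools["search_file"] = lambda name: f"locating: {name}"

class MiniAgent(RagToolsMixin, FileToolsMixin):
    def __init__(self):
        self.tools = {}
        self._register_tools()

    def _register_tools(self):
        # Aggregate every mixin's tools into one registry
        self.register_rag_tools()
        self.register_file_tools()
```

Because each mixin only touches `self.tools`, new tool families can be added by inheriting another mixin and calling its registration method.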
Add file system monitoring to automatically reindex documents when they change.
```python
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig

# The full ChatAgent includes file monitoring!
config = ChatAgentConfig(
    rag_documents=["./manual.pdf"],
    watch_directories=["./documents"],  # Monitor this folder
    chunk_size=500,
    max_chunks=5,
)
agent = ChatAgent(config)

# Now:
# 1. manual.pdf is indexed
# 2. ./documents folder is being watched
# 3. If you add a new PDF to documents/ → auto-indexed!
# 4. If you modify an existing PDF → auto-reindexed!
agent.process_query("What's in the manual?")

# Add a new file to ./documents/new_report.pdf
# The agent automatically indexes it in the background!
```
Under the Hood: File Watching
Watchdog library implementation:
```python
# Simplified file-monitoring logic
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class FileChangeHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.src_path.endswith('.pdf'):
            agent.rag.index_document(event.src_path)

    def on_modified(self, event):
        if event.src_path.endswith('.pdf'):
            agent.rag.reindex_document(event.src_path)

# The observer runs in a separate thread
observer = Observer()
observer.schedule(FileChangeHandler(), "./documents", recursive=True)
observer.start()
```
Implementation details:
File events are async (non-blocking)
Detection latency: ~1 second
Runs in dedicated thread
Debouncing: 2-second window per file
Memory: LRU eviction at 1000 tracked files
Supported types: .pdf, .txt, .md, .py, .json, etc.
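The 2-second debounce window described above can be sketched with standard-library timers. This is a simplified illustration of the idea, not GAIA's actual implementation: each event for a file cancels and restarts that file's timer, so a burst of rapid saves triggers only one reindex.

```python
import threading

class DebouncedIndexer:
    """Coalesce rapid file events: index only after `window` seconds of quiet."""

    def __init__(self, index_fn, window=2.0):
        self.index_fn = index_fn
        self.window = window
        self._timers = {}  # path -> pending Timer
        self._lock = threading.Lock()

    def on_event(self, path):
        with self._lock:
            # Cancel any pending index for this file and restart the clock
            if path in self._timers:
                self._timers[path].cancel()
            timer = threading.Timer(self.window, self._fire, args=(path,))
            self._timers[path] = timer
            timer.start()

    def _fire(self, path):
        with self._lock:
            self._timers.pop(path, None)
        self.index_fn(path)
```

With this shape, five rapid modifications to the same PDF result in a single `index_fn` call two seconds after the last event.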
What you have: Reactive file monitoring. The index automatically updates when documents are created or modified.
Implement session persistence to avoid re-indexing on every restart.
```python
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig

config = ChatAgentConfig(
    rag_documents=["./manual.pdf"]
)
agent = ChatAgent(config)

# Use the agent
result = agent.process_query("What does the manual say about installation?")
print(result)

# Save the session (includes indexed documents and conversation history)
if agent.save_current_session():
    print(f"✓ Session saved! ID: {agent.current_session.session_id}")
```
The ChatAgent class combines all components. Here’s how to configure and use it:
Full Configuration
Usage
Configuration Reference
complete_agent.py
```python
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig
from pathlib import Path

# Complete configuration
config = ChatAgentConfig(
    # Documents to index on startup
    rag_documents=[
        "./manuals/user_guide.pdf",
        "./manuals/technical_spec.pdf"
    ],
    # Directories to monitor
    watch_directories=["./documents"],
    # RAG configuration
    chunk_size=500,
    chunk_overlap=100,
    max_chunks=5,
    use_llm_chunking=False,
    # LLM settings (AMD-optimized models)
    model_id="Qwen3-Coder-30B-A3B-Instruct-GGUF",  # Runs on Ryzen AI
    max_steps=10,
    # Output
    show_stats=True,
    debug=True,
    silent_mode=False,
    # Security
    allowed_paths=[
        str(Path.home() / "Documents"),
        str(Path.home() / "Work"),
        str(Path.cwd())
    ]
)

agent = ChatAgent(config)
agent.save_current_session()
```
```python
# Query examples
examples = [
    "What documents are indexed?",
    "Find the safety manual and tell me about emergency procedures",
    "What are the system requirements?",
    "Index all PDFs in my Downloads folder",
]

for question in examples:
    result = agent.process_query(question)
    print(f"Q: {question}")
    print(f"A: {result.get('answer')}\n")
```
| Parameter | Default | Purpose |
| --- | --- | --- |
| `chunk_size` | 500 | Tokens per chunk |
| `chunk_overlap` | 100 | Boundary context |
| `max_chunks` | 5 | Top-k retrieval |
| `use_llm_chunking` | `False` | Semantic vs. structural chunking |
| `max_steps` | 10 | Reasoning iterations |
| `show_stats` | `False` | Token usage display |
| `debug` | `False` | Retrieval details |
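To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window chunker over a token list. This is illustrative only (GAIA's chunker is more sophisticated, especially with `use_llm_chunking` enabled): each chunk starts `chunk_size - chunk_overlap` tokens after the previous one, so adjacent chunks share `chunk_overlap` tokens of boundary context.

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=100):
    """Split a token list into overlapping fixed-size windows."""
    step = chunk_size - chunk_overlap  # stride between chunk starts
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

A 1000-token document with the defaults yields three chunks (starting at tokens 0, 400, and 800), and the last 100 tokens of each chunk reappear at the start of the next, so sentences straddling a boundary are retrievable from either side.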
Run it:
```bash
python complete_agent.py
```
Under the Hood: Complete System Flow
Full execution sequence:
Initialization (Startup):
```
Load config
→ Create RAG SDK (FAISS vector index)
→ Index initial documents (extract, chunk, embed)
→ Start file watchers on ./documents
→ Create session for persistence
→ Agent ready!
```
First Query: “What documents are indexed?”
```
User input
→ Agent analyzes
→ Calls list_indexed_documents tool
→ Returns: ["user_guide.pdf", "technical_spec.pdf"]
→ Agent formats answer
→ User sees list
```
Second Query: “Find safety manual…”
```
User input
→ Agent thinks: "Need to find + search file"
→ Call search_file("safety manual")
  ├─ Search Documents folder
  ├─ Search Downloads folder
  └─ Returns: ["Safety-Manual.pdf"]
→ Call index_document("Safety-Manual.pdf")
  ├─ Extract text from PDF
  ├─ Split into chunks
  ├─ Generate embeddings
  └─ Add to vector index
→ Call query_documents("emergency procedures")
  ├─ Generate query embedding
  ├─ Search vector index (cosine similarity)
  ├─ Retrieve top 5 chunks
  └─ Return chunks
→ Agent reads chunks
→ Agent formulates answer
→ User sees: "According to Safety-Manual.pdf, emergency procedures are..."
```
Background: File Added to ./documents/
```
File watcher detects new PDF
→ Check for .pdf extension
→ Debounce (wait 2 seconds for more changes)
→ Auto-index the file
→ Update the agent's system prompt
→ Ready for queries on the new file!
```
The file watcher handles this automatically. Note that initial indexing still requires manual setup.
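Since watchers only see future events, a startup pass over files already on disk is needed. A minimal sketch (the `index_fn` callback stands in for an indexing call such as the RAG tools shown earlier; the supported-extension set is assumed from the list above):

```python
from pathlib import Path

# Assumed from the supported-types list above
SUPPORTED = {".pdf", ".txt", ".md", ".py", ".json"}

def index_existing(root, index_fn):
    """Index files already present under `root`; watchers only see future events."""
    indexed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            index_fn(str(path))  # hand each file to the indexer
            indexed.append(str(path))
    return indexed
```

Running something like `index_existing("./documents", ...)` once at startup, before starting the watcher, closes the gap between pre-existing and newly added files.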
```python
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig
from gaia.agents.base.tools import tool

class CustomDocAgent(ChatAgent):
    """Chat agent with custom domain tools."""

    def _register_tools(self):
        # Register all standard tools first
        super()._register_tools()

        # Add custom tools
        @tool
        def analyze_sentiment(document_name: str) -> dict:
            """Analyze document sentiment."""
            # Your implementation
            return {
                "sentiment": "positive",
                "confidence": 0.85,
                "document": document_name
            }

        @tool
        def compare_documents(doc1: str, doc2: str) -> dict:
            """Compare two documents."""
            # Your implementation
            return {
                "differences": ["Section 2 differs"],
                "similarity_score": 0.72
            }
```
```python
agent = CustomDocAgent()

# The LLM can now use your custom tools
result = agent.process_query(
    "Analyze sentiment of annual report and compare with last year's"
)

# The agent automatically:
# 1. Calls analyze_sentiment("annual-report.pdf")
# 2. Calls compare_documents("2024-report.pdf", "2023-report.pdf")
# 3. Synthesizes a comparison
```
```python
# Check which tools are available
agent.list_tools(verbose=True)

# Output shows:
# - Standard RAG tools (query_documents, index_document, ...)
# - Standard file tools (search_file, add_watch_directory, ...)
# - YOUR custom tools (analyze_sentiment, compare_documents)
```
Override system prompts to create domain-specific behavior.
Research Agent
Support Agent
Pattern
research_agent.py
```python
from gaia.agents.chat.agent import ChatAgent

class ResearchAgent(ChatAgent):
    """Academic research specialist."""

    def _get_system_prompt(self) -> str:
        base = super()._get_system_prompt()
        return base + """
**Research Mode:**
- Cite sources with page numbers
- Provide direct quotations
- Compare findings across papers
- Note contradictions or consensus
"""

# Usage
agent = ResearchAgent()
agent.process_query("What do papers say about attention mechanisms?")
```
support_agent.py
```python
from gaia.agents.chat.agent import ChatAgent

class CustomerSupportAgent(ChatAgent):
    """Customer support specialist."""

    def _get_system_prompt(self) -> str:
        base = super()._get_system_prompt()
        return base + """
**Support Mode:**
- Search product manuals first
- Provide step-by-step instructions
- Include troubleshooting steps
- Link to related documentation
"""

# Usage
agent = CustomerSupportAgent()
agent.process_query("How do I install the software?")
```
Use case: Shared document search across organizational documentation.
Setup
Team Usage
Security Note
knowledge_base.py
```python
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig

docs_root = "/company/shared/documentation"

config = ChatAgentConfig(
    watch_directories=[
        f"{docs_root}/policies",
        f"{docs_root}/procedures",
        f"{docs_root}/guides"
    ],
    show_stats=False,
    silent_mode=False,
    allowed_paths=[docs_root]  # Restrict to company docs
)

agent = ChatAgent(config)
agent.save_current_session()
print(f"Session ID: {agent.current_session.session_id}")
```
load_shared_session.py
```python
from gaia.agents.chat.agent import ChatAgent

# A team member loads the shared session
agent = ChatAgent()
agent.load_session("session-2024-12-11-abc123")

# Indexed documents are restored; no re-indexing required
result = agent.process_query("What's our remote work policy?")
```
Use case: Interactive CLI for querying personal documents.
Implementation
Usage
Configuration
personal_assistant.py
```python
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig
from pathlib import Path

config = ChatAgentConfig(
    watch_directories=[
        str(Path.home() / "Documents"),
        str(Path.home() / "Downloads"),
    ],
    chunk_size=400,
    max_chunks=4,
    debug=False,
    show_stats=True
)
agent = ChatAgent(config)

print("Document Q&A Agent")
print("Type 'quit' to exit\n")

while True:
    question = input("You: ")
    if question.lower() in ['quit', 'exit']:
        agent.save_current_session()
        print(f"Session saved: {agent.current_session.session_id}")
        break
    result = agent.process_query(question)
    print(f"Agent: {result.get('answer', result)}\n")
```
```bash
python personal_assistant.py
```
Example interaction:
```
You: What documents are indexed?
Agent: Currently monitoring Documents and Downloads folders. Found 15 PDFs.
       Type questions to search them.

You: What does my tax return say about deductions?
Agent: According to 2024-Tax-Return.pdf, your deductions include...
```
Why these settings:
chunk_size=400 - Balanced for mixed document types
max_chunks=4 - Faster responses, good for personal use