This is Part 3 of 3. Complete Part 1 and Part 2 first: Part 1 | Part 2
  • Time to complete: 15-20 minutes
  • What you’ll learn: Agent intelligence, advanced patterns, production deployment, and performance optimization

Understanding the Agent’s Intelligence

How the Agent “Thinks”

When you ask a question, the agent goes through a reasoning loop. Here’s a real example.

User: “Find the oil manual on my drive and tell me the vision statement”

Agent’s Internal Reasoning (this is what the LLM decides):
// Step 1: Analyze what's needed
{
  "thought": "User wants oil manual. Need to find it first.",
  "goal": "Locate oil manual on drive",
  "tool": "search_file",
  "tool_args": {"file_pattern": "oil manual"}
}

// Result: Found "Oil-Gas-Regulatory-Manual.pdf"

// Step 2: Index it
{
  "thought": "Found the file. Need to index before querying.",
  "goal": "Index the oil manual",
  "tool": "index_document",
  "tool_args": {"file_path": "C:/Users/user/Documents/Oil-Gas-Regulatory-Manual.pdf"}
}

// Result: Successfully indexed, 150 chunks created

// Step 3: Search for vision
{
  "thought": "Document indexed. Now search for vision statement.",
  "goal": "Find vision statement in oil manual",
  "tool": "query_specific_file",
  "tool_args": {
    "file_path": "C:/Users/user/Documents/Oil-Gas-Regulatory-Manual.pdf",
    "query": "vision statement"
  }
}

// Result: Retrieved 5 chunks, top score: 0.92
// Chunk: "The vision is to be recognized as..."

// Step 4: Formulate answer
{
  "thought": "I have the vision statement from the document",
  "goal": "Provide answer with citation",
  "answer": "According to the Oil & Gas Regulatory Manual, the vision is to be recognized as the leading regulatory authority in energy sector governance... (Page 3)"
}
Note: This orchestration is handled by the LLM at runtime based on:
  1. System prompt (defines agent behavior)
  2. Tool schemas (available functions and their parameters)
  3. User input (the query to process)
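
The tool schemas in item 2 come from ordinary Python functions registered with the @tool decorator (covered earlier in this series). Here is a minimal sketch of the idea; the import path and the function body are assumptions for illustration, not GAIA’s actual implementation:

# Sketch only: import path and body are illustrative
import fnmatch
import os
from typing import List

from gaia.agents.base.tools import tool  # import path is an assumption

@tool
def search_file(file_pattern: str) -> List[str]:
    """Find files whose names match file_pattern.

    The decorator publishes this signature and docstring to the LLM as
    a tool schema, which is how the model knows it can emit
    {"tool": "search_file", "tool_args": {...}} in the loop above.
    """
    matches = []
    for root, _dirs, files in os.walk("."):
        for name in files:
            if fnmatch.fnmatch(name.lower(), f"*{file_pattern.lower()}*"):
                matches.append(os.path.join(root, name))
    return matches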

Search Key Generation (Smart Retrieval)

The agent generates multiple variations of your query to improve retrieval.

User asks: “What is the vision of the oil & gas regulator?”

Agent generates multiple search keys:
  1. Original: “What is the vision of the oil & gas regulator?”
  2. Keywords: “vision oil gas regulator”
  3. Reformulated: “oil gas regulator vision definition”
  4. Alternate: “oil gas regulator vision explanation”
It then searches with all of these keys and combines the results.
# This happens in the _generate_search_keys method
from typing import List  # needed for the List[str] annotation

def _generate_search_keys(self, query: str) -> List[str]:
    keys = [query]  # Original query

    # Extract keywords (STOP_WORDS is a module-level set of common
    # words like "the" and "is"; very short tokens are dropped too)
    words = query.lower().split()
    keywords = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    keys.append(" ".join(keywords))

    # Add reformulations for definition-style questions
    if query.lower().startswith("what is"):
        topic = query[8:].strip("?").strip()  # text after "what is "
        keys.append(f"{topic} definition")
        keys.append(f"{topic} explanation")

    return keys
Impact on retrieval:
  • Increases recall by matching different phrasings
  • Compensates for keyword mismatches
  • Trade-off: More compute (multiple searches) for better coverage
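
To make the combination step concrete, here is a minimal sketch of multi-key retrieval. It assumes a search callable returning (chunk_id, score) pairs; the function name and merge strategy are illustrative, not GAIA’s internal API:

from typing import Callable, Dict, List, Tuple

def multi_key_search(
    search: Callable[[str, int], List[Tuple[str, float]]],
    keys: List[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Run every generated key through `search`, keeping the best score
    seen per chunk so a chunk matched by several phrasings counts once."""
    best: Dict[str, float] = {}
    for key in keys:
        for chunk_id, score in search(key, k):
            best[chunk_id] = max(best.get(chunk_id, 0.0), score)
    # Highest-scoring chunks first, trimmed back to the usual top-k
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]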

Advanced Patterns

Pattern 1: Multi-Document Synthesis

agent.process_query(
    "Compare safety protocols in oil manual vs gas manual. "
    "What are the key differences?"
)

Pattern 2: Contextual Follow-ups

# Initial query
agent.process_query("What are the installation requirements?")
# Response: "Python 3.10+, 8GB RAM, 50GB disk..."

# Follow-up (implicit context)
agent.process_query("What about for production?")
# Agent understands context: still discussing installation
# Retrieves production-specific requirements
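
What makes the follow-up work is conversation state: the agent carries recent turns into the next prompt, as the example shows. ChatAgent handles this internally; the buffer below is only an illustrative sketch of the pattern, not the library’s code:

from typing import List, Tuple

class HistoryBuffer:
    """Keep recent turns so a terse follow-up like "What about for
    production?" can be interpreted against the prior topic."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turns: List[Tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]  # drop oldest turns

    def as_prompt_context(self) -> str:
        # Prepended to the next LLM prompt so the model sees the thread
        return "\n".join(f"User: {q}\nAgent: {a}" for q, a in self.turns)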

Pattern 3: Progressive Discovery

# Query 1
agent.process_query("What documents do you have about AI?")
# Agent: Searches filesystem, finds AI PDFs, indexes them

# Query 2
agent.process_query("Tell me about neural networks")
# Agent: Searches newly-indexed AI documents

# Query 3
agent.process_query("Are there any documents about transformers?")
# Agent: Searches again, finds more, indexes on-demand

Deployment Options

As a Web API

api_server.py
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig
from gaia.api.openai_server import create_app
from gaia.api.agent_registry import registry
import uvicorn

# Configure agent
config = ChatAgentConfig(
    watch_directories=["./company_docs"],
    silent_mode=True  # Disable console output for API
)

agent = ChatAgent(config)

# Create OpenAI-compatible API
app = create_app()

# Register agent
registry.register("doc-qa", lambda: agent)

# Run server
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)
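
Once the server is running, any OpenAI-compatible client can call it. A sketch, assuming the server exposes the standard /v1/chat/completions route and that the requests package is available:

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "doc-qa",  # the agent name registered above
        "messages": [
            {"role": "user", "content": "What are the installation requirements?"}
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])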

As a CLI Tool

cli.py
#!/usr/bin/env python3
import sys
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig

def main():
    if len(sys.argv) < 2:
        print("Usage: doc-qa <question>")
        print("       doc-qa --interactive")
        sys.exit(1)

    config = ChatAgentConfig(
        watch_directories=["./documents"],
        silent_mode=False
    )
    agent = ChatAgent(config)

    if sys.argv[1] == "--interactive":
        # Interactive mode
        while True:
            question = input("You: ")
            if question.lower() in ['quit', 'exit']:
                break
            result = agent.process_query(question)
            print(f"Agent: {result.get('answer')}\n")
    else:
        # One-shot query
        question = " ".join(sys.argv[1:])
        result = agent.process_query(question)
        print(result.get('answer'))

if __name__ == "__main__":
    main()

Troubleshooting

Symptom: Agent responds with “No documents indexed”
Debugging:
# In a shell: verify RAG dependencies are installed
uv pip install -e ".[rag]"

# Then in Python: check the document exists
import os
print(os.path.exists("./manual.pdf"))  # Should be True

# Check indexed state
print(f"Indexed: {agent.indexed_files}")
print(f"RAG files: {agent.rag.indexed_files}")
Common causes:
  • Missing RAG dependencies (uv pip install -e ".[rag]")
  • Incorrect file path
  • File not readable
Symptom: Agent returns irrelevant information
Tuning options:
# Retrieve more chunks
config = ChatAgentConfig(max_chunks=10)

# Use semantic chunking (slower, more accurate)
config = ChatAgentConfig(use_llm_chunking=True)

# Larger chunks for more context
config = ChatAgentConfig(chunk_size=800)

# Debug what's being retrieved
config = ChatAgentConfig(debug=True)
Check retrieval scores:
response = agent.rag.query("your question")
print(f"Scores: {response.chunk_scores}")
# Scores < 0.5 indicate weak matches
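
If scores are consistently weak, you can drop low-scoring chunks before synthesis. A minimal sketch, assuming the response object also exposes the retrieved chunks alongside chunk_scores (the chunks attribute name is an assumption):

MIN_SCORE = 0.5  # below this threshold, matches are usually noise

strong_chunks = [
    chunk
    for chunk, score in zip(response.chunks, response.chunk_scores)
    if score >= MIN_SCORE
]
if not strong_chunks:
    print("No strong matches - try rephrasing, or re-chunk the document")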
Symptom: New files not auto-indexed
Solution:
# Install watchdog dependency
uv pip install "watchdog>=2.1.0"
Verify:
# Check watchers are active
print(f"Watching: {agent.watch_directories}")
print(f"Observers: {len(agent.observers)}")  # Should be > 0

# Check file handler telemetry
if agent.file_handlers:
    telemetry = agent.file_handlers[0].get_telemetry()
    print(f"Events: {telemetry}")
Symptom: Indexing takes excessive time
Optimizations:
# Smaller chunks = faster indexing
config = ChatAgentConfig(chunk_size=300)

# Disable LLM chunking (use fast heuristic)
config = ChatAgentConfig(use_llm_chunking=False)

# Index incrementally, not all at once
# Let file watching handle gradual indexing
Benchmark:
import time
start = time.time()
agent.rag.index_document("large.pdf")
print(f"Indexing took: {time.time() - start:.2f}s")
Typical performance on an AI PC with Ryzen AI:
  • Small PDF (10 pages): ~2-3 seconds
  • Medium PDF (50 pages): ~8-12 seconds
  • Large PDF (200 pages): ~30-45 seconds
  • Embedding generation: ~50-100 chunks/second on NPU

Best Practices

Start Small

Index a few representative documents first, then scale. Full drive indexing is rarely necessary.

Tune for Document Type

Technical docs need larger chunks (600-800 tokens). FAQs work well with smaller chunks (300-400 tokens).
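
For example, using the chunk_size parameter shown throughout this series (these values are starting points from the guidance above, not hard rules):

# Technical manuals: larger chunks keep multi-step procedures intact
technical_config = ChatAgentConfig(chunk_size=700)

# FAQs: smaller chunks map roughly one chunk to one Q&A pair
faq_config = ChatAgentConfig(chunk_size=350)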

Use Sessions

Session persistence avoids re-indexing on every restart. Critical for large document sets.
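
As noted under Session Management below, state persists as JSON. A minimal sketch of the idea (the file name and schema here are illustrative; agent.indexed_files is the attribute used in the troubleshooting section above):

import json
import os

SESSION_FILE = "./session.json"  # illustrative location

# On shutdown: record which files are already indexed
with open(SESSION_FILE, "w") as f:
    json.dump({"indexed_files": list(agent.indexed_files)}, f)

# On startup: skip re-indexing anything recorded in the session
if os.path.exists(SESSION_FILE):
    with open(SESSION_FILE) as f:
        already_indexed = set(json.load(f)["indexed_files"])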

Monitor Selectively

Watch only actively changing directories. Static archives don’t benefit from monitoring.

Security

Set allowed_paths in production. Prevents path traversal and unauthorized access.
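
For example, assuming allowed_paths is accepted by ChatAgentConfig alongside the options used earlier:

config = ChatAgentConfig(
    watch_directories=["./company_docs"],
    allowed_paths=["./company_docs"],  # reject files outside this tree
)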

Incremental Development

Build step-by-step. Test each component before combining them.

Performance Tuning

| Profile  | chunk_size | max_chunks | use_llm_chunking | Use Case                  | AI PC Performance         |
|----------|------------|------------|------------------|---------------------------|---------------------------|
| Fast     | 300        | 3          | False            | Development, quick FAQs   | ~2-3s per PDF (NPU)       |
| Balanced | 500        | 5          | False            | Production default        | ~5-7s per PDF (NPU)       |
| Accurate | 800        | 7          | True             | Complex technical queries | ~10-15s per PDF (NPU+LLM) |
# Development (fast)
config = ChatAgentConfig(
    chunk_size=300,
    max_chunks=3,
    use_llm_chunking=False
)

# Production (balanced)
config = ChatAgentConfig(
    chunk_size=500,
    max_chunks=5,
    use_llm_chunking=False
)

# Research (accurate)
config = ChatAgentConfig(
    chunk_size=800,
    max_chunks=7,
    use_llm_chunking=True
)

What You’ve Learned

Agent Architecture

The reasoning loop pattern: query → tool selection → execution → synthesis

Tool System

Using @tool decorator to register Python functions as LLM-invocable capabilities

RAG Integration

Vector embeddings, FAISS indexing, and semantic retrieval

Mixin Pattern

Composing agent functionality via multiple inheritance

File Monitoring

Watchdog-based reactive indexing with debouncing

Session Management

JSON serialization for state persistence

Security

Path validation, symlink detection, and access control

Performance

Tuning chunk size, retrieval count, and chunking strategies

Next Steps

Extend with Voice

Add Whisper (ASR) and Kokoro (TTS) for voice interaction. See: Voice-Enabled Assistant (coming soon)

Build Code Agent

Create an agent with code generation and debugging capabilities. See: Code Generation Agent (coming soon)

Multi-Agent Orchestration

Route queries to specialized agents based on intent. See: Multi-Agent System (coming soon)

Package for Distribution

Package as a Windows installer or Electron app. See: Deployment Guide

Source Code Reference


Resources