  • Time to complete: 15-20 minutes
  • What you’ll build: A basic chat agent with RAG and file discovery
  • What you’ll learn: Agent architecture, RAG integration, and the tool system
  • Platform: Runs locally on AI PCs with Ryzen AI (NPU/iGPU acceleration)

Why Build This Agent?

Privacy-First AI: This agent runs entirely on your AI PC with Ryzen AI. All processing happens locally—document content never leaves your machine.
If you have hundreds of PDF documents spread across your system—manuals, reports, specifications—finding specific information is tedious. You need to remember which file contains what, open each one, search manually, and piece together information from multiple sources. This agent automates that process entirely on your AI PC:
  1. Finds relevant documents on your drive
  2. Indexes them with vector embeddings (NPU-accelerated on Ryzen AI)
  3. Searches semantically using cosine similarity
  4. Returns specific information with source citations
  5. Runs completely locally—no cloud, no data leaving your machine
What you’re building: A chat agent that combines:
  • Agent reasoning - LLM-based tool selection and orchestration
  • RAG (Retrieval-Augmented Generation) - Vector search over document chunks
  • File discovery - Automated search across common directories
  • File monitoring - Watches folders and re-indexes on changes
  • Session persistence - Saves indexed state across restarts
  • Local execution - Runs entirely on your AI PC using Ryzen AI NPU/iGPU acceleration

The Architecture (What You’re Building)

Flow:
  1. User query → ChatAgent (orchestrator)
  2. Agent selects tool → RAGToolsMixin
  3. RAGToolsMixin calls → RAG SDK
  4. RAG SDK retrieves chunks from vector index → back to Agent
  5. Agent synthesizes answer → User receives response
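In code, that loop is conceptually simple. The sketch below is illustrative only (hypothetical llm client and helper names, not GAIA's actual internals):
import json

SYSTEM_PROMPT = "You are a helpful AI assistant."

def process_query(query: str, llm, tools: dict) -> dict:
    """Illustrative agent loop: let the LLM answer directly or pick a tool."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]
    while True:
        reply = llm.chat(messages)        # hypothetical LLM client call
        if reply.tool_call is None:       # LLM answered directly
            return {"answer": reply.text}
        # LLM requested a tool (e.g., query_documents): run it, feed result back
        result = tools[reply.tool_call.name](**reply.tool_call.args)
        messages.append({"role": "tool", "content": json.dumps(result)})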

Quick Start (5 Minutes)

Get a working agent running to understand the basic flow.
1. Install dependencies

uv pip install -e ".[rag]"
2. Start Lemonade Server

# Start local LLM server with AMD NPU/iGPU acceleration
lemonade-server serve
Lemonade Server provides AMD-optimized inference for AI PCs with Ryzen AI. Models run on your NPU or iGPU for fast, private processing.
3. Create your first agent

Create my_chat_agent.py:
my_chat_agent.py
import json
from gaia.agents.chat.agent import ChatAgent, ChatAgentConfig

# Create agent with a document
config = ChatAgentConfig(
    rag_documents=["./manual.pdf"]  # Your document here
)
agent = ChatAgent(config)

# Ask a question
result = agent.process_query("What does the manual say about installation?")
print(json.dumps(result, indent=2))
4. Run it

python my_chat_agent.py
What happens:
  1. PDF text extraction (PyMuPDF)
  2. Chunking into 500-token segments
  3. Embedding generation (nomic-embed running on NPU/iGPU)
  4. FAISS index creation
  5. Query processing via vector search
  6. LLM generates answer using retrieved chunks (Ryzen AI acceleration)
If you don’t have a PDF, the agent operates in general conversation mode using the LLM’s training data.
no_documents.py
agent = ChatAgent()  # No documents specified
result = agent.process_query("What is Python?")
# No RAG retrieval, uses general knowledge

Understanding the Components

Before we build step-by-step, let’s understand what each piece does under the hood.

1. The Agent Base Class

from gaia.agents.base.agent import Agent
Processing flow when you call process_query():
process_query(query)
  → Construct messages: [system_prompt, conversation history, user_message]
  → Send to LLM (Lemonade Server)
  → LLM answers directly or selects a tool to call
  → Tool results fed back to the LLM until it can answer
  → Return result dict
Trade-off: You delegate control flow to the LLM. This is flexible but less deterministic than traditional programming.

2. The Tool Decorator

from gaia.agents.base.tools import tool

@tool
def search_documents(query: str) -> dict:
    """Search for documents."""
    # Your implementation
    return {"results": [...]}
Implementation: The LLM receives tool schemas as context and decides when to invoke each function based on the user’s query and conversation state.
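For example, search_documents above might be presented to the LLM as a schema along these lines (illustrative; the exact wire format is SDK-specific):
{
  "name": "search_documents",
  "description": "Search for documents.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string"}
    },
    "required": ["query"]
  }
}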

3. RAG SDK (Document Retrieval)

from gaia.rag.sdk import RAGSDK, RAGConfig
RAG pipeline:
  1. Extract text from the PDF (PyMuPDF)
  2. Split text into overlapping chunks
  3. Generate an embedding for each chunk (nomic-embed on NPU/iGPU)
  4. Store embeddings in a FAISS index
  5. At query time, embed the query and retrieve the most similar chunks
Why chunks?
  • LLMs have context window limits (e.g., 4096 tokens)
  • Smaller chunks enable more precise semantic matching
  • Overlap preserves context at chunk boundaries and reduces information loss
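To make the overlap concrete, here is the arithmetic for a simple sliding-window chunker (one common implementation; the SDK's exact splitting strategy may differ):
# chunk_size=500 tokens with chunk_overlap=100 => the window advances 400 tokens per chunk
chunk_size, overlap = 500, 100
stride = chunk_size - overlap  # 400
doc_tokens = 2000
starts = list(range(0, max(doc_tokens - overlap, 1), stride))
print(starts)       # [0, 400, 800, 1200, 1600]
print(len(starts))  # 5 chunks for a 2,000-token document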
AMD Hardware Acceleration: Embedding generation runs on the Ryzen AI NPU/iGPU, enabling fast document indexing with minimal CPU usage on AI PCs.
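Putting it together, minimal standalone use of the SDK looks like this (using the same configuration values as the agent code later in this tutorial):
from gaia.rag.sdk import RAGSDK, RAGConfig

rag = RAGSDK(RAGConfig(chunk_size=500, chunk_overlap=100, max_chunks=5))
rag.index_document("./manual.pdf")  # extract, chunk, embed, store in FAISS

response = rag.query("What are the system requirements?")
print(response.text)          # synthesized answer
print(response.source_files)  # documents the supporting chunks came from
print(response.chunk_scores)  # similarity score per retrieved chunk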

4. Tool Mixins

Mixins are classes that provide reusable tool sets via Python multiple inheritance.
class ChatAgent(Agent, RAGToolsMixin, FileToolsMixin):
    # Inherits tools from all mixins
    pass
To build your own agent from the same mixins:
from gaia.agents.base.agent import Agent
from gaia.agents.chat.tools import RAGToolsMixin, FileToolsMixin

class MyAgent(Agent, RAGToolsMixin, FileToolsMixin):
    def _register_tools(self):
        self.register_rag_tools()    # Gets all RAG tools
        self.register_file_tools()   # Gets all file tools
        # Agent now has ~15 tools available
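The same pattern works for your own tool sets. A hypothetical mixin (following the register_*_tools convention shown above) might look like:
from gaia.agents.base.tools import tool

class WeatherToolsMixin:
    """Hypothetical mixin bundling a custom tool set."""

    def register_weather_tools(self):
        @tool
        def get_weather(city: str) -> dict:
            """Return a weather report for a city (stubbed for illustration)."""
            return {"city": city, "forecast": "sunny"}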

Building It: The Step-by-Step Journey

Now let’s build this agent incrementally, understanding each piece as we go.

Step 1: The Simplest Possible Agent

Start with a minimal agent implementation to understand the core structure.
step1_basic.py
from gaia.agents.base.agent import Agent
from gaia.agents.base.console import AgentConsole

class SimpleChatAgent(Agent):
    """Minimal chat agent with no tools."""

    def _get_system_prompt(self) -> str:
        return "You are a helpful AI assistant."

    def _create_console(self):
        return AgentConsole()

    def _register_tools(self):
        # No tools registered yet
        pass

# Use it
agent = SimpleChatAgent()
result = agent.process_query("Hello! How are you?")
print(result)
Initialization:
SimpleChatAgent()
  → Agent.__init__()
  → Initialize LLM client
  → Load system prompt
  → Create console
  → Register tools (none in this case)
Query processing:
process_query("Hello! How are you?")
  → Construct messages: [system_prompt, user_message]
  → Send to LLM (Lemonade Server)
  → LLM generates response
  → Display via AgentConsole
  → Return result dict
Limitations:
  • No tools = no ability to execute actions
  • Cannot access external data sources
  • Relies solely on LLM training data
This basic agent cannot retrieve information from documents or perform actions. It’s limited to conversation using the LLM’s pre-trained knowledge.
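Even this minimal agent supports an interactive session; wrapping process_query in a loop gives you a simple REPL:
agent = SimpleChatAgent()
while True:
    query = input("You: ")
    if query.lower() in ("quit", "exit"):
        break
    print(agent.process_query(query))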

Step 2: Add RAG (Document Understanding)

Add RAG capability to enable document search and retrieval.
step2_with_rag.py
from gaia.agents.base.agent import Agent
from gaia.agents.base.console import AgentConsole
from gaia.agents.base.tools import tool
from gaia.rag.sdk import RAGSDK, RAGConfig

class DocQAAgent(Agent):
    """Agent with document Q&A capability."""

    def __init__(self, documents=None, **kwargs):
        # Initialize RAG SDK first
        rag_config = RAGConfig(
            chunk_size=500,
            max_chunks=5,
            chunk_overlap=100
        )
        self.rag = RAGSDK(rag_config)
        self.indexed_files = set()

        # Index documents
        if documents:
            for doc in documents:
                self.rag.index_document(doc)
                self.indexed_files.add(doc)
                print(f"✓ Indexed: {doc}")

        super().__init__(**kwargs)

    def _get_system_prompt(self) -> str:
        indexed = "\n".join(f"- {doc}" for doc in self.indexed_files)
        return f"""You are a document Q&A assistant.

Currently indexed:
{indexed}

Use query_documents to search for information."""

    def _create_console(self):
        return AgentConsole()

    def _register_tools(self):
        @tool
        def query_documents(query: str) -> dict:
            """Search indexed documents."""
            if not self.indexed_files:
                return {"error": "No documents indexed"}

            response = self.rag.query(query)
            return {
                "chunks": response.chunks,
                "scores": response.chunk_scores,
                "sources": response.source_files,
                "answer": response.text
            }

# Use it
agent = DocQAAgent(documents=["./manual.pdf"])
response = agent.process_query("What does the manual say about installation?")
print(response)
Indexing phase (__init__):
DocQAAgent(documents=["manual.pdf"])
  → RAGSDK instance created
  → For each PDF:
      → PyMuPDF extracts text
      → Text split into 500-token chunks (100 overlap)
      → nomic-embed generates embeddings (384 dimensions, runs on NPU/iGPU)
      → Embeddings stored in FAISS index (in-memory)
  → Ready for queries
On AI PCs with Ryzen AI, the embedding model runs on the NPU for efficient processing.
Query phase (process_query):
process_query("What are the system requirements?")
  → Agent decides to use query_documents tool
  → Tool execution:
      → Generate query embedding (nomic-embed on NPU)
      → Compute cosine similarity vs all chunk embeddings
      → Sort by similarity score
      → Return top 5 chunks
  → Agent receives chunks
  → LLM synthesizes answer using chunks as context (Ryzen AI acceleration)
  → Return result
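Conceptually, the retrieval step is a cosine-similarity top-k search. This numpy sketch shows the idea (illustrative; the SDK uses a FAISS index rather than brute-force numpy):
import numpy as np

def top_k_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 5):
    """Rank chunk embeddings by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)                           # normalize query
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)  # normalize chunks
    scores = c @ q                         # dot product of unit vectors = cosine similarity
    best = np.argsort(scores)[::-1][:k]    # indices of the k most similar chunks
    return best, scores[best]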
Key benefit: Grounding LLM responses in actual document text reduces hallucination for domain-specific content.
What you have: Document retrieval via vector search. The agent can answer questions using your specific documents rather than general knowledge.

Step 3: Add Smart File Discovery

Add file discovery to avoid hardcoding document paths.
step3_smart_discovery.py
from gaia.agents.base.agent import Agent
from gaia.agents.base.console import AgentConsole
from gaia.agents.base.tools import tool
from gaia.rag.sdk import RAGSDK, RAGConfig
from fnmatch import fnmatch
from pathlib import Path

class SmartDocAgent(Agent):
    """Agent with smart document discovery."""

    def __init__(self, **kwargs):
        self.rag = RAGSDK(RAGConfig())
        self.indexed_files = set()
        super().__init__(**kwargs)

    def _get_system_prompt(self) -> str:
        indexed_docs = "\n".join(f"- {Path(f).name}" for f in self.indexed_files)

        return f"""You are an intelligent document assistant.

Indexed documents:
{indexed_docs or "None yet"}

**Smart Discovery Workflow:**
When user asks about something (e.g., "oil & gas manual"):
1. Use search_files to find it
2. Index it automatically
3. Then query to answer their question

This creates a more natural user experience where file paths don't need to be specified."""

    def _create_console(self):
        return AgentConsole()

    def _register_tools(self):
        @tool
        def search_files(pattern: str) -> dict:
            """Find files matching a pattern (searches common locations)."""
            # Search common directories
            search_paths = [
                Path.home() / "Documents",
                Path.home() / "Downloads",
                Path.home() / "Desktop",
                Path.cwd(),
            ]

            # Support "*" wildcards (agent sends patterns with them) and fall back to
            # substring matching by wrapping non-wildcard patterns with "*".
            normalized_pattern = pattern.lower()
            if "*" not in normalized_pattern and "?" not in normalized_pattern:
                normalized_pattern = f"*{normalized_pattern}*"

            found_files = []
            for search_path in search_paths:
                if search_path.exists():
                    # Find PDF files matching pattern
                    for pdf in search_path.rglob("*.pdf"):
                        if fnmatch(pdf.name.lower(), normalized_pattern):
                            found_files.append(str(pdf))

            return {
                "files": found_files,
                "count": len(found_files),
                "message": f"Found {len(found_files)} file(s)"
            }

        @tool
        def index_document(file_path: str) -> dict:
            """Index a document for searching."""
            if not Path(file_path).exists():
                return {"error": f"File not found: {file_path}"}

            result = self.rag.index_document(file_path)
            if result.get("success"):
                self.indexed_files.add(file_path)
                return {
                    "status": "success",
                    "chunks": result.get("num_chunks"),
                    "file": Path(file_path).name
                }
            return {"error": result.get("error")}

        @tool
        def query_documents(query: str) -> dict:
            """Search all indexed documents."""
            if not self.indexed_files:
                return {"error": "No documents indexed"}

            response = self.rag.query(query)
            return {
                "answer": response.text,
                "sources": [Path(f).name for f in response.source_files],
                "scores": response.chunk_scores
            }

# Use it
agent = SmartDocAgent()

# User can now ask naturally!
result = agent.process_query("Find and search the user manual for installation steps")
# Agent will:
# 1. Call search_files("user manual")
# 2. Call index_document(found_file)
# 3. Call query_documents("installation steps")
# 4. Return the answer!
Example query: “Find the oil manual and tell me about safety”

LLM orchestration (automatic):
// Step 1: Locate file
{
  "thought": "Need to find oil manual first",
  "tool": "search_files",
  "tool_args": {"pattern": "oil manual"}
}
// Returns: ["C:/Docs/Oil-Gas-Manual.pdf"]

// Step 2: Index document
{
  "thought": "Found file, index it for searching",
  "tool": "index_document",
  "tool_args": {"file_path": "C:/Docs/Oil-Gas-Manual.pdf"}
}
// Returns: {"status": "success", "chunks": 150}

// Step 3: Query for safety info
{
  "thought": "Document indexed, search for safety",
  "tool": "query_documents",
  "tool_args": {"query": "safety"}
}
// Returns: Relevant chunks about safety protocols

// Step 4: Synthesize answer
{
  "answer": "According to the Oil & Gas Manual, safety protocols include..."
}
Implementation note: You define Python functions. The LLM decides when to call them based on tool schemas and user intent.
What you have: File discovery without hardcoded paths. The agent can locate, index, and query documents based on pattern matching.

Next Steps

You’ve built a functional document Q&A agent with RAG and file discovery! Continue with Part 2 to add more capabilities:

Part 2: Advanced Features & Customization

Add tool mixins, file monitoring, session persistence, and learn how to customize your agent for specific use cases.