Glossary
This glossary defines technical terms, acronyms, and concepts used throughout the GAIA documentation. Terms are organized alphabetically for easy reference.
Activation Script
A shell script that activates a Python virtual environment, making its packages available in the current terminal session.
Agent
An AI system that can autonomously plan, reason, and use tools to accomplish tasks. In GAIA, agents extend the base Agent class and follow a Think-Act-Observe-Reason loop.
Agent Loop
The cyclic process an agent follows: thinking about the task, acting by calling tools, observing the results, and reasoning about next steps.
Agent State
The current phase of agent processing, such as planning, executing, error recovery, or completion.
AgentConsole
GAIA’s colorful command-line interface that provides formatted output for agent operations, making it easier to follow agent reasoning.
Agentic RAG
Retrieval-Augmented Generation with multi-step reasoning capabilities, where the agent can iteratively refine queries and synthesize information from multiple sources.
ANTHROPIC_API_KEY
Environment variable containing the API key for accessing Anthropic’s Claude models via their API.
API Endpoint
A specific URL path on a server that handles particular requests, such as /v1/chat/completions for chat interactions.
ASR (Automatic Speech Recognition)
Technology that converts spoken audio into text. GAIA uses OpenAI’s Whisper model for speech recognition.
Audio Chunk
A segment of audio data processed at one time, typically measured in milliseconds or samples.
Audio Device Index
A numerical identifier for microphone or speaker hardware used by audio processing libraries.
AWQ (Activation-Aware Weight Quantization)
An advanced quantization technique that reduces model size while preserving accuracy by considering activation patterns.
Base URL
The root address of an API server (e.g., http://localhost:8080), used as the foundation for all API endpoint paths.
Batch Experiment
Running evaluation tests on multiple inputs simultaneously to measure AI performance across diverse scenarios.
Cache Directory
A folder where processed documents, embeddings, or other computed data are stored for faster retrieval.
Chat Completions Endpoint
An OpenAI-compatible API endpoint (/v1/chat/completions) that processes conversation history as a list of messages.
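A minimal sketch of calling this endpoint from Python; the base URL and model ID below are examples drawn from elsewhere in this glossary, not fixed values:

```python
import requests

# Example local server address and model ID; substitute your own.
BASE_URL = "http://localhost:8080"

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "Qwen2.5-0.5B-Instruct-CPU",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is an NPU?"},
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```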
ChatAgent
GAIA’s agent implementation for conversational interactions, supporting text and voice-based conversations.
ChatSDK
High-level interface for building chat applications in GAIA, providing conversation management, history, and memory features.
ChatSession
A manager for multi-context conversations, allowing switching between different conversation topics while maintaining history.
Chunk Overlap
The number of tokens that appear in both the end of one text chunk and the beginning of the next, providing context continuity.
Chunk Size
The number of tokens in each piece when splitting text for processing, typically ranging from 500-2000 tokens.
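To make chunk size and chunk overlap concrete, here is a minimal chunking sketch; it splits on whitespace for simplicity, whereas real pipelines count tokens with the model’s tokenizer:

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into chunks; consecutive chunks share `overlap` tokens."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = "some long document ...".split()  # stand-in for real tokenization
chunks = chunk_tokens(tokens, chunk_size=500, overlap=50)
```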
CLI (Command Line Interface)
A text-based interface for interacting with software through terminal commands, such as gaia chat or gaia talk.
CodeAgent
GAIA’s specialized agent for code generation, analysis, and debugging tasks.
Command-line Parameter
Arguments passed to commands when executing them, such as --model or --debug.
Completions Endpoint
An OpenAI-compatible API endpoint (/v1/completions) that processes pre-formatted prompt strings directly.
Configuration File
A JSON or YAML file (like settings.json) containing application settings and preferences.
Connection Pooling
Reusing network connections for multiple requests instead of creating new connections each time, improving performance.
Content Hashing
Generating a unique identifier for document content to detect when files have changed and need reprocessing.
Context Preservation
Maintaining important information when splitting text across chunks, ensuring coherent answers.
Context Window
The maximum number of tokens an LLM can process at once, including both input and output. Modern models range from 4K to 1M+ tokens.
Conversation History
The record of past user and assistant messages in a chat session, used for context in subsequent responses.
Conversation Pair
A single exchange consisting of a user message and the corresponding assistant response.
Cosine Similarity
A mathematical measure of similarity between two vectors, ranging from -1 to 1, commonly used in semantic search.
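A minimal implementation with NumPy:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: a·b / (|a||b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ≈ 0.707
```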
Cost Tracking
Monitoring API usage and associated costs, especially important when using cloud-based LLMs.
Debug Mode
A verbose logging setting that provides detailed information about system operations for troubleshooting.
Document Chunking
The process of splitting large documents into smaller, manageable pieces for processing by LLMs or embedding models.
Editable Install
Installing a Python package in development mode (pip install -e .) so code changes take effect immediately without reinstallation.
Editable Mode
See Editable Install.
Embeddings
Dense vector representations of text that capture semantic meaning, enabling similarity comparisons and search.
Environment Variable
A system-level configuration setting, such as LEMONADE_BASE_URL or ANTHROPIC_API_KEY.
Error Recovery
The ability of an agent to handle failures gracefully and continue operation, potentially retrying or using alternative approaches.
Evaluation Framework
A system for systematically testing AI performance against known correct answers or expected behaviors.
Exponential Backoff
A retry strategy that increases the wait time between retries exponentially (e.g., 1s, 2s, 4s, 8s).
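A minimal retry sketch using this strategy:

```python
import random
import time

def with_backoff(operation, max_retries=4, base_delay=1.0):
    """Retry `operation` with exponentially growing waits (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Small random jitter avoids synchronized retries from many clients.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```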
FAISS (Facebook AI Similarity Search)
A library for efficient similarity search and clustering of dense vectors, commonly used for RAG systems.
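A minimal indexing-and-search sketch with FAISS; the random vectors stand in for real embedding model output:

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384                                   # embedding dimensionality (model-dependent)
embeddings = np.random.rand(1000, dim).astype("float32")  # stand-in vectors

index = faiss.IndexFlatL2(dim)              # exact L2-distance search
index.add(embeddings)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)     # 5 nearest chunks
```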
Few-shot Learning
Providing an LLM with a few example inputs and outputs to teach it how to perform a task.
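For example, a few-shot prompt in chat-message form, where two worked examples teach the expected output format:

```python
# Two worked examples teach the model the task and the answer format.
messages = [
    {"role": "system", "content": "Classify the sentiment as positive or negative."},
    {"role": "user", "content": "The install was painless."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "It crashed twice in an hour."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Setup took one command and just worked."},
]
```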
File Tools Mixin
A GAIA mixin providing file operation tools (read, write, edit, search) that agents can use.
GEMM (General Matrix Multiply)
A fundamental mathematical operation in neural networks, crucial for AI computations and often hardware-accelerated.
GGUF
A file format for quantized LLM models, optimized for efficient loading and inference.
Ground Truth
Known correct answers used to evaluate AI system performance during testing.
Grounding
Anchoring LLM responses in factual data or retrieved documents to reduce hallucinations.
Hallucination
When an LLM generates plausible-sounding but factually incorrect or nonsensical information.
Hardware Acceleration
Using specialized processors (NPU, GPU) instead of general-purpose CPUs for faster AI computations.
Hybrid Mode
Using both NPU and iGPU together to maximize AI performance on AMD Ryzen AI processors.
iGPU (Integrated GPU)
A graphics processing unit built into the CPU chip, capable of accelerating AI workloads on AMD processors.
Image Extraction
Extracting images embedded in documents like PDFs for processing by vision models.
Index Persistence
Saving vector search indexes to disk so they can be loaded quickly without recomputing embeddings.
Inference
The process of running an AI model to generate predictions or outputs from input data.
Installation Directory
The folder where GAIA and its dependencies are installed on your system.
JSON Schema
A standard format for describing the structure and validation rules of JSON data, used for tool parameter definitions.
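For example, a schema for a hypothetical file-reading tool’s parameters:

```python
# A JSON Schema describing the parameters of a hypothetical file-reading tool.
read_file_schema = {
    "type": "object",
    "properties": {
        "path": {"type": "string", "description": "Path of the file to read."},
        "max_bytes": {"type": "integer", "minimum": 1},
    },
    "required": ["path"],
}
```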
Lemonade Server
AMD’s optimized LLM serving platform that provides hardware-accelerated inference on Ryzen AI processors with NPU support.
LEMONADE_BASE_URL
Environment variable specifying the address of the Lemonade Server (e.g., http://localhost:8080).
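A common pattern for reading it in Python, falling back to the documented default when unset:

```python
import os

# Fall back to the documented default address when the variable is unset.
base_url = os.environ.get("LEMONADE_BASE_URL", "http://localhost:8080")
```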
LLM (Large Language Model)
AI models trained on vast text corpora that can understand and generate human-like text (e.g., Qwen, Claude, GPT).
LRU Eviction (Least Recently Used)
A memory management strategy that removes the least recently accessed items when space is needed.
Max File Size
A configured limit on the size of documents that can be processed, typically to prevent memory issues.
Max History Length
The number of conversation pairs retained in chat history before older messages are removed.
Max Tokens
The maximum length of an LLM response, measured in tokens.
MCP (Model Context Protocol)
A standardized protocol for integrating AI agents with external tools and services.
MCP Server
A service that exposes agent capabilities through the Model Context Protocol.
MCP Tools
Functions and capabilities exposed through an MCP server that agents can call.
Mixin
A reusable class that provides a set of related tools to agents, following Python’s mixin pattern.
Model ID
A unique identifier for a specific LLM, such as Qwen2.5-0.5B-Instruct-CPU.
Multi-step Reasoning
An agent’s ability to break complex tasks into steps and execute them sequentially or adaptively.
NPU (Neural Processing Unit)
A dedicated AI accelerator in AMD Ryzen AI processors, optimized for running neural networks efficiently.
NPU Offload
Running AI workloads on the NPU instead of the CPU for better performance and energy efficiency.
NSIS (Nullsoft Scriptable Install System)
An open-source system for creating Windows installers, used for GAIA’s Windows installation package.
OCR (Optical Character Recognition)
Technology that converts images of text (like scanned documents) into machine-readable text.
OGA (ONNX Runtime GenAI)
Microsoft’s ONNX-based inference runtime for generative AI, optimized for AMD hardware.
ONNX (Open Neural Network Exchange)
An open standard format for representing machine learning models, enabling cross-platform deployment.
OPENAI_API_KEY
Environment variable containing the API key for accessing OpenAI’s API services.
OpenAI-compatible API
An API that follows OpenAI’s endpoint structure and request/response format, so existing OpenAI client libraries and tools work without modification.
Page Boundary
The point where one PDF page ends and another begins, important for maintaining context in document processing.
PATH Environment Variable
A system variable listing directories where the operating system searches for executable programs.
PDF Parsing
The process of extracting text, images, and structure from PDF documents.
Per-file Indexing
Creating separate vector search indexes for each document rather than one combined index.
Performance Metrics
Quantitative measurements of system behavior, such as tokens per second or time to first token.
Prompt
The input text provided to an LLM, including instructions, context, and the user’s question or request.
Prompt Engineering
The practice of crafting effective prompts to elicit desired behaviors from LLMs.
PyPI (Python Package Index)
The official repository for distributing Python packages, accessible via pip install.
Quantization
Reducing the precision of model weights (e.g., from float32 to int4) to decrease model size and increase inference speed.
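Back-of-the-envelope arithmetic shows the savings:

```python
params = 7e9                # a 7-billion-parameter model
print(params * 4 / 1e9)     # float32 (4 bytes/weight): 28.0 GB
print(params * 0.5 / 1e9)   # int4 (0.5 bytes/weight):   3.5 GB
```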
RAG (Retrieval-Augmented Generation)
A technique combining document search with LLM generation, allowing models to answer questions using retrieved information.
RAG Mixin
A GAIA mixin providing document Q&A capabilities to agents through RAG functionality.
RAGSDK
GAIA’s high-level interface for document question-answering with RAG.
Relevance Score
A numerical measure of how well a retrieved document chunk matches a search query.
Resource Cleanup
Properly freeing memory, closing connections, and releasing system resources when no longer needed.
Response Truncation
Cutting off LLM output when it exceeds maximum token limits or becomes excessively long.
REST API
An API following Representational State Transfer principles, using HTTP methods (GET, POST, etc.) for operations.
Retry Logic
Automatic retry mechanisms when operations fail, often with exponential backoff.
RoutingAgent
A GAIA agent that analyzes requests and delegates them to specialized agents best suited for the task.
Ryzen AI
AMD’s brand for processors featuring integrated NPU hardware for AI acceleration.
Ryzen AI Driver
Software that enables NPU functionality on AMD Ryzen AI processors.
Sample Rate
The number of audio samples captured per second (e.g., 16 kHz, 24 kHz); higher rates capture more acoustic detail.
SDK (Software Development Kit)
A collection of tools, libraries, and documentation for building applications with a platform.
Semantic Boundary
Natural break points in text (like paragraph or section breaks) used for intelligent document chunking.
Semantic Chunking
Splitting text while preserving semantic meaning, using sentence or paragraph boundaries rather than arbitrary character counts.
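A minimal sketch that packs whole paragraphs into chunks rather than cutting at a fixed character count:

```python
def semantic_chunks(text, max_chars=2000):
    """Greedily pack whole paragraphs into chunks instead of cutting mid-sentence."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```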
Semantic Search
Finding information based on meaning rather than keyword matching, using embeddings and vector similarity.
Server-Sent Events (SSE)
A protocol for servers to push real-time updates to clients, commonly used for streaming LLM responses.
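A minimal sketch of consuming an SSE stream, assuming the server follows OpenAI’s streaming format (the URL and model ID are examples):

```python
import json
import requests

with requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "Qwen2.5-0.5B-Instruct-CPU",
          "messages": [{"role": "user", "content": "Hello"}],
          "stream": True},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # Each event arrives as a "data: {...}" line; the stream ends with [DONE].
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            print(chunk["choices"][0]["delta"].get("content", ""), end="")
```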
Session Management
Handling multiple concurrent conversations or user sessions, each with independent state and history.
Shell Tools Mixin
A GAIA mixin providing shell command execution capabilities to agents.
Show Stats
A configuration option to display performance metrics like tokens per second and processing time.
Silence Threshold
The audio level sensitivity setting for detecting when a user starts or stops speaking.
Silent Installation
Installing software without user interface prompts, using flags like /S in Windows installers.
Silent Mode
Suppressing agent console output, useful for programmatic usage or when output is not needed.
SimpleChat
A lightweight chat wrapper in GAIA for basic conversational interactions without session management.
State Management
Tracking and managing an agent’s current progress, variables, and execution context.
Streaming
Real-time token-by-token delivery of LLM responses as they’re generated, rather than waiting for completion.
Synthetic Data
Artificially generated test data used for evaluation when real-world data is unavailable or insufficient.
System Prompt
Instructions that shape an LLM’s behavior, persona, and response style, typically invisible to end users.
Temperature
A parameter (typically 0.0-2.0) controlling randomness in LLM output. Lower values produce more deterministic responses.
Time to First Token (TTFT)
The latency between sending a request and receiving the first token of a response, measuring perceived responsiveness.
Timeout
The maximum time to wait for an operation before considering it failed.
Token
The basic unit of text processed by LLMs, roughly equivalent to 3/4 of an English word.
Tokens per Second
A performance metric measuring how quickly an LLM generates text output.
Tool
A Python function decorated with @tool that agents can call to perform actions like reading files or searching documents.
Tool Calling
The LLM’s ability to invoke predefined functions to take actions or retrieve information.
Tool Decorator
A Python decorator (@tool) that marks functions as callable by agents, automatically generating schemas.
Tool Execution
The process of running a tool function with provided arguments and returning results to the agent.
Tool Schema
A JSON description of a tool’s parameters, types, and documentation used by LLMs to understand how to call it.
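To tie these tool concepts together, here is an illustrative sketch of how a @tool decorator might register a function and derive a simple schema; GAIA’s actual decorator and schema generation may differ:

```python
import inspect

TOOL_REGISTRY = {}

def tool(fn):
    """Illustrative decorator: registers `fn` and derives a simplified schema.
    GAIA's real @tool decorator may behave differently."""
    params = {
        name: {"type": "string"}  # simplified: real schemas map Python types
        for name in inspect.signature(fn).parameters
    }
    TOOL_REGISTRY[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params},
    }
    return fn

@tool
def read_file(path):
    """Read a text file and return its contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```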
Transcription Queue
A buffer that stores speech-to-text output as it’s being processed, before delivery to the application.
TTS (Text-to-Speech)
Technology that converts written text into spoken audio. GAIA uses the Kokoro TTS system.
uv
A modern Python package manager written in Rust, offering 10-100x faster installation compared to pip.
Vector Search
Finding similar items by comparing their vector embeddings using distance metrics like cosine similarity.
Virtual Environment (.venv)
An isolated Python environment with its own packages, preventing conflicts between project dependencies.
VLM (Vision Language Model)
AI models that can process both images and text, enabling tasks like image captioning or visual question answering.
VLM Enhancement
Using vision models to extract text from images or scanned documents, improving OCR quality.
Voice Activity Detection (VAD)
Technology that detects when a user is actively speaking versus silence or background noise.
Voice Chat
Speech-based conversation where users speak instead of typing and receive spoken responses.
Voice Model Size
The variant of an ASR model (base, small, medium, large); larger variants are more accurate but slower and use more memory.
Webhook
An HTTP callback that sends real-time data to a specified URL when events occur.
Whisper
OpenAI’s open-source speech recognition model, used by GAIA for ASR functionality.
Zero-shot
An LLM performing a task without any training examples, relying solely on its pre-training knowledge.