Test Utilities

Source Code: src/gaia/testing/

Component: Test Utilities (MockLLMProvider, MockVLMClient, fixtures) Module: gaia.testing (planned location) Import: from gaia.testing import MockLLMProvider, MockVLMClient, create_test_agent, temp_database

Overview

Test Utilities provide fixtures, mocks, and helper functions for testing GAIA agents without requiring real LLMs, databases, or network resources. This enables fast, reliable unit testing for both GAIA framework and third-party agents. Key Features:

Mock LLM provider with configurable responses
Mock VLM client for image processing
Temporary database fixtures with auto-cleanup
Agent test factory with mocked dependencies
Assertion helpers for common test patterns

Requirements

Functional Requirements

Mock LLM Provider
- Simulate LLM responses without API calls
- Configurable response sequences
- Track call history for assertions
Mock VLM Client
- Simulate image text extraction
- Configurable extracted text
- Track calls for testing
Temporary Database
- In-memory SQLite for testing
- Automatic cleanup
- Isolation between tests
Agent Test Factory
- Create agents with mocked dependencies
- Simplified agent instantiation
- Pre-configured for testing
Assertion Helpers
- Verify tool calls
- Check agent state
- Validate responses

Non-Functional Requirements

Performance
- Fast test execution (no real API calls)
- Minimal setup overhead
Ease of Use
- Simple, intuitive API
- Good defaults
- Clear error messages
Isolation
- Tests don’t interfere with each other
- Clean state between tests

API Specification

File Locations

src/gaia/testing/__init__.py
src/gaia/testing/fixtures.py
src/gaia/testing/assertions.py

Public Interface

fixtures.py

from typing import List, Dict, Any, Optional, Callable
from contextlib import contextmanager
import tempfile
from pathlib import Path

class MockLLMProvider:
    """
    Mock LLM provider for testing agents.

    Returns pre-configured responses instead of calling real LLM.
    Tracks all calls for test assertions.

    Usage:
        mock_llm = MockLLMProvider(
            responses=["First response", "Second response"]
        )

        # Inject into agent
        agent = MyAgent()
        agent.llm_provider = mock_llm

        # Test
        result = agent.process_query("Test")

        # Verify
        assert len(mock_llm.call_history) == 1
    """

    def __init__(self, responses: Optional[List[str]] = None):
        """
        Initialize mock LLM provider.

        Args:
            responses: List of responses to return in sequence
                      (cycles if more calls than responses)
        """
        self.responses = responses or ["Mock response"]
        self.call_history: List[Dict] = []
        self._response_index = 0

    def generate(self, prompt: str, **kwargs) -> str:
        """
        Generate mock response.

        Args:
            prompt: Input prompt (recorded but not used)
            **kwargs: Additional parameters (recorded)

        Returns:
            Next response from response list
        """
        self.call_history.append({
            "prompt": prompt,
            "kwargs": kwargs,
            "timestamp": time.time(),
        })

        response = self.responses[self._response_index % len(self.responses)]
        self._response_index += 1

        return response

    def complete(self, prompt: str, **kwargs) -> str:
        """Alias for generate() for compatibility."""
        return self.generate(prompt, **kwargs)

    def reset(self) -> None:
        """Reset call history and response index."""
        self.call_history = []
        self._response_index = 0

class MockVLMClient:
    """
    Mock VLM client for testing image processing.

    Returns pre-configured text instead of processing images.

    Usage:
        mock_vlm = MockVLMClient(
            extracted_text='{"name": "John", "dob": "1990-01-01"}'
        )

        # Inject into agent
        agent = MyAgent()
        agent.vlm = mock_vlm

        # Test
        result = agent.extract_form("test.png")

        # Verify
        assert mock_vlm.was_called
    """

    def __init__(self, extracted_text: str = "Mock extracted text"):
        """
        Initialize mock VLM.

        Args:
            extracted_text: Text to return from extract_from_image()
        """
        self.extracted_text = extracted_text
        self.call_history: List[Dict] = []

    def check_availability(self) -> bool:
        """Always return True for testing."""
        return True

    def extract_from_image(
        self,
        image_bytes: bytes,
        prompt: Optional[str] = None
    ) -> str:
        """
        Mock image extraction.

        Args:
            image_bytes: Image data (recorded but not processed)
            prompt: Extraction prompt (recorded)

        Returns:
            Pre-configured extracted text
        """
        self.call_history.append({
            "image_size": len(image_bytes),
            "prompt": prompt,
            "timestamp": time.time(),
        })

        return self.extracted_text

    @property
    def was_called(self) -> bool:
        """Check if extract_from_image was called."""
        return len(self.call_history) > 0

    def reset(self) -> None:
        """Reset call history."""
        self.call_history = []

def create_test_agent(
    agent_class: Type,
    mock_responses: Optional[List[str]] = None,
    **agent_kwargs
):
    """
    Create agent instance with mocked LLM for testing.

    Args:
        agent_class: Agent class to instantiate
        mock_responses: Responses for mock LLM
        **agent_kwargs: Additional arguments for agent constructor

    Returns:
        Agent instance with mocked LLM

    Usage:
        agent = create_test_agent(
            MyAgent,
            mock_responses=["I'll use the search tool"],
            custom_param="value"
        )

        result = agent.process_query("Test")
        # Uses mock LLM instead of real one
    """
    # Create mock LLM
    mock_llm = MockLLMProvider(responses=mock_responses)

    # Create agent with silent mode
    agent = agent_class(silent_mode=True, **agent_kwargs)

    # Inject mock LLM
    # Note: This requires agent to have llm_provider attribute
    # or similar. Adjust based on actual Agent implementation.
    if hasattr(agent, 'chat'):
        agent.chat = mock_llm
    elif hasattr(agent, 'llm_provider'):
        agent.llm_provider = mock_llm

    return agent

@contextmanager
def temp_database(schema_file: Optional[str] = None):
    """
    Create temporary SQLite database for testing.

    Auto-deletes after test completes.

    Args:
        schema_file: Optional SQL file to execute

    Yields:
        Database URL string

    Usage:
        with temp_database() as db_url:
            agent = MyAgent(db_url=db_url)
            # Test with real database
            agent.execute_insert("users", {"name": "Test"})

        # Database automatically deleted
    """
    with tempfile.NamedTemporaryFile(suffix='.db', delete=False) as f:
        db_path = f.name

    db_url = f"sqlite:///{db_path}"

    try:
        # Initialize database if schema provided
        if schema_file:
            import sqlite3
            conn = sqlite3.connect(db_path)
            with open(schema_file) as schema:
                conn.executescript(schema.read())
            conn.close()

        yield db_url

    finally:
        # Cleanup
        Path(db_path).unlink(missing_ok=True)

@contextmanager
def temp_directory():
    """
    Create temporary directory for testing.

    Auto-deletes after test completes.

    Yields:
        Path to temporary directory

    Usage:
        with temp_directory() as tmp_dir:
            # Create test files
            (tmp_dir / "test.txt").write_text("content")

            # Test agent
            agent = MyAgent(data_dir=str(tmp_dir))

        # Directory automatically deleted
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield Path(tmp_dir)

assertions.py

from typing import Any, Dict
import logging

logger = logging.getLogger(__name__)

def assert_tool_called(agent, tool_name: str, times: int = 1) -> None:
    """
    Assert that a tool was called specific number of times.

    Args:
        agent: Agent instance
        tool_name: Name of the tool
        times: Expected call count

    Raises:
        AssertionError: If tool was not called expected times

    Usage:
        agent = create_test_agent(MyAgent)
        agent.process_query("Search for data")

        assert_tool_called(agent, "search_data", times=1)
    """
    # Implementation depends on how agents track tool calls
    # This is a placeholder showing the intended API
    pass

def assert_tool_result(result: Dict, expected_keys: List[str]) -> None:
    """
    Assert that tool result has expected structure.

    Args:
        result: Tool result dictionary
        expected_keys: Keys that must be present

    Raises:
        AssertionError: If keys missing or result invalid

    Usage:
        result = agent.execute_tool("search", {"query": "test"})
        assert_tool_result(result, ["results", "count"])
    """
    assert isinstance(result, dict), f"Result must be dict, got {type(result)}"

    for key in expected_keys:
        assert key in result, f"Missing expected key: '{key}'"

def assert_agent_completed(result: Dict) -> None:
    """
    Assert that agent completed successfully.

    Args:
        result: Agent process_query result

    Raises:
        AssertionError: If agent did not complete

    Usage:
        result = agent.process_query("Test")
        assert_agent_completed(result)
    """
    assert isinstance(result, dict), "Agent should return dict"
    # Add more assertions based on Agent result structure

Testing Requirements

Unit Tests

File: tests/sdk/test_testing_utilities.py

import pytest
from gaia.testing import (
    MockLLMProvider,
    MockVLMClient,
    create_test_agent,
    temp_database,
    temp_directory,
)
from gaia import Agent

def test_mock_llm_can_be_imported():
    """Verify MockLLMProvider can be imported."""
    from gaia.testing import MockLLMProvider
    assert MockLLMProvider is not None

def test_mock_llm_returns_responses():
    """Test MockLLMProvider returns configured responses."""
    mock = MockLLMProvider(responses=["Response 1", "Response 2"])

    assert mock.generate("test") == "Response 1"
    assert mock.generate("test") == "Response 2"
    assert mock.generate("test") == "Response 1"  # Cycles

def test_mock_llm_tracks_calls():
    """Test MockLLMProvider tracks call history."""
    mock = MockLLMProvider()

    mock.generate("prompt 1", temperature=0.7)
    mock.generate("prompt 2", max_tokens=100)

    assert len(mock.call_history) == 2
    assert mock.call_history[0]["prompt"] == "prompt 1"
    assert mock.call_history[1]["kwargs"]["max_tokens"] == 100

def test_mock_vlm_can_be_imported():
    """Verify MockVLMClient can be imported."""
    from gaia.testing import MockVLMClient
    assert MockVLMClient is not None

def test_mock_vlm_returns_text():
    """Test MockVLMClient returns configured text."""
    mock = MockVLMClient(extracted_text="Extracted data")

    result = mock.extract_from_image(b"fake image bytes", prompt="Extract text")

    assert result == "Extracted data"
    assert mock.was_called

def test_temp_database_creates_and_cleans_up():
    """Test temp_database fixture."""
    import sqlite3

    with temp_database() as db_url:
        # Database should exist
        assert "sqlite:///" in db_url

        # Should be usable
        db_path = db_url.replace("sqlite:///", "")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE test (id INTEGER)")
        conn.close()

        # File exists
        assert Path(db_path).exists()

    # After context, file should be deleted
    assert not Path(db_path).exists()

def test_temp_database_with_schema(tmp_path):
    """Test temp_database with schema file."""
    schema_file = tmp_path / "schema.sql"
    schema_file.write_text("CREATE TABLE users (id INTEGER PRIMARY KEY);")

    with temp_database(schema_file=str(schema_file)) as db_url:
        import sqlite3
        db_path = db_url.replace("sqlite:///", "")
        conn = sqlite3.connect(db_path)

        # Table should exist from schema
        cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = [row[0] for row in cursor.fetchall()]

        assert "users" in tables
        conn.close()

def test_temp_directory_creates_and_cleans_up():
    """Test temp_directory fixture."""
    created_path = None

    with temp_directory() as tmp_dir:
        created_path = tmp_dir

        # Directory should exist
        assert tmp_dir.exists()
        assert tmp_dir.is_dir()

        # Should be usable
        test_file = tmp_dir / "test.txt"
        test_file.write_text("content")
        assert test_file.exists()

    # After context, directory should be deleted
    assert not created_path.exists()

def test_create_test_agent():
    """Test create_test_agent helper."""
    class TestAgent(Agent):
        def _get_system_prompt(self): return "Test"
        def _create_console(self):
            from gaia import SilentConsole
            return SilentConsole()
        def _register_tools(self): pass

    agent = create_test_agent(
        TestAgent,
        mock_responses=["Response from mock"]
    )

    assert agent is not None
    assert isinstance(agent, TestAgent)
    # Should have mocked LLM (verify based on implementation)

Usage Examples

Example 1: Testing Agent with Mock LLM

import pytest
from gaia.testing import create_test_agent, MockLLMProvider
from my_agent import MyAgent

def test_agent_processes_query():
    """Test agent with mocked LLM."""
    agent = create_test_agent(
        MyAgent,
        mock_responses=[
            '{"tool": "search", "args": {"query": "test"}}',
            "Found results from search."
        ]
    )

    result = agent.process_query("Find test data")

    assert "answer" in result
    # LLM mock was used instead of real API

Example 2: Testing Database Agent

import pytest
from gaia.testing import temp_database
from gaia import Agent, DatabaseMixin

def test_customer_agent_creates_record(tmp_path):
    """Test agent can create database records."""
    # Create schema
    schema = tmp_path / "schema.sql"
    schema.write_text("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
    """)

    # Test with temporary database
    with temp_database(schema_file=str(schema)) as db_url:
        agent = CustomerAgent(db_url=db_url)

        # Create customer
        result = agent.execute_tool("create_customer", {
            "name": "Test Customer"
        })

        assert result["status"] == "created"

        # Verify in database
        customers = agent.execute_query("SELECT * FROM customers")
        assert len(customers) == 1
        assert customers[0]["name"] == "Test Customer"

    # Database automatically cleaned up

Example 3: Testing VLM Integration

import pytest
from gaia.testing import MockVLMClient
from my_agent import FormProcessingAgent

def test_form_extraction():
    """Test form extraction with mock VLM."""
    agent = FormProcessingAgent()

    # Mock VLM response
    agent.vlm = MockVLMClient(
        extracted_text='{"first_name": "John", "last_name": "Doe"}'
    )

    # Test extraction
    result = agent.execute_tool("extract_form", {"image_path": "fake.png"})

    # Verify VLM was called
    assert agent.vlm.was_called
    assert len(agent.vlm.call_history) == 1

    # Verify result
    assert "first_name" in result
    assert result["first_name"] == "John"

Example 4: Testing File Watching

import pytest
from gaia.testing import temp_directory
import time

def test_agent_processes_new_files():
    """Test agent processes files dropped in watched directory."""
    with temp_directory() as tmp_dir:
        agent = FileWatchAgent(watch_dir=str(tmp_dir))

        # Drop a file
        test_file = tmp_dir / "test.pdf"
        test_file.write_text("content")

        # Wait for processing
        time.sleep(1)

        # Verify agent processed it
        assert agent.processed_count == 1

    # Directory cleaned up automatically

pytest Fixtures

Fixture Exports

File: tests/conftest.py (for GAIA framework tests)

import pytest
from gaia.testing import temp_database, temp_directory, MockLLMProvider

@pytest.fixture
def mock_llm():
    """Provide MockLLMProvider fixture."""
    return MockLLMProvider()

@pytest.fixture
def temp_db():
    """Provide temporary database fixture."""
    with temp_database() as db_url:
        yield db_url

@pytest.fixture
def tmp_dir():
    """Provide temporary directory fixture."""
    with temp_directory() as tmp_path:
        yield tmp_path

Third-party agents can create similar fixtures in their test suites.

Documentation Updates Required

SDK.md

Add new section:

## 11. Testing

### Testing Your Agents

**Import:** `from gaia.testing import MockLLMProvider, temp_database, create_test_agent`

**Purpose:** Test agents without real LLMs, databases, or network calls.

#### Mock LLM Provider

```python
from gaia.testing import create_test_agent

agent = create_test_agent(
    MyAgent,
    mock_responses=["I'll search for that", "Here are the results"]
)

result = agent.process_query("Find data")
# Uses mocked responses instead of real LLM

[Full documentation with more examples]

---

## Dependencies

```toml
# pyproject.toml
[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-asyncio",
]

Acceptance Criteria

Implementation Checklist

Step 1: Create Files

Create src/gaia/testing/ directory
Create src/gaia/testing/__init__.py
Create src/gaia/testing/fixtures.py
Create src/gaia/testing/assertions.py

Step 2: Implement Mocks

MockLLMProvider class
MockVLMClient class
Response cycling logic
Call history tracking

Step 3: Implement Fixtures

create_test_agent() function
temp_database() context manager
temp_directory() context manager
Schema loading for temp DB

Step 4: Implement Assertions

assert_tool_called()
assert_tool_result()
assert_agent_completed()

Step 5: Write Tests

Step 6: Export

Export from src/gaia/testing/__init__.py
Export from src/gaia/__init__.py
Add to __all__

Step 7: Document

Add section to SDK.md
Add usage examples
Update agent testing guide

Step 8: Validate

Can import: from gaia.testing import MockLLMProvider
Example tests work
All tests pass
External agents can use it

Test Utilities Technical Specification

Core Framework

SDKs

Infrastructure

Code Infrastructure

Tool Mixins

Packaging

Agents & Apps

Overview

Requirements

Functional Requirements

Non-Functional Requirements

API Specification

File Locations

Public Interface

fixtures.py

assertions.py

Testing Requirements

Unit Tests

Usage Examples

Example 1: Testing Agent with Mock LLM

Example 2: Testing Database Agent

Example 3: Testing VLM Integration

Example 4: Testing File Watching

pytest Fixtures

Fixture Exports

Documentation Updates Required

SDK.md

Acceptance Criteria

Implementation Checklist

Step 1: Create Files

Step 2: Implement Mocks

Step 3: Implement Fixtures

Step 4: Implement Assertions

Step 5: Write Tests

Step 6: Export

Step 7: Document

Step 8: Validate

Core Framework

SDKs

Infrastructure

Code Infrastructure

Tool Mixins

Packaging

Agents & Apps

​Overview

​Requirements

​Functional Requirements

​Non-Functional Requirements

​API Specification

​File Locations

​Public Interface

​fixtures.py

​assertions.py

​Testing Requirements

​Unit Tests

​Usage Examples

​Example 1: Testing Agent with Mock LLM

​Example 2: Testing Database Agent

​Example 3: Testing VLM Integration

​Example 4: Testing File Watching

​pytest Fixtures

​Fixture Exports

​Documentation Updates Required

​SDK.md

​Acceptance Criteria

​Implementation Checklist

​Step 1: Create Files

​Step 2: Implement Mocks

​Step 3: Implement Fixtures

​Step 4: Implement Assertions

​Step 5: Write Tests

​Step 6: Export

​Step 7: Document

​Step 8: Validate

Overview

Requirements

Functional Requirements

Non-Functional Requirements

API Specification

File Locations

Public Interface

fixtures.py

assertions.py

Testing Requirements

Unit Tests

Usage Examples

Example 1: Testing Agent with Mock LLM

Example 2: Testing Database Agent

Example 3: Testing VLM Integration

Example 4: Testing File Watching

pytest Fixtures

Fixture Exports

Documentation Updates Required

SDK.md

Acceptance Criteria

Implementation Checklist

Step 1: Create Files

Step 2: Implement Mocks

Step 3: Implement Fixtures

Step 4: Implement Assertions

Step 5: Write Tests

Step 6: Export

Step 7: Document

Step 8: Validate