Primary API: create_client() factory function
Module: gaia.llm
Imports:
  • from gaia.llm import create_client (preferred)
  • from gaia.llm import LLMClient, NotSupportedError

Overview

The LLM client package provides a unified interface for generating text from multiple LLM backends using a provider pattern. Each provider implements the abstract LLMClient interface, with optional methods raising NotSupportedError when unavailable.
Key Features:
  • Factory-based client creation with create_client()
  • Three providers: Lemonade (local AMD-optimized), OpenAI, Claude
  • Abstract base class for type safety and extensibility
  • Graceful handling of unsupported features via NotSupportedError
  • Streaming and non-streaming generation
  • Backward-compatible use_claude/use_openai flags
Provider Capabilities:
Method                    Lemonade   OpenAI   Claude
generate()                    ✓          ✓        ✓
chat()                        ✓          ✓        ✓
embed()                       ✓          ✓        ✗
vision()                      ✓          ✗        ✓
get_performance_stats()       ✓          ✗        ✗
load_model()                  ✓          ✗        ✗
unload_model()                ✓          ✗        ✗
Methods marked with ✗ raise NotSupportedError when called on that provider.
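
For quick feature-detection without try/except, the matrix above can be mirrored as a plain lookup table. This is an illustrative sketch, not part of the gaia.llm API; the names CAPABILITIES and supports() are hypothetical:

```python
# Illustrative capability matrix mirroring the table above (not part of gaia.llm).
CAPABILITIES = {
    "lemonade": {"generate", "chat", "embed", "vision",
                 "get_performance_stats", "load_model", "unload_model"},
    "openai": {"generate", "chat", "embed"},
    "claude": {"generate", "chat", "vision"},
}

def supports(provider: str, method: str) -> bool:
    """Return True if the given provider implements the given method."""
    return method in CAPABILITIES.get(provider.lower(), set())

print(supports("openai", "embed"))   # True
print(supports("claude", "embed"))   # False
```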

Requirements

Functional Requirements

  1. Factory Pattern
    • create_client() factory function for client creation
    • Explicit provider selection via provider parameter (“lemonade”, “openai”, “claude”)
    • Backward-compatible use_claude/use_openai flags
    • Auto-detection of provider from flags when provider not specified
    • Default to Lemonade provider when no flags set
  2. Abstract Interface
    • LLMClient ABC defines unified interface
    • provider_name property returns provider name
    • Required methods (all providers must implement):
      • generate() - Text completion
      • chat() - Chat completion with message history
    • Optional methods (raise NotSupportedError if not implemented):
      • embed() - Generate embeddings
      • vision() - Vision/image understanding
      • get_performance_stats() - Performance statistics
      • load_model() - Load a model
      • unload_model() - Unload current model
  3. Provider Implementations
    • LemonadeProvider: Full support for all methods, connects to local Lemonade server
    • OpenAIProvider: generate, chat, embed only
    • ClaudeProvider: generate, chat, vision only
    • All providers support streaming and non-streaming modes
  4. Error Handling
    • NotSupportedError raised for unsupported methods
    • Clear error messages indicating provider and unsupported method
    • Connection errors handled by underlying provider implementations
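
The selection rules in item 1 can be condensed into a pure function. This sketch mirrors the documented behavior (explicit provider wins, the two flags are mutually exclusive, Lemonade is the default); it is an illustration, not the actual factory code:

```python
# Sketch of the documented provider-selection rules (illustrative, not factory.py).
def resolve_provider(provider=None, use_claude=False, use_openai=False) -> str:
    if provider is not None:
        return provider.lower()          # explicit provider wins; flags ignored
    if use_claude and use_openai:
        raise ValueError("Cannot specify both use_claude and use_openai")
    if use_claude:
        return "claude"
    if use_openai:
        return "openai"
    return "lemonade"                    # default when no flags are set

print(resolve_provider())                                  # lemonade
print(resolve_provider(use_openai=True))                   # openai
print(resolve_provider(provider="Claude", use_openai=True))  # claude
```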

Non-Functional Requirements

  1. Performance
    • Lazy provider loading via importlib (load only when needed)
    • Minimal overhead from abstraction layer
    • Streaming support across all providers
    • Default temperature of 0.1 for near-deterministic responses
  2. Reliability
    • Type safety through ABC pattern
    • Graceful handling of unsupported features
    • Clear error messages for provider capabilities
    • Provider-specific connection management
  3. Usability
    • Simple factory function interface
    • Backward compatibility with existing code
    • Consistent API across all providers
    • Clear documentation with examples

API Specification

Package Structure

src/gaia/llm/
├── __init__.py              # Package exports
├── base_client.py           # Abstract LLMClient interface
├── factory.py               # create_client() factory function
├── exceptions.py            # NotSupportedError
├── lemonade_client.py       # Low-level REST client for Lemonade
└── providers/
    ├── lemonade.py          # LemonadeProvider
    ├── openai_provider.py   # OpenAIProvider
    └── claude.py            # ClaudeProvider

Package Exports (__init__.py)

from gaia.llm import create_client, LLMClient, NotSupportedError

Factory Function (factory.py)

def create_client(
    provider: Optional[str] = None,
    use_claude: bool = False,
    use_openai: bool = False,
    **kwargs,
) -> LLMClient:
    """
    Create an LLM client, auto-detecting provider from parameters.

    Args:
        provider: Explicit provider name ("lemonade", "openai", or "claude").
                  If not specified, auto-detected from use_claude/use_openai flags.
        use_claude: If True, use Claude provider (ignored if provider is specified)
        use_openai: If True, use OpenAI provider (ignored if provider is specified)
        **kwargs: Provider-specific arguments (base_url, model, api_key, etc.)

    Returns:
        LLMClient instance for the specified or detected provider

    Raises:
        ValueError: If provider is unknown or both use_claude and use_openai are True

    Examples:
        # Default Lemonade provider
        client = create_client()

        # Explicit provider selection
        client = create_client(provider="lemonade", model="Qwen3-0.6B-GGUF")
        client = create_client(provider="openai", api_key="sk-...")
        client = create_client(provider="claude", api_key="sk-ant-...")

        # Backward-compatible flags
        client = create_client(use_claude=True, api_key="sk-ant-...")
        client = create_client(use_openai=True, api_key="sk-...")

    Note:
        Provider defaults to "lemonade" when no flags are set.
        The design maintains backward compatibility while allowing explicit provider selection.
    """

Abstract Base Class (base_client.py)

from abc import ABC, abstractmethod
from typing import Iterator, Union

class LLMClient(ABC):
    """
    Unified LLM client interface.

    Methods raise NotSupportedError if not available for this provider.
    """

    @property
    @abstractmethod
    def provider_name(self) -> str:
        """Return the provider name for error messages."""
        ...

    @abstractmethod
    def generate(
        self,
        prompt: str,
        model: str | None = None,
        stream: bool = False,
        **kwargs,
    ) -> Union[str, Iterator[str]]:
        """
        Generate text completion.

        Args:
            prompt: The user prompt/query to send to the LLM
            model: The model to use (defaults to provider's default model)
            stream: If True, returns a generator that yields chunks of the response
            **kwargs: Additional parameters (temperature, max_tokens, etc.)

        Returns:
            If stream=False: The complete generated text as a string
            If stream=True: A generator yielding chunks of the response

        Example:
            response = client.generate("Write a hello world program")
        """
        ...

    @abstractmethod
    def chat(
        self,
        messages: list[dict],
        model: str | None = None,
        stream: bool = False,
        **kwargs,
    ) -> Union[str, Iterator[str]]:
        """
        Chat completion with message history.

        Args:
            messages: List of message dicts with 'role' and 'content' keys
            model: The model to use (defaults to provider's default model)
            stream: If True, returns a generator that yields chunks of the response
            **kwargs: Additional parameters (temperature, max_tokens, etc.)

        Returns:
            If stream=False: The complete generated text as a string
            If stream=True: A generator yielding chunks of the response

        Example:
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there!"},
                {"role": "user", "content": "How are you?"}
            ]
            response = client.chat(messages)
        """
        ...

    # Optional methods - default raises NotSupportedError
    def embed(self, texts: list[str], **kwargs) -> list[list[float]]:
        """
        Generate embeddings for texts.

        Args:
            texts: List of text strings to embed
            **kwargs: Additional parameters (e.g., model="text-embedding-3-small" for OpenAI)

        Returns:
            List of embedding vectors (list of floats)

        Raises:
            NotSupportedError: If provider doesn't support embeddings

        Note:
            Supported by: Lemonade, OpenAI (default: "text-embedding-3-small")
            Not supported by: Claude
        """
        raise NotSupportedError(self.provider_name, "embed")

    def vision(self, images: list[bytes], prompt: str, **kwargs) -> str:
        """
        Vision/image understanding.

        Args:
            images: List of image data as bytes
            prompt: Text prompt describing what to analyze
            **kwargs: Additional parameters

        Returns:
            Text response describing the image

        Raises:
            NotSupportedError: If provider doesn't support vision

        Note:
            Supported by: Lemonade, Claude
            Not supported by: OpenAI
        """
        raise NotSupportedError(self.provider_name, "vision")

    def get_performance_stats(self) -> dict:
        """
        Get performance statistics from the last LLM request.

        Returns:
            Dictionary containing performance statistics

        Raises:
            NotSupportedError: If provider doesn't support performance stats

        Note:
            Only supported by: Lemonade
        """
        raise NotSupportedError(self.provider_name, "get_performance_stats")

    def load_model(self, model_name: str, **kwargs) -> None:
        """
        Load a specific model.

        Args:
            model_name: Name of the model to load
            **kwargs: Additional parameters

        Raises:
            NotSupportedError: If provider doesn't support model loading

        Note:
            Only supported by: Lemonade
        """
        raise NotSupportedError(self.provider_name, "load_model")

    def unload_model(self) -> None:
        """
        Unload the current model.

        Raises:
            NotSupportedError: If provider doesn't support model unloading

        Note:
            Only supported by: Lemonade
        """
        raise NotSupportedError(self.provider_name, "unload_model")

NotSupportedError (exceptions.py)

class NotSupportedError(Exception):
    """Raised when a provider doesn't support a method."""

    def __init__(self, provider: str, method: str):
        super().__init__(f"{provider} does not support {method}")
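
Because the exception is self-contained, its message format can be verified in isolation (the class is restated here verbatim so the snippet runs standalone):

```python
class NotSupportedError(Exception):
    """Raised when a provider doesn't support a method."""

    def __init__(self, provider: str, method: str):
        super().__init__(f"{provider} does not support {method}")

err = NotSupportedError("OpenAI", "vision")
print(str(err))  # OpenAI does not support vision
```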

Provider Implementations

LemonadeProvider (providers/lemonade.py)

Full feature support - implements all methods.
class LemonadeProvider(LLMClient):
    """Lemonade provider - local AMD-optimized inference."""

    def __init__(
        self,
        model: Optional[str] = None,
        base_url: Optional[str] = None,
        host: Optional[str] = None,
        port: Optional[int] = None,
        system_prompt: Optional[str] = None,
        **kwargs,
    ):
        """
        Initialize Lemonade provider.

        Args:
            model: Model name (defaults to "Qwen3-0.6B-GGUF")
            base_url: Base URL for Lemonade server (overrides LEMONADE_BASE_URL env var)
            host: Server host (alternative to base_url)
            port: Server port (alternative to base_url)
            system_prompt: Default system prompt for chat
            **kwargs: Additional arguments passed to LemonadeClient

        Environment:
            LEMONADE_BASE_URL: Default base URL (http://localhost:8000/api/v1)
            LEMONADE_MODEL: Default model name if not specified

        Note:
            Default model is "Qwen3-0.6B-GGUF" for CPU-only inference.
            All methods use temperature=0.1 by default for near-deterministic responses.
        """

    # Supports all methods: generate, chat, embed, vision,
    # get_performance_stats, load_model, unload_model

OpenAIProvider (providers/openai_provider.py)

Partial support - generate, chat, embed only.
class OpenAIProvider(LLMClient):
    """OpenAI API provider."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "gpt-4o",
        system_prompt: Optional[str] = None,
        **_kwargs,
    ):
        """
        Initialize OpenAI provider.

        Args:
            api_key: OpenAI API key (defaults to OPENAI_API_KEY env var)
            model: Model name (default: "gpt-4o")
            system_prompt: Default system prompt for chat

        Environment:
            OPENAI_API_KEY: API key for OpenAI
        """

    # Supports: generate, chat, embed
    # Raises NotSupportedError: vision, get_performance_stats, load_model, unload_model

ClaudeProvider (providers/claude.py)

Partial support - generate, chat, vision only.
class ClaudeProvider(LLMClient):
    """Claude (Anthropic) provider."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "claude-3-5-sonnet-20241022",
        system_prompt: Optional[str] = None,
        **_kwargs,
    ):
        """
        Initialize Claude provider.

        Args:
            api_key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
            model: Model name (default: "claude-3-5-sonnet-20241022")
            system_prompt: Default system prompt for chat

        Environment:
            ANTHROPIC_API_KEY: API key for Anthropic Claude

        Raises:
            ImportError: If anthropic package not installed
        """

    # Supports: generate, chat, vision
    # Raises NotSupportedError: embed, get_performance_stats, load_model, unload_model

Implementation Details

Provider Selection Logic

The factory function auto-detects the provider based on parameters:
# From factory.py
def create_client(provider=None, use_claude=False, use_openai=False, **kwargs):
    # Auto-detect provider from flags if not explicitly specified
    if provider is None:
        if use_claude and use_openai:
            raise ValueError("Cannot specify both use_claude and use_openai")
        elif use_claude:
            provider = "claude"
        elif use_openai:
            provider = "openai"
        else:
            provider = "lemonade"  # Default

    # Validate provider
    if provider.lower() not in _PROVIDERS:
        available = ", ".join(_PROVIDERS.keys())
        raise ValueError(f"Unknown provider: {provider}. Available: {available}")

    # Load provider class dynamically...

Lazy Provider Loading

Providers are loaded dynamically using importlib to avoid importing unnecessary dependencies:
_PROVIDERS = {
    "lemonade": "gaia.llm.providers.lemonade.LemonadeProvider",
    "openai": "gaia.llm.providers.openai_provider.OpenAIProvider",
    "claude": "gaia.llm.providers.claude.ClaudeProvider",
}

# Lazy import - only load when needed
provider_lower = provider.lower()
module_path, class_name = _PROVIDERS[provider_lower].rsplit(".", 1)
module = importlib.import_module(module_path)
provider_class = getattr(module, class_name)

return provider_class(**kwargs)

NotSupportedError Pattern

Optional methods raise NotSupportedError by default in the ABC:
# In base_client.py
class LLMClient(ABC):
    def embed(self, texts: list[str], **kwargs) -> list[list[float]]:
        raise NotSupportedError(self.provider_name, "embed")

    def vision(self, images: list[bytes], prompt: str, **kwargs) -> str:
        raise NotSupportedError(self.provider_name, "vision")
    # etc...
Providers override only the methods they support:
# OpenAIProvider overrides embed but not vision
class OpenAIProvider(LLMClient):
    def embed(self, texts: list[str], **kwargs):
        # Implementation for OpenAI embeddings
        response = self._client.embeddings.create(...)
        return [item.embedding for item in response.data]

    # vision() inherited - raises NotSupportedError

Temperature Defaults

All providers default to temperature=0.1 for near-deterministic responses:
# In LemonadeProvider
kwargs.setdefault("temperature", 0.1)
response = self._backend.completions(model=model, prompt=prompt, **kwargs)

Provider-Specific Implementation

LemonadeProvider wraps the low-level LemonadeClient:
class LemonadeProvider(LLMClient):
    def __init__(self, model=None, base_url=None, **kwargs):
        self._backend = LemonadeClient(model=model, base_url=base_url, **kwargs)

    def generate(self, prompt, model=None, stream=False, **kwargs):
        return self._backend.completions(prompt=prompt, stream=stream, **kwargs)
OpenAIProvider uses the OpenAI SDK directly:
class OpenAIProvider(LLMClient):
    def __init__(self, api_key=None, model="gpt-4o", **kwargs):
        import openai
        self._client = openai.OpenAI(api_key=api_key)
        self._model = model
ClaudeProvider uses the Anthropic SDK:
class ClaudeProvider(LLMClient):
    def __init__(self, api_key=None, model="claude-3-5-sonnet-20241022", **kwargs):
        import anthropic
        self._client = anthropic.Anthropic(api_key=api_key)
        self._model = model
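
All three providers reduce streaming to the same shape: iterate the backend's chunk dicts and yield only the text deltas. A minimal sketch with a stubbed chunk list, assuming the OpenAI-style delta format shown elsewhere in this spec; the helper name iter_deltas is hypothetical:

```python
from typing import Iterator

def iter_deltas(chunks) -> Iterator[str]:
    """Yield text content from OpenAI-style streaming chunk dicts.

    Assumes each chunk looks like {"choices": [{"delta": {"content": ...}}]};
    chunks without content (e.g. role-only deltas) are skipped.
    """
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:
            yield content

# Stubbed backend output for illustration:
stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hi"}}]},
    {"choices": [{"delta": {"content": " there"}}]},
]
print("".join(iter_deltas(stream)))  # Hi there
```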

Testing Requirements

Unit Tests

File: tests/unit/test_llm_client_factory.py
import pytest
from unittest.mock import patch, Mock
from gaia.llm import create_client, LLMClient, NotSupportedError

class TestImports:
    def test_can_import_create_client(self):
        """Verify create_client can be imported."""
        from gaia.llm import create_client
        assert callable(create_client)

    def test_can_import_llm_client_abc(self):
        """Verify LLMClient ABC can be imported."""
        from abc import ABC
        from gaia.llm import LLMClient
        assert issubclass(LLMClient, ABC)

    def test_can_import_not_supported_error(self):
        """Verify NotSupportedError can be imported."""
        from gaia.llm import NotSupportedError
        assert issubclass(NotSupportedError, Exception)

class TestFactory:
    def test_default_creates_lemonade_provider(self):
        """Test factory creates Lemonade by default."""
        with patch("gaia.llm.providers.lemonade.LemonadeClient"):
            client = create_client()
            assert client.provider_name == "Lemonade"

    def test_explicit_provider_selection(self):
        """Test explicit provider parameter."""
        with patch("gaia.llm.providers.lemonade.LemonadeClient"):
            client = create_client(provider="lemonade")
            assert client.provider_name == "Lemonade"

    def test_use_claude_flag(self):
        """Test backward-compatible use_claude flag."""
        with patch("gaia.llm.providers.claude.anthropic"):
            client = create_client(use_claude=True, api_key="test")
            assert client.provider_name == "Claude"

    def test_use_openai_flag(self):
        """Test backward-compatible use_openai flag."""
        with patch("openai.OpenAI"):
            client = create_client(use_openai=True, api_key="test")
            assert client.provider_name == "OpenAI"

    def test_invalid_provider_raises_error(self):
        """Test unknown provider raises ValueError."""
        with pytest.raises(ValueError, match="Unknown provider"):
            create_client(provider="invalid")

    def test_both_flags_raises_error(self):
        """Test both flags raise ValueError."""
        with pytest.raises(ValueError, match="Cannot specify both"):
            create_client(use_claude=True, use_openai=True)

    def test_case_insensitive_provider(self):
        """Test provider names are case-insensitive."""
        with patch("gaia.llm.providers.lemonade.LemonadeClient"):
            client = create_client(provider="LEMONADE")
            assert client.provider_name == "Lemonade"

class TestNotSupportedError:
    def test_error_message_format(self):
        """Test NotSupportedError message format."""
        error = NotSupportedError("TestProvider", "test_method")
        assert "TestProvider" in str(error)
        assert "test_method" in str(error)

    def test_claude_embed_raises_not_supported(self):
        """Test Claude provider raises NotSupportedError for embed."""
        with patch("gaia.llm.providers.claude.anthropic"):
            client = create_client(provider="claude", api_key="test")
        with pytest.raises(NotSupportedError) as exc:
            client.embed(["text"])
        assert "Claude" in str(exc.value)
        assert "embed" in str(exc.value)

    def test_openai_vision_raises_not_supported(self):
        """Test OpenAI provider raises NotSupportedError for vision."""
        with patch("openai.OpenAI"):
            client = create_client(provider="openai", api_key="test")
        with pytest.raises(NotSupportedError) as exc:
            client.vision([b"image"], "describe")
        assert "OpenAI" in str(exc.value)
        assert "vision" in str(exc.value)

class TestProviderMethods:
    def test_lemonade_generate(self):
        """Test LemonadeProvider.generate()."""
        with patch("gaia.llm.providers.lemonade.LemonadeClient") as MockClient:
            mock_backend = Mock()
            mock_backend.completions.return_value = {
                "choices": [{"text": "Hello"}]
            }
            MockClient.return_value = mock_backend

            client = create_client()
            response = client.generate("Test")
            assert response == "Hello"

    def test_lemonade_chat(self):
        """Test LemonadeProvider.chat()."""
        with patch("gaia.llm.providers.lemonade.LemonadeClient") as MockClient:
            mock_backend = Mock()
            mock_backend.chat_completions.return_value = {
                "choices": [{"message": {"content": "Hi"}}]
            }
            MockClient.return_value = mock_backend

            client = create_client()
            response = client.chat([{"role": "user", "content": "Hello"}])
            assert response == "Hi"

    def test_streaming_returns_iterator(self):
        """Test streaming returns an iterator."""
        with patch("gaia.llm.providers.lemonade.LemonadeClient") as MockClient:
            mock_backend = Mock()

            def mock_stream():
                yield {"choices": [{"delta": {"content": "Hi"}}]}
                yield {"choices": [{"delta": {"content": " there"}}]}

            mock_backend.chat_completions.return_value = mock_stream()
            MockClient.return_value = mock_backend

            client = create_client()
            result = client.chat([{"role": "user", "content": "Hello"}], stream=True)
            chunks = list(result)
            assert len(chunks) == 2
            assert "".join(chunks) == "Hi there"

Integration Tests

File: tests/integration/test_llm_providers.py
import pytest

from gaia.llm import create_client

def test_integration_lemonade_generate():
    """Test Lemonade provider with live server."""
    try:
        client = create_client()
        response = client.generate("Say hello")
        assert isinstance(response, str)
        assert len(response) > 0
    except ConnectionError:
        pytest.skip("Lemonade server not running")

def test_integration_lemonade_streaming():
    """Test Lemonade streaming."""
    try:
        client = create_client()
        chunks = list(client.generate("Count to 3", stream=True))
        assert len(chunks) > 0
    except ConnectionError:
        pytest.skip("Lemonade server not running")

def test_integration_lemonade_performance_stats():
    """Test performance stats (Lemonade only)."""
    try:
        client = create_client()
        client.generate("Test")
        stats = client.get_performance_stats()
        assert isinstance(stats, dict)
    except ConnectionError:
        pytest.skip("Lemonade server not running")

Dependencies

Required Packages

# pyproject.toml
[project]
dependencies = [
    "openai>=1.0.0",      # OpenAI Python SDK (used for local + OpenAI)
    "httpx>=0.24.0",      # HTTP client with timeout support
    "requests>=2.31.0",   # For performance stats/control endpoints
    "python-dotenv>=1.0.0",  # Environment variable management
]

[project.optional-dependencies]
claude = ["anthropic>=0.18.0"]  # Claude API support

Import Dependencies

Factory (factory.py):
import importlib
from typing import Optional
from .base_client import LLMClient
Base Client (base_client.py):
from abc import ABC, abstractmethod
from typing import Iterator, Union
from .exceptions import NotSupportedError
LemonadeProvider (providers/lemonade.py):
from typing import Iterator, Optional, Union
from ..base_client import LLMClient
from ..lemonade_client import LemonadeClient, DEFAULT_MODEL_NAME
# DEFAULT_MODEL_NAME = "Qwen3-0.6B-GGUF"
OpenAIProvider (providers/openai_provider.py):
from typing import Iterator, Optional, Union
import openai  # Requires: pip install openai
from ..base_client import LLMClient
ClaudeProvider (providers/claude.py):
from typing import Iterator, Optional, Union
try:
    import anthropic  # Requires: pip install anthropic
except ImportError:
    anthropic = None
from ..base_client import LLMClient

Usage Examples

Example 1: Basic Usage with Factory

from gaia.llm import create_client

# Default Lemonade provider
client = create_client()

# Non-streaming generation
response = client.generate("Write a hello world program in Python")
print(response)

# Get performance stats (Lemonade only)
stats = client.get_performance_stats()
print(f"Speed: {stats.get('tokens_per_second', 'N/A')} tokens/sec")

Example 2: Explicit Provider Selection

from gaia.llm import create_client

# Lemonade (local)
lemonade = create_client(provider="lemonade", model="Qwen3-0.6B-GGUF")

# OpenAI
openai_client = create_client(provider="openai", api_key="sk-...")

# Claude
claude = create_client(provider="claude", api_key="sk-ant-...")

# Backward-compatible flags
legacy_claude = create_client(use_claude=True, api_key="sk-ant-...")

Example 3: Handling Unsupported Features

from gaia.llm import create_client, NotSupportedError

# Create OpenAI client
client = create_client(provider="openai", api_key="sk-...")

# This works (OpenAI supports embed)
embeddings = client.embed(["Hello world", "How are you?"])

# This raises NotSupportedError (OpenAI doesn't support vision)
try:
    result = client.vision([image_bytes], "Describe this image")
except NotSupportedError as e:
    print(f"Feature not available: {e}")
    # Output: "OpenAI does not support vision"

Example 4: Chat with Message History

from gaia.llm import create_client

client = create_client()

# Build conversation history
messages = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 3+3?"}
]

# Use chat() method
response = client.chat(messages)
print(response)  # "3+3 equals 6."

Example 5: Streaming Responses

from gaia.llm import create_client

client = create_client()

# Streaming with generate()
print("AI: ", end="", flush=True)
for chunk in client.generate("Tell me a short story", stream=True):
    print(chunk, end="", flush=True)
print()

# Streaming with chat()
for chunk in client.chat([{"role": "user", "content": "Hello"}], stream=True):
    print(chunk, end="", flush=True)

Example 6: Embeddings (Lemonade and OpenAI only)

from gaia.llm import create_client

# With Lemonade
lemonade = create_client()
embeddings = lemonade.embed(["Hello world", "How are you?"])
print(f"Embedding dimensions: {len(embeddings[0])}")

# With OpenAI
openai_client = create_client(provider="openai", api_key="sk-...")
embeddings = openai_client.embed(["Text to embed"])

Example 7: Vision (Lemonade and Claude only)

from gaia.llm import create_client

# With Claude
claude = create_client(provider="claude", api_key="sk-ant-...")
with open("image.jpg", "rb") as f:
    image_data = f.read()
description = claude.vision([image_data], "Describe what you see")
print(description)

Example 8: Model Management (Lemonade only)

from gaia.llm import create_client

client = create_client()

# Load a specific model
client.load_model("Qwen3-0.6B-GGUF")

# Generate
response = client.generate("Hello")

# Get performance stats
stats = client.get_performance_stats()
print(f"Speed: {stats.get('tokens_per_second', 'N/A')} tokens/sec")

# Unload model
client.unload_model()

Example 9: Remote Lemonade Server

from gaia.llm import create_client

# Connect to remote server
client = create_client(base_url="http://192.168.1.100:8000")

response = client.generate("Hello from remote server")
print(response)

Example 10: System Prompts

from gaia.llm import create_client

# Set default system prompt
client = create_client(
    system_prompt="You are a helpful coding assistant."
)

# System prompt automatically prepended to chat messages
response = client.chat([
    {"role": "user", "content": "Write a binary search function"}
])
print(response)

Third-Party LLM Integration

GAIA supports third-party LLM service providers through its OpenAI-compatible API interface. Any service implementing the OpenAI API specification can be used with GAIA.

Required API Endpoints

Your LLM service must implement at least one of these OpenAI-compatible endpoints:

Completions Endpoint

Default: POST /v1/completions
Used for pre-formatted prompts

Chat Completions Endpoint

POST /v1/chat/completions
Used for structured conversations with message history

Completions Request Example

{
  "model": "your-model-name",
  "prompt": "Your prompt text here",
  "stream": false,
  "temperature": 0.1,
  "max_tokens": 2048
}

Chat Completions Request Example

{
  "model": "your-model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.1
}

Configuration

Linux
export LEMONADE_BASE_URL="http://your-llm-service:8080"
Windows (PowerShell)
$env:LEMONADE_BASE_URL="http://your-llm-service:8080"
Windows (CMD)
set LEMONADE_BASE_URL=http://your-llm-service:8080
URL Normalization: LemonadeClient automatically appends /api/v1 if not present:
  • http://localhost:8080 → http://localhost:8080/api/v1
  • If your service uses /v1 instead, provide the full path: http://localhost:8080/v1
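
The normalization rule can be sketched as a small helper. This is an illustration of the documented behavior under stated assumptions, not the actual LemonadeClient code; normalize_base_url is a hypothetical name:

```python
def normalize_base_url(url: str) -> str:
    """Append /api/v1 unless the URL already ends in a versioned API path.

    Assumption: a trailing /v1 or /api/v1 is treated as "already versioned"
    and left alone, matching the documented normalization rule above.
    """
    url = url.rstrip("/")
    if url.endswith("/api/v1") or url.endswith("/v1"):
        return url
    return url + "/api/v1"

print(normalize_base_url("http://localhost:8080"))     # http://localhost:8080/api/v1
print(normalize_base_url("http://localhost:8080/v1"))  # http://localhost:8080/v1
```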

Example Integration

from gaia.llm import create_client

# Connect to your third-party LLM service
client = create_client(base_url="http://your-service:8080/v1")

# Test connection
response = client.generate("Hello, are you working?")
print(response)

Compatibility Checklist

  • OpenAI-compatible endpoints (/v1/completions or /v1/chat/completions)
  • JSON request/response format matching OpenAI specification
  • HTTP POST method for generation requests
  • Non-streaming responses (complete response as JSON)
  • ⚠️ Streaming responses (Server-Sent Events format)
  • ⚠️ Error handling (proper HTTP status codes: 200, 400, 404, 500)
  • ⚠️ Model listing (GET /v1/models endpoint)
  • ⚠️ Token counting (usage statistics in responses)
Items marked ⚠️ are optional.
The following features are specific to Lemonade provider and raise NotSupportedError with third-party services:
  • get_performance_stats() - Performance statistics
  • load_model() - Model loading
  • unload_model() - Model unloading
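
Before wiring a third-party service in, it can help to sanity-check that its non-streaming responses parse into the expected shape. A dependency-free sketch (the helper name is hypothetical):

```python
def looks_like_openai_chat_response(resp: dict) -> bool:
    """Check that a parsed JSON body matches the minimal OpenAI chat shape:
    {"choices": [{"message": {"content": <str>}}]}."""
    try:
        choices = resp["choices"]
        return isinstance(choices, list) and isinstance(
            choices[0]["message"]["content"], str
        )
    except (KeyError, IndexError, TypeError):
        return False

ok = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
bad = {"error": {"message": "model not loaded"}}
print(looks_like_openai_chat_response(ok))   # True
print(looks_like_openai_chat_response(bad))  # False
```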

Troubleshooting

Problem: ConnectionError: LLM Server Connection Error
Solutions:
  1. Verify service is running:
    curl http://your-service:port/v1/models
    
  2. Check firewall settings
  3. Ensure correct base URL format
  4. Test with explicit base URL:
    client = create_client(base_url="http://localhost:8080/v1")
    
Problem: 404 endpoint not found
Solutions:
  1. Check if service uses /v1/completions (OpenAI standard)
  2. Verify API path structure: /v1 vs /api/v1
  3. Consult service documentation for correct endpoint paths
  4. Use chat method explicitly if needed:
    client.chat([{"role": "user", "content": "Test"}])
    
Problem: Model errors or “model not loaded”
Solutions:
  1. Specify model explicitly:
    client.generate("Test", model="your-model-name")
    
  2. List available models (if service supports):
    curl http://your-service:port/v1/models
    
  3. Ensure model is loaded in your service before connecting
Problem: Streaming responses not working
Solutions:
  1. Verify service supports Server-Sent Events (SSE)
  2. Check Content-Type headers: text/event-stream
  3. Test non-streaming first:
    response = client.generate("Test", stream=False)
    
  4. Enable debug logging:
    import logging
    logging.basicConfig(level=logging.DEBUG)
    

Documentation Updates Required

SDK.md

Add to LLM Section:
### LLM Client

**Import:** `from gaia.llm import create_client, LLMClient, NotSupportedError`

**Purpose:** Provider-based LLM client with factory pattern for local and cloud backends.

**Features:**
- Factory-based client creation with `create_client()`
- Three providers: Lemonade (local), OpenAI, Claude
- Abstract base class for type safety
- `NotSupportedError` for unsupported features
- Streaming and non-streaming generation
- Backward-compatible flags

**Quick Start:**
```python
from gaia.llm import create_client

# Local LLM (default)
client = create_client()
response = client.generate("Hello world")

# Streaming
for chunk in client.generate("Tell me a story", stream=True):
    print(chunk, end="")

# Claude API
claude = create_client(provider="claude", api_key="sk-ant-...")
response = claude.generate("Explain Python decorators")

# Backward-compatible
client = create_client(use_claude=True)
```

LLMClient Technical Specification