GAIA’s talk mode enables voice-based interaction with LLMs, using Whisper for automatic speech recognition (ASR) and Kokoro for text-to-speech (TTS). Have natural conversations with AI through your microphone and speakers.
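Conceptually, a talk session is a loop of transcribe, query, and speak. The sketch below illustrates that loop with hypothetical function names (this is not GAIA’s actual API; real audio I/O is replaced with canned strings):

```python
# Conceptual talk-mode loop (all function names hypothetical, not GAIA's API):
# transcribe speech, query the LLM, speak the reply, repeat until "exit".

def talk_loop(transcribe, ask_llm, speak):
    history = []
    while True:
        text = transcribe()          # ASR step: microphone -> text
        cmd = text.strip().lower()
        if cmd in {"exit", "quit"}:
            break                    # end the session
        if cmd == "restart":
            history.clear()          # wipe conversation history
            continue
        history.append({"role": "user", "content": text})
        reply = ask_llm(history)     # LLM step: history -> response text
        history.append({"role": "assistant", "content": reply})
        speak(reply)                 # TTS step: text -> speakers
    return history

# Toy run with canned input/output instead of real audio:
inputs = iter(["hello", "restart", "what is GAIA?", "exit"])
log = talk_loop(lambda: next(inputs),
                lambda h: f"echo: {h[-1]['content']}",
                lambda s: None)
```

Note how “restart” clears the history mid-session, matching the voice commands described below.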
First time here? Complete the Setup guide first to install GAIA and its dependencies.

Quick Start

1. Install talk extras

With GAIA installed, add the talk extras:
uv pip install "amd-gaia[talk]"
2. Start Lemonade Server

Launch the AI backend server:
lemonade-server serve
You can also start the server by double-clicking the desktop shortcut.
3. Launch Talk Mode

Start a voice conversation:
gaia talk
4. Start Speaking

When you see ⠴ Listening..., you can start talking:
[2025-02-06 11:56:30] | INFO | Starting audio processing thread...
[2025-02-06 11:56:30] | INFO | Listening for voice input...
 Listening...
Say “exit” or “quit” to end the session

Voice Commands

  • Exit session: say “exit” or “quit”
  • Clear history: say “restart”
  • Trigger response: pause naturally for more than 1 second
  • Stop playback: press Enter during audio playback
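The “>1 second pause” trigger is a form of endpointing: a run of low-energy audio frames longer than the pause threshold is treated as end-of-utterance. A minimal sketch (illustrative only; frame length and threshold are assumptions, not GAIA’s actual values or implementation):

```python
# Pause-based endpointing sketch: count trailing low-energy frames and
# trigger a response once the silence exceeds one second.

FRAME_MS = 100           # assumed frame length
PAUSE_MS = 1000          # ">1 second" pause triggers a response
ENERGY_THRESHOLD = 0.01  # assumed silence threshold

def utterance_ended(frame_energies):
    """Return True once the trailing silence exceeds PAUSE_MS."""
    silent_ms = 0
    for energy in reversed(frame_energies):
        if energy < ENERGY_THRESHOLD:
            silent_ms += FRAME_MS
        else:
            break
    return silent_ms > PAUSE_MS

speech = [0.5] * 8                    # ~0.8 s of speech frames
short_pause = speech + [0.001] * 5    # 0.5 s pause: keep listening
long_pause = speech + [0.001] * 11    # 1.1 s pause: trigger a response
```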

Configuration Options

Customize your voice interaction experience:
# Choose model size for speech recognition
gaia talk --whisper-model-size medium

# Available: tiny, base, small, medium, large
Larger models provide better accuracy but require more resources

Document Q&A with Voice

Voice interaction supports document-based Q&A through RAG (Retrieval-Augmented Generation). Ask questions about your PDF documents using natural speech!

Quick Start with Documents

# Voice chat with a document
gaia talk --index manual.pdf

Use Cases

  • Technical support: voice chat with product manuals and troubleshooting guides
  • Research: speak questions about research papers and documentation
  • Learning: voice interaction with textbooks and educational materials
  • Accessibility: hands-free document Q&A for users with mobility needs
  • Field work: voice queries about procedures when your hands are busy
  • Documentation: quick reference lookup while working

How It Works

1. Document Indexing

PDFs are automatically indexed when you start talk mode with --index

2. Voice Input

Speak your question about the documents

3. Context Retrieval

Relevant document sections are retrieved automatically

4. Voice Response

The AI answers based on document context and speaks the response
See the Document Q&A section of the Chat documentation for more details on RAG capabilities.
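The retrieve-then-answer flow can be sketched in a few lines. This is illustrative only: the chunking and word-overlap scoring below are stand-ins for GAIA’s actual indexing and retrieval, and all names are hypothetical:

```python
# Minimal RAG sketch: chunk the document, score chunks against the question,
# and build a context-augmented prompt for the LLM.

def chunk(text, size=40):
    """Split document text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(question, chunks, k=1):
    """Rank chunks by word overlap with the question; return the top k.
    (Real systems use embeddings, not raw word overlap.)"""
    q = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, context):
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

doc = ("The reset button is on the back panel. "
       "Hold it for five seconds to restore factory settings.")
ctx = retrieve("where is the reset button", chunk(doc, size=8))
prompt = build_prompt("where is the reset button", "\n".join(ctx))
```

The retrieved chunk, not the whole document, is what the LLM sees, which is why indexing quality directly affects answer quality.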

Testing ASR Components

Test the automatic speech recognition system using various test modes:

Audio File Transcription

Test transcription of existing audio files:
gaia test --test-type asr-file-transcription --input-audio-file path/to/audio.wav
Supported formats:
  • WAV
  • MP3
  • M4A
  • Other common formats
Options:
  • --input-audio-file: Path to the audio file (required)
  • --whisper-model-size: Model size (default: “base”)
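If you need an input file to exercise this test, the stdlib-only sketch below writes a short 16 kHz mono WAV containing a 440 Hz tone (the filename and parameters are arbitrary; real speech will of course transcribe better, but a generated file verifies the file-handling path):

```python
# Generate a 2-second, 16 kHz, mono, 16-bit WAV file using only the stdlib.
# Whisper pipelines commonly resample input to 16 kHz mono internally.
import math
import struct
import wave

RATE = 16000   # sample rate in Hz
SECONDS = 2

frames = b"".join(
    struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / RATE)))
    for i in range(RATE * SECONDS)
)

with wave.open("test_tone.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(frames)

# Read it back to confirm the duration:
with wave.open("test_tone.wav", "rb") as w:
    duration = w.getnframes() / w.getframerate()   # 2.0 seconds
```

You can then pass the generated file to the test via --input-audio-file test_tone.wav.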

List Audio Devices

Discover available audio input devices:
gaia test --test-type asr-list-audio-devices

Microphone Recording Test

Test real-time transcription from your microphone:
gaia test --test-type asr-microphone --recording-duration 15
Options:
  • --recording-duration: Recording duration in seconds (default: 10)
  • --whisper-model-size: Model size (default: “base”)
  • --audio-device-index: Specific microphone (optional)

Testing TTS Components

Test text-to-speech capabilities with various test modes:

Text Preprocessing

Test how TTS processes and formats text:
gaia test --test-type tts-preprocessing

Streaming Playback

Test real-time audio generation and playback:
gaia test --test-type tts-streaming --test-text "Your test text here"
The test displays:
  • Processing progress
  • Playback progress
  • Currently spoken text
  • Performance metrics
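Streaming TTS typically synthesizes text sentence by sentence so playback can begin before the full response is generated. A minimal chunker sketch (illustrative only; GAIA’s actual pipeline is more sophisticated, and this naive split also breaks on abbreviations like “Dr.”):

```python
# Split text into sentence-sized chunks for incremental TTS synthesis.
import re

def sentence_chunks(text):
    """Split on sentence-ending punctuation, keeping the punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

chunks = sentence_chunks("First sentence. Second one! Is this the third? Yes.")
```

Each chunk can then be handed to the synthesizer while the next one is still being generated, which is what makes the playback feel responsive.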

Audio File Generation

Generate and save audio to WAV file:
gaia test --test-type tts-audio-file \
  --test-text "Your test text here" \
  --output-audio-file ./test_output.wav

Troubleshooting

Microphone not detected:
  • Try different --audio-device-index values
  • List available devices: gaia test --test-type asr-list-audio-devices
  • Check system audio input settings (Settings > Audio > Input)
  • Ensure the correct microphone is selected as the default input device
Poor recognition accuracy:
  • Try larger Whisper models: --whisper-model-size medium or large
  • Ensure you’re in a quiet environment
  • Speak clearly at a moderate pace
  • Check microphone positioning and quality
  • Verify the microphone is not muted
No audio output:
  • Check system audio output/speaker settings
  • Verify TTS is enabled (not using the --no-tts flag)
  • Ensure system volume is not muted
  • Verify espeak-ng is properly installed
  • Test with: gaia test --test-type tts-streaming
Microphone not working:
  • Check microphone permissions
  • Verify the microphone works in other applications
  • Test with: gaia test --test-type asr-microphone
  • Adjust --audio-device-index if you have multiple microphones
Missing RAG dependencies:
uv pip install -e ".[rag]"
Other issues:
  • PDF processing errors: Ensure PDFs have extractable text (not scanned images)
  • Slow indexing: Use --stats to monitor; larger documents take time
  • Context not used: Verify documents indexed successfully at startup
  • Empty responses: Check PDFs contain extractable text

Best Practices

  • Optimal environment: use a quiet environment for best recognition accuracy
  • Speech clarity: speak clearly and at a moderate pace
  • Model selection: balance accuracy vs. performance based on your hardware
  • Natural pauses: use natural pauses to trigger AI responses

Next Steps