Source Code: src/gaia/talk/ · src/gaia/audio/
First time here? Complete the Setup guide first to install GAIA and its dependencies.
Quick Start
1
Install talk extras
With GAIA installed, add the talk extras:
2
Start Lemonade Server
Launch the AI backend server:
3
Launch Talk Mode
Start voice conversation:
- Full Voice (ASR + TTS)
- ASR Only
4
Start Speaking
When you see the following prompt, you can start talking:
Say “exit” or “quit” to end the session
⠴ Listening...
Voice Commands
Exit Session
Say “exit” or “quit”
Clear History
Say “restart”
Trigger Response
Natural pauses (>1 second)
Stop Playback
Press Enter during audio
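The spoken commands above amount to a small dispatch rule applied to each transcribed utterance. The sketch below is illustrative only and is not taken from GAIA's source: `Action` and `classify_utterance` are hypothetical names, and matching the whole utterance after stripping case and punctuation is an assumption about how the phrases might be compared.

```python
import string
from enum import Enum


class Action(Enum):
    EXIT = "exit"        # end the session
    RESTART = "restart"  # clear the conversation history
    PROMPT = "prompt"    # forward the text to the LLM


def classify_utterance(text: str) -> Action:
    """Map a transcribed utterance to a session action (hypothetical helper)."""
    # ASR output often carries capitalisation and trailing punctuation ("Exit."),
    # so normalise before comparing against the command words.
    normalized = text.strip().lower().strip(string.punctuation + " ")
    if normalized in {"exit", "quit"}:
        return Action.EXIT
    if normalized == "restart":
        return Action.RESTART
    # Anything else, including sentences that merely contain a command word,
    # is treated as a normal prompt.
    return Action.PROMPT
```

Matching the whole utterance rather than a substring keeps a question such as "How do I restart the router?" from clearing the history by accident.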
Configuration Options
Customize your voice interaction experience:
- Whisper Model Size
- Audio Device
- Performance Stats
Larger Whisper models provide better accuracy but require more memory and compute.
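As a rough illustration of how these options combine, the sketch below assembles a command line from the flags this page mentions (`--whisper-model-size`, `--audio-device-index`, `--stats`, `--no-tts`). It assumes the entry point is `gaia talk`; `build_talk_command` is a hypothetical helper, not part of GAIA, and it only builds the argument list without running anything.

```python
from typing import List, Optional


def build_talk_command(
    whisper_model_size: Optional[str] = None,
    audio_device_index: Optional[int] = None,
    stats: bool = False,
    no_tts: bool = False,
) -> List[str]:
    """Assemble an argv list for talk mode (assumes the entry point is `gaia talk`)."""
    cmd = ["gaia", "talk"]
    if whisper_model_size is not None:
        cmd += ["--whisper-model-size", whisper_model_size]
    if audio_device_index is not None:
        cmd += ["--audio-device-index", str(audio_device_index)]
    if stats:
        cmd.append("--stats")   # show performance statistics
    if no_tts:
        cmd.append("--no-tts")  # ASR only, no spoken response
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_talk_command(whisper_model_size="medium", stats=True)))
```

To launch the session from Python, the returned list could be passed to `subprocess.run(cmd, check=True)`.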
Document Q&A with Voice
Voice interaction supports document-based Q&A through RAG (Retrieval-Augmented Generation). Ask questions about your PDF documents using natural speech!
Quick Start with Documents
Use Cases
Technical Support
Voice chat with product manuals and troubleshooting guides
Research
Speak questions about research papers and documentation
Learning
Voice interaction with textbooks and educational materials
Accessibility
Hands-free document Q&A for users with mobility needs
Field Work
Voice queries about procedures when hands are busy
Documentation
Quick reference lookup while working
How It Works
1
Document Indexing
PDFs are automatically indexed when you start talk with --index
2
Voice Input
Speak your question about the documents
3
Context Retrieval
Relevant document sections are retrieved automatically
4
Voice Response
AI answers based on document context and speaks the response
See the Document Q&A section of the Chat documentation for more details on RAG capabilities.
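The retrieval and response steps can be pictured with a toy example. This is a minimal sketch, not GAIA's RAG implementation: it swaps in plain word overlap where a real system would typically use embeddings, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Return up to top_k chunks that share the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(chunk)), i) for i, chunk in enumerate(chunks)]
    # Highest overlap first; ties keep the original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [chunks[i] for score, i in scored[:top_k] if score > 0]


def build_prompt(question: str, context: List[str]) -> str:
    """Combine retrieved context with the transcribed question into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "Reset the router by holding the power button.",
        "The warranty lasts two years.",
        "Blue light means the router is online.",
    ]
    question = "How do I reset the router?"  # would come from the ASR step
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In a voice session, the question comes from speech recognition and the model's answer to the assembled prompt is passed on to TTS.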
Testing ASR Components
Test the automatic speech recognition system using various test modes:
Audio File Transcription
Test transcription of existing audio files:
Supported Audio Formats
- WAV
- MP3
- M4A
- Other common formats
- --input-audio-file: Path to the audio file (required)
- --whisper-model-size: Model size (default: “base”)
List Audio Devices
Discover available audio input devices:
gaia test --test-type asr-list-audio-devices
Microphone Recording Test
Test real-time transcription from your microphone:
gaia test --test-type asr-microphone
- --recording-duration: Recording duration in seconds (default: 10)
- --whisper-model-size: Model size (default: “base”)
- --audio-device-index: Specific microphone (optional)
Testing TTS Components
Test text-to-speech capabilities with various test modes:
Text Preprocessing
Test how TTS processes and formats text:
Streaming Playback
Test real-time audio generation and playback:
gaia test --test-type tts-streaming
Test Output Includes
- Processing progress
- Playback progress
- Currently spoken text
- Performance metrics
Audio File Generation
Generate and save audio to a WAV file:
Troubleshooting
Audio Device Errors
- Try different --audio-device-index values
- List available devices: gaia test --test-type asr-list-audio-devices
- Check system audio input settings (Settings > Audio > Input)
- Ensure the correct microphone is selected as the default input device
Poor ASR Accuracy
- Try larger Whisper models: --whisper-model-size medium or large
- Ensure you’re in a quiet environment
- Speak clearly at a moderate pace
- Check microphone positioning and quality
- Verify microphone is not muted
No Voice Response (TTS)
- Check system audio output/speaker settings
- Verify TTS is enabled (not using the --no-tts flag)
- Ensure system volume is not muted
- Verify espeak-ng is properly installed
- Test with: gaia test --test-type tts-streaming
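One quick way to check the espeak-ng item in this list is to look for the executable on `PATH`. The sketch below is a convenience check under that assumption, not something GAIA provides: `espeak_ng_available` is a hypothetical helper, and an install that is not on `PATH` (as can happen on Windows) would be reported as missing even if it works.

```python
import shutil


def espeak_ng_available() -> bool:
    """Return True if an `espeak-ng` executable is found on PATH."""
    return shutil.which("espeak-ng") is not None


if __name__ == "__main__":
    if espeak_ng_available():
        print("espeak-ng found on PATH")
    else:
        print("espeak-ng not found; install it or add it to PATH before testing TTS")
```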
Voice Input Not Recognized
- Check microphone permissions
- Verify microphone is working in other applications
- Test with: gaia test --test-type asr-microphone
- Adjust --audio-device-index if multiple microphones are connected
RAG Issues
Missing RAG dependencies:
Other issues:
- PDF processing errors: Ensure PDFs have extractable text (not scanned images)
- Slow indexing: Use --stats to monitor; larger documents take time
- Context not used: Verify documents indexed successfully at startup
- Empty responses: Check PDFs contain extractable text
Best Practices
Optimal Environment
Use in quiet environments for best recognition accuracy
Speech Clarity
Speak clearly and at a moderate pace
Model Selection
Balance accuracy vs. performance based on your hardware
Natural Pauses
Use natural pauses to trigger AI responses
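To see why pauses matter, it helps to picture how a pause-based trigger might work. The sketch below is an assumption-laden illustration rather than GAIA's detector: it treats audio as a sequence of per-frame energy values and reports the utterance as finished once speech has been heard and the trailing silence exceeds one second (the threshold listed under Voice Commands). `utterance_finished`, the 0.1-second frame length, and the energy threshold are all hypothetical.

```python
from typing import Sequence


def utterance_finished(
    frame_energies: Sequence[float],
    frame_seconds: float = 0.1,
    silence_threshold: float = 0.01,
    min_pause_seconds: float = 1.0,
) -> bool:
    """Return True once speech has been heard and is followed by a long enough pause."""
    trailing_silence = 0  # number of consecutive silent frames at the end
    heard_speech = False
    for energy in frame_energies:
        if energy >= silence_threshold:
            heard_speech = True
            trailing_silence = 0  # any speech resets the pause counter
        else:
            trailing_silence += 1
    # Silence alone never triggers a response; speech must come first.
    return heard_speech and trailing_silence * frame_seconds > min_pause_seconds
```

Under this model, short hesitations in the middle of a sentence reset nothing permanently, but a pause of more than a second ends the turn, so it is worth finishing a thought before stopping.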
Next Steps
Chat SDK
Learn about the underlying chat capabilities
CLI Reference
Explore all command-line options
Development Guide
Build custom voice-enabled agents
Features Overview
Discover all GAIA capabilities