GAIA v0.15.3 Release Notes
Overview
This release introduces Stable Diffusion image generation with the new SD Agent, multi-step workflow parameter passing, and a composable system prompt architecture. It also includes Lemonade 9.2.0 support, a comprehensive playbook, and enhanced agent reliability.
TL;DR:
- New: SD Agent - Multi-modal image generation + story creation
- New: SDToolsMixin & VLMToolsMixin - Add image/vision capabilities to any agent
- Fixed: Multi-step workflows - Agents pass results between steps automatically
- Improved: Agent reliability - Smarter loop detection, 16K context
What’s New
SD Agent: Multi-Modal Image Generation
New agent demonstrating how to combine image generation with vision analysis for creative workflows. It shows developers how to build multi-modal applications using GAIA’s mixin pattern.
- 4 SD Models: SDXL-Base-1.0 (photorealistic), SDXL-Turbo (fast), SD-1.5, SD-Turbo
- LLM-Enhanced Prompts: Research-backed keyword strategies automatically applied
- Vision Analysis: Image descriptions and Q&A using Vision LLM
- Story Creation: Creative narratives generated from images
- Story Persistence: Stories saved as .txt files alongside images
- Random Seeds: Each generation unique by default (specify seed for reproducibility)
- Image generation: ~17s (SDXL-Turbo, 512x512)
- Story creation: ~15s (Qwen3-VL-4B)
- Total workflow: ~35s
SDToolsMixin: Stable Diffusion SDK
New mixin for adding image generation to any agent.
How it helps: Add professional image generation to any agent in 3 lines. Auto-configures optimal settings per model.
Features:
- 4 Models Supported: SDXL-Base-1.0, SDXL-Turbo, SD-1.5, SD-Turbo
- 3 Auto-registered Tools: generate_image(), list_sd_models(), get_generation_history()
- Model-Specific Defaults: Automatic size, steps, CFG scale per model (e.g., SDXL-Turbo: 512x512, 4 steps, CFG 1.0)
- Session Tracking: Generation history maintained in the self.sd_generations list
- Composable Prompts: get_sd_system_prompt() provides research-backed prompt engineering per model
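The mixin pattern described above can be sketched roughly as follows. This is a self-contained illustration and not GAIA's actual implementation: the SketchSDMixin class, its init_sd() method, and the _tools registry are hypothetical names, no image is rendered, and only the SDXL-Turbo defaults (512x512, 4 steps, CFG 1.0) come from these notes.

```python
import random
from typing import Any, Callable, Dict, List, Optional

# Only the SDXL-Turbo values are taken from the release notes; other
# models would need their own entries, which are not invented here.
MODEL_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "SDXL-Turbo": {"size": "512x512", "steps": 4, "cfg_scale": 1.0},
}


class SketchSDMixin:
    """Hypothetical stand-in for an image-generation mixin."""

    def init_sd(self, default_model: str = "SDXL-Turbo") -> None:
        self.sd_default_model = default_model
        self.sd_generations: List[Dict[str, Any]] = []  # session history
        # Assumed registry shape: a plain dict of tool name -> callable.
        self._tools: Dict[str, Callable[..., Any]] = getattr(self, "_tools", {})
        self._tools["generate_image"] = self.generate_image
        self._tools["list_sd_models"] = self.list_sd_models
        self._tools["get_generation_history"] = self.get_generation_history

    def generate_image(self, prompt: str, model: Optional[str] = None,
                       seed: Optional[int] = None) -> Dict[str, Any]:
        model = model or self.sd_default_model
        settings = dict(MODEL_DEFAULTS[model])  # per-model defaults
        # Random seed by default; pass one explicitly for reproducibility.
        if seed is None:
            seed = random.randrange(2**32)
        record = {"prompt": prompt, "model": model, "seed": seed, **settings}
        self.sd_generations.append(record)  # a real mixin would render here
        return record

    def list_sd_models(self) -> List[str]:
        return sorted(MODEL_DEFAULTS)

    def get_generation_history(self) -> List[Dict[str, Any]]:
        return list(self.sd_generations)


class DemoAgent(SketchSDMixin):
    def __init__(self) -> None:
        self.init_sd()
```

In this sketch an agent gains the three tools by inheriting the mixin and calling one init method, which mirrors the "few lines" claim without asserting how GAIA wires it internally.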
VLMToolsMixin: Vision Language Model SDK
New mixin for adding vision capabilities to any agent.
How it helps: Enable agents to understand and analyze images. Access vision client for building custom vision-based tools.
Features:
- 2 Auto-registered Tools: analyze_image(), answer_question_about_image()
- Multi-Model Support: Qwen3-VL-4B, Qwen2.5-VL-7B, and other vision models
- Client Access: self.vlm_client.extract_from_image() for building custom tools
- Composable Prompts: get_vlm_system_prompt() provides usage guidelines
Note: create_story_from_image is implemented as a custom tool in SDAgent (not in VLMToolsMixin) to demonstrate building specialized tools with self.vlm_client. This encourages custom tool development over bloating mixins with every use case.
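A custom tool built on a vision client might look roughly like the sketch below. The FakeVLMClient class and the extract_from_image(image_bytes, prompt) signature are assumptions made so the example runs offline; the real client's signature may differ, and this is not the SDAgent implementation.

```python
from pathlib import Path
from typing import Protocol


class VisionClient(Protocol):
    # Assumed signature; check the real client before relying on it.
    def extract_from_image(self, image_bytes: bytes, prompt: str) -> str: ...


class FakeVLMClient:
    """Offline stand-in so the sketch runs without a vision model."""

    def extract_from_image(self, image_bytes: bytes, prompt: str) -> str:
        return f"[{len(image_bytes)} bytes] response to: {prompt}"


def create_story_from_image(vlm_client: VisionClient, image_path: str,
                            style: str = "whimsical") -> dict:
    """Custom tool: ask the vision model for a story and save it as .txt."""
    image = Path(image_path)
    story = vlm_client.extract_from_image(
        image.read_bytes(),
        prompt=f"Write a short {style} story inspired by this image.",
    )
    story_path = image.with_suffix(".txt")  # saved alongside the image
    story_path.write_text(story, encoding="utf-8")
    return {"story": story, "story_path": str(story_path)}
```

Keeping the tool as a plain function that receives the client makes it easy to test with a stub and keeps use-case-specific logic out of the shared mixin.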
Multi-Step Workflow Parameter Passing
Framework improvement enabling agents to pass results between steps automatically.
How it helps: Build complex workflows (data fetch → process → analyze → store) without manual result passing. Works for all agents, not just SD Agent.
Problem: Multi-step workflows failed because agents couldn’t reference previous outputs. This resulted in “Image not found” errors when step 2 needed step 1’s image_path.
Solution: Placeholder syntax automatically resolves to actual values:
- $PREV.field - Reference previous step
- $STEP_N.field - Reference specific step (0-indexed)
- Recursive resolution for nested structures
- Backward compatible (existing plans work unchanged)
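The placeholder rules above can be sketched as a small recursive resolver. This is an illustration under stated assumptions, not the framework's code: the resolve() function name is hypothetical, step results are assumed to be dictionaries, and leaving an unresolvable placeholder untouched is a choice made for this sketch.

```python
import re
from typing import Any, List

# Matches "$PREV.field" or "$STEP_N.field" as the whole string value.
_PLACEHOLDER = re.compile(r"^\$(PREV|STEP_(\d+))\.(\w+)$")


def resolve(value: Any, results: List[dict]) -> Any:
    """Recursively replace placeholders with outputs of earlier steps."""
    if isinstance(value, dict):
        return {k: resolve(v, results) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, results) for v in value]
    if not isinstance(value, str):
        return value
    match = _PLACEHOLDER.match(value)
    if match is None:
        return value  # plain values pass through, so old plans still work
    # $PREV means the most recent step; $STEP_N is 0-indexed.
    index = -1 if match.group(1) == "PREV" else int(match.group(2))
    try:
        return results[index][match.group(3)]
    except (IndexError, KeyError):
        return value  # assumption: unresolved placeholders stay as-is


results = [{"image_path": "out/robot.png"}, {"description": "a red robot"}]
args = {"image": "$STEP_0.image_path", "notes": ["$PREV.description", "keep"]}
print(resolve(args, results))
# {'image': 'out/robot.png', 'notes': ['a red robot', 'keep']}
```

Because strings without a placeholder are returned unchanged, a plan written before this feature resolves to itself, which is one way to get the backward compatibility the notes describe.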
Improvements
Composable System Prompts
Architectural pattern enabling automatic prompt composition across mixins.
How it helps: Build agents that inherit domain expertise automatically. No manual prompt assembly or knowledge duplication.
Implementation:
- Mixins own knowledge: get_sd_system_prompt() provides SD prompt engineering, get_vlm_system_prompt() provides VLM usage
- Auto-composition: Agent base class collects and merges mixin prompts
- Easy extension: Agents add custom prompts via _get_system_prompt()
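One way such composition could work is sketched below. The discovery rule (collect every method named get_*_system_prompt), the build_system_prompt() method, and the SketchAgent base class are assumptions for illustration; the notes do not say how the real base class finds mixin prompts.

```python
from typing import List


class SketchAgent:
    """Hypothetical base class that merges prompt fragments from mixins."""

    def _get_system_prompt(self) -> str:
        return ""  # agents override this to add their own instructions

    def build_system_prompt(self) -> str:
        # Assumed convention: mixins expose get_<name>_system_prompt().
        names: List[str] = sorted(
            n for n in dir(self)
            if n.startswith("get_") and n.endswith("_system_prompt")
        )
        parts = [getattr(self, n)() for n in names]
        parts.append(self._get_system_prompt())  # agent-specific text last
        return "\n\n".join(p for p in parts if p)


class SDMixin:
    def get_sd_system_prompt(self) -> str:
        return "SD: describe subject, style, and lighting."


class VLMMixin:
    def get_vlm_system_prompt(self) -> str:
        return "VLM: analyze an image before answering questions about it."


class StoryAgent(SDMixin, VLMMixin, SketchAgent):
    def _get_system_prompt(self) -> str:
        return "You create an image, then write a story about it."


print(StoryAgent().build_system_prompt())
```

Each mixin owns its own fragment, so adding or removing a mixin changes the final prompt without any edits to the agent class.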
Agent Framework
- Loop Detection: Configurable max_consecutive_repeats (default: 4) - Allows “create 3 designs” while preventing infinite loops
- Default max_steps: Increased from 5 → 20 - Supports complex multi-step workflows without artificial limits
- State Management: Cleanup on error recovery - Prevents stale data contamination between plan attempts
- Console Warnings: Rich-formatted output - Better visibility than silent logger messages
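The loop-detection threshold can be pictured with a small counter of consecutive identical tool calls. The RepeatGuard class below is a hypothetical sketch, and it assumes the guard trips on the fourth identical call in a row (so three repeats are allowed); the framework's exact comparison and trigger point are not stated in these notes.

```python
from typing import Optional, Tuple


class RepeatGuard:
    """Flags a loop once the same call repeats too many times in a row."""

    def __init__(self, max_consecutive_repeats: int = 4) -> None:
        self.max_consecutive_repeats = max_consecutive_repeats
        self._last: Optional[Tuple[str, str]] = None
        self._count = 0

    def should_stop(self, tool: str, args: dict) -> bool:
        # Two calls are "identical" when tool name and arguments match.
        call = (tool, repr(sorted(args.items())))
        if call == self._last:
            self._count += 1
        else:
            self._last, self._count = call, 1
        return self._count >= self.max_consecutive_repeats


guard = RepeatGuard()
flags = [guard.should_stop("generate_image", {"prompt": "logo"}) for _ in range(4)]
print(flags)  # [False, False, False, True]
```

With the default of 4, a request such as "create 3 designs" can issue three identical calls without a warning, while a fourth identical call in a row is treated as a probable loop.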
Model Downloads
- CLI-based: lemonade-server pull instead of HTTP - More reliable with built-in retry logic
- Interrupt Support: Graceful Ctrl+C - Cancel long downloads without breaking state
- Context Verification: Force unload/reload - Ensures 16K context persists correctly
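A CLI-based download with graceful interrupt handling could be wrapped along these lines. This is a sketch only: the pull_model() helper is hypothetical, passing the model name as a positional argument to lemonade-server pull is an assumption, and GAIA's own download code may be structured differently.

```python
import subprocess
from typing import Sequence


def pull_model(model: str,
               base_cmd: Sequence[str] = ("lemonade-server", "pull")) -> bool:
    """Run the pull command for one model; return True on exit code 0."""
    try:
        proc = subprocess.Popen([*base_cmd, model])
    except FileNotFoundError:
        return False  # CLI not installed or not on PATH
    try:
        return proc.wait() == 0
    except KeyboardInterrupt:
        # Ctrl+C: stop the child and wait for it so nothing is left running.
        proc.terminate()
        proc.wait()
        return False
```

The base_cmd parameter exists only so the helper can be exercised with a harmless command in tests; in normal use the default would invoke the Lemonade CLI.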
Documentation
- Consolidated Playbook: 4 files → 1 guide - Faster learning without fragmentation
- GitHub Support Links: Issue reporting in troubleshooting - Clear path to get help
- Contributing Guide: Documentation guidelines - Easier for community contributions
- Example Code: examples/sd_agent_example.py - Working reference implementation
Developer Experience
- uvx Fallback: Lint works without uvx - One less dependency to install
- Video Demo Scripts: Documentation tooling - Easier to create demos
- Better Console Output: Rich formatting - Clearer agent execution visibility
Infrastructure
- Lemonade 9.2.0: Required for SDXL models
- Merge Queue: Concurrency optimization - Faster CI/CD feedback
- Release Automation: Auto-triggered notes - Streamlined release process
Bug Fixes
- Multi-step workflows: Fixed “Image not found” when step 2 references step 1 output (e.g., passing image_path)
- Context exceeded: SD Agent completes without hitting token limits (16K context)
- Loop detection: Agents handle “create 3 designs” without false warnings (threshold: 4 consecutive)
- Context persistence: 16K settings saved correctly during gaia init reruns
- Missing exports: Fixed gaia.agents.tools package in setup.py
- Missing dependencies: Added requests to requirements
Breaking Changes
None - This release is 100% backward compatible.
Upgrade
Full Changelog
66 commits from multiple contributors
Key PRs:
- #287 - Add Stable Diffusion Image Generation Support
- #296 - SD Agent enhancements: multi-modal capabilities, composable prompts, parameter passing
- #291 - Use lemonade CLI for model downloads
- #288 - Standardize playbook installation
- #286 - Contributing guide for documentation
- #284 - Update Lemonade to v9.2.0
- #283 - Fix missing gaia.agents.tools package
- #256 - Optimize merge queue