Computer Use Agent (CUA) - Implementation Plan
Status: Planning
Priority: High
Target: January 30, 2026
Executive Summary
Build a Computer Use Agent (CUA) that wraps external MCP servers and enables AI-powered desktop control through natural language commands. The agent dynamically discovers tools from the external MCP server and registers them locally, using Lemonade (Qwen-Coder 30B) for intelligent reasoning about which actions to take. Goal: Let GAIA automate desktop features and system settings through natural language commands.
Requirements
Prerequisites
- GAIA SDK v0.16+
- Python 3.10+
- Lemonade Server running with Qwen-Coder 30B
- External MCP server for desktop control (user-provided)
External MCP Server Requirements
External MCP servers must support the MCP protocol:
- One of: stdio or HTTP transport
- initialize handshake
- tools/list capability
- tools/call for tool execution
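The protocol operations listed above are JSON-RPC 2.0 requests. The following is a minimal sketch of how a client might build them, not the planned ExternalMCPClient implementation. The helper names, the client name "gaia-cua", the protocol revision string, and the example tool name are assumptions chosen for illustration.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC 2.0 requires each request to carry an id.
_ids = itertools.count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request as a plain dict."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request():
    """Handshake request. The revision and clientInfo values are assumptions."""
    return make_request(
        "initialize",
        {
            "protocolVersion": "2024-11-05",  # assumed revision supported by the server
            "capabilities": {},
            "clientInfo": {"name": "gaia-cua", "version": "0.0.0"},  # hypothetical
        },
    )


def list_tools_request():
    """Ask the server which tools it offers."""
    return make_request("tools/list")


def call_tool_request(name, arguments):
    """Invoke one tool by name with a dict of arguments."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def encode_stdio(message):
    """Serialize for the stdio transport: one JSON message per line."""
    return (json.dumps(message) + "\n").encode("utf-8")


if __name__ == "__main__":
    # "set_volume" is a hypothetical tool name; real names come from tools/list.
    print(encode_stdio(call_tool_request("set_volume", {"level": 30})))
```

Over HTTP the same dicts would be sent as request bodies instead of newline-delimited lines.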
The Problem
GAIA agents currently cannot interact with desktop features and system settings. This blocks any agent that needs to:
- Adjust system settings and preferences
- Control hardware features and utilities
- Query device status and capabilities
- Automate repetitive system tasks
| Current Limitation | Impact |
|---|---|
| No desktop automation | Can’t build end-to-end workflow agents |
| No system control | Can’t adjust settings or features |
| No device queries | Can’t check status or capabilities |
| Isolated from OS | Limited to file/code operations only |
The Solution
A Computer Use Agent that connects to external MCP servers and exposes their tools to Lemonade:
- Desktop Automation: System settings, features, utilities. Automate any supported function.
- Natural Language: Describe tasks in plain English. AI figures out the steps.
- Tool-Agnostic: Works with any MCP server. No hardcoded tool definitions.
Architecture
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Base Class | Agent (not MCPAgent) | CLI-first, no MCP server exposure needed |
| Connection Modes | stdio + HTTP | Flexibility for different external servers |
| Tool Discovery | Dynamic via tools/list | No hardcoded tool definitions |
| Error Handling | User-friendly + verbose flag | Clean UX by default |
What This Agent Does NOT Do
- Does NOT expose itself as an MCP server
- Does NOT manage the external MCP server process
- Future UI integration will use OpenAI-compatible API (out of scope)
Components
| Component | Purpose | Implementation |
|---|---|---|
| ComputerUseAgent | Main agent class | Inherits from Agent, registers external tools |
| ExternalMCPClient | Protocol handler | JSON-RPC 2.0 over stdio or HTTP |
| CLI | User interface | Single command execution, tool listing |
| Tool Registry Bridge | Tool integration | Dynamically registers MCP tools with @tool system |
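To illustrate the Tool Registry Bridge row, here is a rough sketch of turning a tools/list result into local callables that forward to tools/call. It uses a plain dict as the registry and does not touch GAIA's @tool system, whose API is not shown in this plan. `build_tool_registry`, the `call_tool` callback, and the example tool are hypothetical.

```python
def _make_proxy(name, description, call_tool):
    """Create a local function that forwards its keyword arguments to the server."""

    def proxy(**arguments):
        return call_tool(name, arguments)

    proxy.__name__ = name
    proxy.__doc__ = description
    return proxy


def build_tool_registry(tools, call_tool):
    """Map each discovered tool name to a proxy function.

    tools:     list of dicts shaped like MCP tools/list entries
               ("name", optional "description", "inputSchema").
    call_tool: callable(name, arguments) that performs the tools/call request.
    """
    registry = {}
    for spec in tools:
        name = spec["name"]
        registry[name] = _make_proxy(name, spec.get("description", ""), call_tool)
    return registry


if __name__ == "__main__":
    # Stand-in for a real client; it only echoes what would be sent.
    def fake_call_tool(name, arguments):
        return {"called": name, "arguments": arguments}

    discovered = [
        {
            "name": "get_battery_status",  # hypothetical tool
            "description": "Report battery level",
            "inputSchema": {"type": "object", "properties": {}},
        }
    ]
    registry = build_tool_registry(discovered, fake_call_tool)
    print(registry["get_battery_status"]())
```

Because the proxies are generated from whatever the server reports, pointing the agent at a different MCP server changes the available tools without code changes.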
Data Flow
Tool Execution Flow
Connection Flow
CLI Commands
Execute Commands
Tool Discovery
Configuration Options
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| GAIA_CUA_MCP_URL | stdio | External MCP server URL or “stdio” |
Connection Modes
- stdio (Default)
- HTTP
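One way the connection mode could be derived from GAIA_CUA_MCP_URL is sketched below. The function name, the return shape, and the choice to reject non-HTTP URLs are assumptions. The plan itself only states the variable name and its default.

```python
import os
from urllib.parse import urlparse


def resolve_connection_mode(env=None):
    """Return ("stdio", None) or ("http", url) based on GAIA_CUA_MCP_URL.

    An unset or empty variable falls back to the documented default, stdio.
    """
    env = os.environ if env is None else env
    value = env.get("GAIA_CUA_MCP_URL", "").strip() or "stdio"

    if value.lower() == "stdio":
        return ("stdio", None)

    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("http", value)

    # Assumption: anything else is treated as a configuration error.
    raise ValueError(
        f"GAIA_CUA_MCP_URL must be 'stdio' or an http(s) URL, got {value!r}"
    )


if __name__ == "__main__":
    print(resolve_connection_mode({}))
    # The URL below is a placeholder, not a documented endpoint.
    print(resolve_connection_mode({"GAIA_CUA_MCP_URL": "http://localhost:8080/mcp"}))
```

Passing `env` explicitly keeps the function easy to unit test without modifying the real process environment.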
Multi-Agent Future Architecture
The design supports future expansion with namespaced environment variables.
Implementation Plan
MCP Client Layer (TDD)
ExternalMCPClient class with stdio and HTTP transport, JSON-RPC 2.0 protocol, connection handling
Agent Implementation (TDD)
ComputerUseAgent inheriting from Agent, dynamic tool registration, Lemonade integration
Project Structure
| File | Purpose |
|---|---|
| agent.py | ComputerUseAgent class |
| cli.py | Standalone CLI entry point |
| mcp_client.py | JSON-RPC client for external server |
| test_agent.py | Unit tests (mocked) |
| test_mcp_client.py | MCP client unit tests |
| test_connection_modes.py | stdio vs HTTP mode tests |
| test_integration.py | Real server integration tests |
Success Metrics
| Metric | Target |
|---|---|
| External server connection time (stdio) | < 2 seconds |
| External server connection time (HTTP) | < 500ms |
| Tool call latency overhead | < 50ms |
| Supported connection modes | stdio, HTTP |
| Test coverage | > 90% |
| Graceful degradation | Agent usable when server unavailable |
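The graceful degradation target could be met by treating a failed tool discovery as "no external tools" rather than a fatal error. The sketch below assumes a hypothetical `list_tools` callable supplied by the MCP client and assumes that connection problems surface as `OSError` subclasses such as `ConnectionError` or `TimeoutError`.

```python
def discover_tools(list_tools):
    """Try to fetch tools; fall back to an empty list if the server is unreachable.

    Returns (tools, warning). warning is None on success, otherwise a short
    message the agent can show while continuing without external tools.
    """
    try:
        return list(list_tools()), None
    except OSError as exc:  # covers ConnectionError and TimeoutError
        return [], f"External MCP server unavailable: {exc}"


if __name__ == "__main__":
    def unreachable_server():
        raise ConnectionRefusedError("connection refused")

    tools, warning = discover_tools(unreachable_server)
    print(tools, warning)
```

The agent stays usable for conversation, and the warning tells the user why no desktop tools are registered.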
Comparison
| Feature | Direct Implementation | CUA Agent |
|---|---|---|
| Desktop automation | Custom integration code | Use any MCP server |
| Tool discovery | Hardcoded definitions | Dynamic via tools/list |
| LLM reasoning | Manual prompt engineering | Lemonade handles it |
| Error handling | Custom per-tool | Standardized, user-friendly |
| Future MCP servers | Rewrite everything | Just change URL |
Error Handling
The agent provides user-friendly error messages by default, with technical details available via --verbose.
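A minimal sketch of this two-level behavior follows. The message wording, the exception-to-message mapping, and the `format_error` helper are illustrative assumptions, not the agent's actual output.

```python
import traceback

# Ordered (exception type, friendly text) pairs; the first match wins.
_FRIENDLY_MESSAGES = (
    (ConnectionError, "Could not reach the external MCP server. Is it running?"),
    (TimeoutError, "The external MCP server took too long to respond."),
)


def format_error(exc, verbose=False):
    """Return a short message by default and append the traceback when verbose."""
    message = "Something went wrong while running the command."
    for exc_type, text in _FRIENDLY_MESSAGES:
        if isinstance(exc, exc_type):
            message = text
            break

    if verbose:
        details = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        message = f"{message}\n\nDetails:\n{details}"
    return message


if __name__ == "__main__":
    try:
        raise ConnectionRefusedError("connection refused")
    except ConnectionRefusedError as err:
        print(format_error(err))
        print(format_error(err, verbose=True))
```

Keeping the mapping in one table makes it straightforward to add friendly text for new failure types without touching the formatting logic.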
Security Considerations
- Process isolation: External MCP server runs as separate process
- No credential storage: Connection URLs configured via environment variables
- Audit logging: All tool calls can be logged with the --trace flag
- Graceful degradation: Agent doesn’t crash when server unavailable
Relationship to MCP Client Mixin
This agent is a specific implementation of the MCP Client Mixin pattern:
| Component | MCP Client Mixin | CUA Agent |
|---|---|---|
| Scope | General-purpose mixin | Desktop automation agent |
| Usage | Inherit in custom agents | Ready-to-use CLI |
| Configuration | Programmatic | Environment variables |
| Target | SDK developers | End users |
Related
- Roadmap - High-level feature timeline
- MCP Client Mixin Plan - General MCP client architecture
- MCP Client - MCP protocol documentation
- Docker Agent Guide - Similar CLI-first agent pattern
Vote on GitHub
React with 👍 to help prioritize this feature