Computer Use Agent (CUA) - Implementation Plan

Status: Planning
Priority: High
Target: v0.21.0
Vote with 👍

Executive Summary

Build a Computer Use Agent (CUA) that wraps an external MCP server and enables AI-powered desktop control through natural language commands. The agent discovers tools from the external MCP server at runtime and registers them locally. An LLM served by Lemonade (Qwen3.5-35B) reasons about which tools to call for a given command.

Goal: Let GAIA automate desktop features and system settings through natural language commands.

Requirements

Prerequisites

GAIA SDK v0.16+
Python 3.10+
Lemonade Server running with Qwen3.5-35B
External MCP server for desktop control (user-provided)

External MCP Server Requirements

External MCP servers must support the MCP Protocol:

One of: stdio or HTTP transport
initialize handshake
tools/list capability
tools/call for tool execution
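
To make the three required methods concrete, here is a minimal sketch of the JSON-RPC 2.0 messages a client would send, in order. The protocol version string, client name, and the `set_dark_mode` tool are placeholders chosen for illustration and are not defined by this plan. The MCP lifecycle also has the client send a `notifications/initialized` notification once the server has answered `initialize`.

```python
import itertools
import json

_ids = itertools.count(1)  # request ids must be unique within a connection


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request (expects a response with the same id)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def make_notification(method):
    """Build a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return {"jsonrpc": "2.0", "method": method}


# 1. Handshake. All values below are illustrative placeholders.
initialize = make_request("initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "gaia-cua", "version": "0.0.0"},
})
initialized = make_notification("notifications/initialized")

# 2. Tool discovery.
list_tools = make_request("tools/list")

# 3. Tool execution. "set_dark_mode" is a hypothetical tool name.
call_tool = make_request("tools/call", {
    "name": "set_dark_mode",
    "arguments": {"enabled": True},
})

for message in (initialize, initialized, list_tools, call_tool):
    print(json.dumps(message))
```

The same message bodies apply to both transports. Only the framing differs (lines on stdin/stdout versus HTTP requests).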

The Problem

GAIA agents currently cannot interact with desktop features and system settings. When agents need to:

Adjust system settings and preferences
Control hardware features and utilities
Query device status and capabilities
Automate repetitive system tasks

…they have no way to do so.

| Current Limitation | Impact |
| --- | --- |
| No desktop automation | Can’t build end-to-end workflow agents |
| No system control | Can’t adjust settings or features |
| No device queries | Can’t check status or capabilities |
| Isolated from OS | Limited to file/code operations only |

Meanwhile, external MCP servers already provide desktop automation capabilities. GAIA just needs a way to connect to them.

The Solution

A Computer Use Agent that connects to external MCP servers and exposes their tools to Lemonade:

# Single command execution
python -m gaia.agents.cua.cli "turn on dark mode"

# List available tools from external server
python -m gaia.agents.cua.cli --list-tools

Desktop Automation

System settings, features, and utilities. Automate any supported function.

Natural Language

Describe tasks in plain English. The AI figures out the steps.

Tool-Agnostic

Works with any MCP server. No hardcoded tool definitions.

Architecture

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Base Class | `Agent` (not `MCPAgent`) | CLI-first, no MCP server exposure needed |
| Connection Modes | stdio + HTTP | Flexibility for different external servers |
| Tool Discovery | Dynamic via `tools/list` | No hardcoded tool definitions |
| Error Handling | User-friendly + verbose flag | Clean UX by default |

What This Agent Does NOT Do

Does NOT expose itself as an MCP server
Does NOT manage the external MCP server process
Does NOT include UI integration (a future UI will use an OpenAI-compatible API, which is out of scope for this plan)

Components

| Component | Purpose | Implementation |
| --- | --- | --- |
| ComputerUseAgent | Main agent class | Inherits from `Agent`, registers external tools |
| ExternalMCPClient | Protocol handler | JSON-RPC 2.0 over stdio or HTTP |
| CLI | User interface | Single command execution, tool listing |
| Tool Registry Bridge | Tool integration | Dynamically registers MCP tools with `@tool` system |
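
The Tool Registry Bridge can be sketched as follows: each descriptor returned by `tools/list` is wrapped in a local function that forwards its keyword arguments to `tools/call`. This is only an illustration under stated assumptions. A plain dict stands in for GAIA's `@tool` registry, `fake_call_tool` stands in for `ExternalMCPClient`, and the two tools are invented.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def make_proxy(name: str, description: str, call_tool: ToolCaller) -> Callable[..., Any]:
    """Create a local function that forwards keyword arguments to tools/call."""
    def proxy(**arguments: Any) -> Any:
        return call_tool(name, arguments)

    # Expose the remote name and description so an LLM prompt can list them.
    proxy.__name__ = name
    proxy.__doc__ = description
    return proxy


def register_mcp_tools(tools_result: Dict[str, Any], call_tool: ToolCaller,
                       registry: Dict[str, Callable[..., Any]]) -> None:
    """Add one proxy per tool descriptor found in a tools/list result."""
    for descriptor in tools_result.get("tools", []):
        name = descriptor["name"]
        registry[name] = make_proxy(name, descriptor.get("description", ""), call_tool)


def fake_call_tool(name, arguments):
    """Stand-in for the real client: echoes what tools/call would receive."""
    return {"tool": name, "arguments": arguments}


# Shape of a tools/list result. Both tools are hypothetical.
tools_result = {
    "tools": [
        {"name": "set_dark_mode", "description": "Toggle dark mode",
         "inputSchema": {"type": "object",
                         "properties": {"enabled": {"type": "boolean"}}}},
        {"name": "get_battery_status", "description": "Report battery level",
         "inputSchema": {"type": "object", "properties": {}}},
    ]
}

registry: Dict[str, Callable[..., Any]] = {}
register_mcp_tools(tools_result, fake_call_tool, registry)
print(registry["set_dark_mode"](enabled=True))
```

A factory function (`make_proxy`) is used instead of defining the proxy inside the loop so that each proxy captures its own tool name.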

Data Flow

Tool Execution Flow

Connection Flow

CLI Commands

Execute Commands

# Single command (primary use case)
python -m gaia.agents.cua.cli "turn on dark mode"

# With verbose output
python -m gaia.agents.cua.cli "check my battery status" --verbose

# Silent mode (no step output)
python -m gaia.agents.cua.cli "enable power saver" --silent

# Show execution trace
python -m gaia.agents.cua.cli "what features does my laptop support" --trace

Tool Discovery

# List available tools from external server
python -m gaia.agents.cua.cli --list-tools

# List tools with verbose connection info
python -m gaia.agents.cua.cli --list-tools --verbose

Configuration Options

# Specify external MCP server URL
python -m gaia.agents.cua.cli --mcp-server-url http://localhost:9000 "enable dark mode"

# Use specific LLM model
python -m gaia.agents.cua.cli --model Qwen3.5-35B "check storage status"
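
The flags shown in this section could be parsed with `argparse` roughly as below. This is a sketch, not the planned `cli.py`. In particular, treating `--verbose` and `--silent` as mutually exclusive and requiring either a command or `--list-tools` are assumptions this plan does not state.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m gaia.agents.cua.cli")
    # The natural-language command is optional because --list-tools needs none.
    parser.add_argument("command", nargs="?", help="natural-language command")
    parser.add_argument("--list-tools", action="store_true",
                        help="list tools exposed by the external MCP server")
    parser.add_argument("--mcp-server-url", default=None,
                        help="external MCP server URL")
    parser.add_argument("--model", default=None, help="LLM model name")
    parser.add_argument("--trace", action="store_true",
                        help="show execution trace")
    # Assumption: verbose and silent output are mutually exclusive.
    output = parser.add_mutually_exclusive_group()
    output.add_argument("--verbose", action="store_true")
    output.add_argument("--silent", action="store_true")
    return parser


def parse_cli(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: running with neither a command nor --list-tools is an error.
    if not args.list_tools and args.command is None:
        parser.error("provide a command or --list-tools")
    return args


args = parse_cli(["--mcp-server-url", "http://localhost:9000", "enable dark mode"])
print(args.command, args.mcp_server_url)
```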

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `GAIA_CUA_MCP_URL` | `stdio` | External MCP server URL or “stdio” |

Connection Modes

stdio (Default)

export GAIA_CUA_MCP_URL=stdio
python -m gaia.agents.cua.cli "turn on dark mode"

HTTP

export GAIA_CUA_MCP_URL=http://localhost:9000
python -m gaia.agents.cua.cli "turn on dark mode"
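
One way the agent might map `GAIA_CUA_MCP_URL` to a connection mode is sketched below. The function name `resolve_transport` and the choice to reject anything other than `stdio` or an `http`/`https` URL are assumptions. The plan does not say how an explicit `--mcp-server-url` interacts with the environment variable, so the sketch simply lets an explicit value take precedence.

```python
import os
from urllib.parse import urlparse


def resolve_transport(value=None):
    """Return ("stdio", None) or ("http", url) for a configured server value."""
    if value is None:
        # Fall back to the documented environment variable and its default.
        value = os.environ.get("GAIA_CUA_MCP_URL", "stdio")
    value = value.strip()
    if value.lower() == "stdio":
        return ("stdio", None)
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("http", value)
    raise ValueError(f"Unsupported GAIA_CUA_MCP_URL value: {value!r}")


print(resolve_transport("stdio"))
print(resolve_transport("http://localhost:9000"))
```

Failing fast on an unrecognized value keeps a typo in the variable from silently selecting the wrong mode.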

Multi-Agent Future Architecture

The design supports future expansion with namespaced environment variables:

GAIA_CUA_MCP_URL          # Computer Use Agent
GAIA_BROWSER_MCP_URL      # Future: Browser Agent
GAIA_FILESYSTEM_MCP_URL   # Future: File System Agent

Implementation Plan

MCP Client Layer (TDD)

ExternalMCPClient class with stdio and HTTP transport, JSON-RPC 2.0 protocol, connection handling

Agent Implementation (TDD)

ComputerUseAgent inheriting from Agent, dynamic tool registration, Lemonade integration

Standalone CLI

Command parsing, single-command execution, --list-tools mode, error handling

Integration Tests

Real external server tests, Lemonade integration tests, connection mode tests

Documentation

User guide (docs/guides/cua.mdx), API reference, troubleshooting guide

Code Review & Coverage

Lint check, >90% test coverage, final review

Project Structure

| File | Purpose |
| --- | --- |
| `agent.py` | ComputerUseAgent class |
| `cli.py` | Standalone CLI entry point |
| `mcp_client.py` | JSON-RPC client for external server |
| `test_agent.py` | Unit tests (mocked) |
| `test_mcp_client.py` | MCP client unit tests |
| `test_connection_modes.py` | stdio vs HTTP mode tests |
| `test_integration.py` | Real server integration tests |

Success Metrics

| Metric | Target |
| --- | --- |
| External server connection time (stdio) | < 2 s |
| External server connection time (HTTP) | < 500 ms |
| Tool call latency overhead | < 50 ms |
| Supported connection modes | stdio, HTTP |
| Test coverage | > 90% |
| Graceful degradation | Agent usable when server unavailable |

Comparison

| Feature | Direct Implementation | CUA Agent |
| --- | --- | --- |
| Desktop automation | Custom integration code | Use any MCP server |
| Tool discovery | Hardcoded definitions | Dynamic via `tools/list` |
| LLM reasoning | Manual prompt engineering | Lemonade handles it |
| Error handling | Custom per-tool | Standardized, user-friendly |
| Future MCP servers | Rewrite everything | Just change URL |

Error Handling

The agent provides user-friendly error messages by default, with technical details available via --verbose:

# Default: user-friendly message
$ python -m gaia.agents.cua.cli "turn on dark mode"
Error: External MCP server is not available.
Please ensure the server is running and try again.

# Verbose: includes technical details
$ python -m gaia.agents.cua.cli "turn on dark mode" --verbose
Error: External MCP server is not available.
Please ensure the server is running and try again.
Detail: Connection refused to stdio
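
The two output levels above could be produced by a single formatter like the one sketched here. `MCPConnectionError` and `format_error` are hypothetical names used only for illustration. The message text is copied from the examples above.

```python
class MCPConnectionError(Exception):
    """Hypothetical exception raised when the external server is unreachable."""


def format_error(exc: Exception, verbose: bool = False) -> str:
    """Return the user-facing message, adding the technical detail if verbose."""
    lines = [
        "Error: External MCP server is not available.",
        "Please ensure the server is running and try again.",
    ]
    if verbose:
        # Only the verbose path exposes the underlying exception text.
        lines.append(f"Detail: {exc}")
    return "\n".join(lines)


try:
    raise MCPConnectionError("Connection refused to stdio")
except MCPConnectionError as exc:
    print(format_error(exc))
    print(format_error(exc, verbose=True))
```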

Security Considerations

Process isolation: The external MCP server runs as a separate process
No credential storage: Connection URLs are configured via environment variables
Audit logging: All tool calls can be logged with the --trace flag
Graceful degradation: The agent doesn’t crash when the server is unavailable

Relationship to MCP Client Mixin

This agent is a specific implementation of the MCP Client Mixin pattern (the base mixin shipped in v0.15.4):

| Component | MCP Client Mixin | CUA Agent |
| --- | --- | --- |
| Scope | General-purpose mixin | Desktop automation agent |
| Usage | Inherit in custom agents | Ready-to-use CLI |
| Configuration | Programmatic | Environment variables |
| Target | SDK developers | End users |

The CUA agent validates the MCP Client Mixin architecture and provides a reference implementation.

Related

Roadmap - High-level feature timeline
MCP Client - MCP protocol documentation and the shipped client SDK
Docker Agent Guide - Similar CLI-first agent pattern

Vote on GitHub

React with 👍 to help prioritize this feature
