Time to complete: 15-20 minutes
What you’ll build: A hardware analysis agent that recommends which LLMs you can run
What you’ll learn: LemonadeClient APIs, GPU/NPU detection, memory-based recommendations
Platform: Runs locally on AI PCs with Ryzen AI (NPU/iGPU acceleration)
Privacy-First AI: This agent runs entirely on your AI PC. Hardware detection and model recommendations happen locally—no data leaves your machine.
When users ask “What size LLM can I run?”, the answer depends on their actual hardware. Instead of guessing or looking up specifications manually, this agent:
Detects system RAM, GPU, and NPU via Lemonade Server
Queries the available model catalog with size estimates
Calculates which models fit in available memory
Provides personalized recommendations based on real hardware specs
What you’re building: A hardware advisor agent that combines:
LemonadeClient SDK - System info and model catalog APIs
Platform-specific detection - Windows PowerShell / Linux lspci for GPU info
Memory calculations - 70% rule for safe model sizing (worked example after this list)
Interactive CLI - Natural language queries about capabilities
Local execution - Runs entirely on your AI PC using Ryzen AI acceleration
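The sizing logic itself is simple arithmetic. Here is a worked example of the two rules the recommend_models tool applies later in this tutorial (the 70% rule for maximum model size and the ~1.3x runtime overhead estimate):

```python
ram_gb = 32.0

# 70% rule: leave headroom for the OS and other processes
max_model_size_gb = ram_gb * 0.7            # 22.4 GB

# Runtime estimate: model weights plus ~30% inference overhead
model_size_gb = 18.5                        # e.g. a quantized 30B model
estimated_runtime_gb = model_size_gb * 1.3  # ~24.1 GB

fits = model_size_gb <= max_model_size_gb and estimated_runtime_gb <= ram_gb
print(fits)  # True: an 18.5 GB model is safe on a 32 GB machine
```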
Get a working agent running to understand the basic flow.
1. Install GAIA

```bash
uv pip install amd-gaia
```
2. Start Lemonade Server

```bash
# Start local LLM server with AMD NPU/iGPU acceleration
lemonade-server serve
```
Lemonade Server provides AMD-optimized inference for AI PCs with Ryzen AI. It also exposes system info and model catalog APIs that this agent uses.
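If you want to verify the server is reachable before running the agent, you can call the same system-info API the agent uses later in this tutorial. A minimal sketch (it assumes an unreachable server surfaces as an exception or error from the client):

```python
from gaia.llm.lemonade_client import LemonadeClient

# Quick connectivity check against the running Lemonade Server
client = LemonadeClient(keep_alive=True)
info = client.get_system_info()
print(info.get("Physical Memory", "unknown"))  # e.g. "32.0 GB"
```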
3. Run the Hardware Advisor

```bash
python examples/hardware_advisor_agent.py
```
Try asking:
“What size LLM can I run?”
“Show me my system specs”
“What models are available?”
“Can I run a 30B model?”
4. See it in action

Example interaction:

```
You: What size LLM can I run?

Agent: Let me check your hardware specs...

[Tool: get_hardware_info]
RAM: 32 GB, GPU: AMD Radeon RX 7900 XTX (24 GB), NPU: Available

[Tool: recommend_models]
Based on your 32 GB RAM, you can safely run models up to ~22 GB.

Agent: Great news! With 32 GB RAM and a 24 GB GPU, you can run:
- 30B parameter models (like Qwen3-Coder-30B)
- Most 7B-14B models comfortably
- NPU acceleration available for smaller models
```
The agent requires Lemonade Server to be running for hardware detection and model catalog queries. GAIA auto-starts it on first use if not running.
On Windows, the agent queries GPU name and VRAM through PowerShell, since wmic is deprecated on Windows 11:

```python
import subprocess

# PowerShell query (wmic deprecated on Windows 11)
ps_command = (
    "Get-WmiObject Win32_VideoController | "
    "Select-Object Name,AdapterRAM | "
    "ConvertTo-Csv -NoTypeInformation"
)
result = subprocess.run(
    ["powershell", "-Command", ps_command],
    capture_output=True,
    text=True,
    timeout=5,
)
# Parse CSV output for GPU name and VRAM
```
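A minimal sketch of the CSV-parsing step the final comment refers to, using only the standard library (parse_gpu_csv is a hypothetical helper name). One caveat worth knowing: WMI reports AdapterRAM as a 32-bit value, so cards with more than 4 GB of VRAM are under-reported:

```python
import csv
import io

def parse_gpu_csv(output: str) -> list:
    """Parse PowerShell ConvertTo-Csv output into GPU records (hypothetical helper)."""
    gpus = []
    for row in csv.DictReader(io.StringIO(output.strip())):
        ram = row.get("AdapterRAM") or ""
        gpus.append({
            "name": row.get("Name", "Unknown"),
            # AdapterRAM is in bytes and may be blank for some adapters
            "vram_gb": round(int(ram) / 1024**3, 1) if ram.isdigit() else None,
        })
    return gpus
```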
On Linux, it lists VGA devices with lspci, which reports device names but not memory size:

```python
import subprocess

# lspci for VGA devices
result = subprocess.run(
    ["lspci"],
    capture_output=True,
    text=True,
    timeout=5,
)
# Parse output for "VGA compatible controller" lines
# Note: Memory not available via lspci
```
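And a matching sketch for the Linux parsing step (parse_lspci_gpus is a hypothetical helper; it relies on the standard lspci line format, e.g. `03:00.0 VGA compatible controller: ...`):

```python
def parse_lspci_gpus(output: str) -> list:
    """Extract GPU device names from lspci output (hypothetical helper)."""
    gpus = []
    for line in output.splitlines():
        if "VGA compatible controller" in line:
            # The device name follows the "VGA compatible controller:" label
            gpus.append(line.split(": ", 1)[-1].strip())
    return gpus
```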
Start with a minimal agent that just has a system prompt.
Implementation (step1_basic.py):

```python
from gaia.agents.base.agent import Agent


class SimpleAdvisorAgent(Agent):
    """Minimal hardware advisor with no tools."""

    def _get_system_prompt(self) -> str:
        return """You are a hardware advisor for running local LLMs.

When users ask about LLM capabilities, explain that you need
to check their actual hardware to give accurate recommendations.
Be helpful and explain concepts in plain language."""

    def _register_tools(self):
        # No tools yet
        pass


# Use it
agent = SimpleAdvisorAgent()
result = agent.process_query("What size LLM can I run?")
print(result)
```
Run it:

```bash
python step1_basic.py
```
Expected output:

```
Agent: To give you accurate recommendations, I would need to check
your actual hardware specifications. Generally, the size of LLM
you can run depends on your RAM, GPU memory, and whether you have
an NPU available...
```

What you have:
✓ Agent with reasoning loop
✓ System prompt definition
✗ No tools (can only give general advice)
✗ Cannot detect actual hardware
This basic agent cannot check real hardware. It can only provide generic advice based on LLM training data.
Add the first tool to detect system specs via LemonadeClient.
Implementation (step2_hardware.py):

```python
from typing import Dict, Any

from gaia.agents.base.agent import Agent
from gaia.agents.base.tools import tool
from gaia.llm.lemonade_client import LemonadeClient


class HardwareDetectorAgent(Agent):
    """Agent that can check system hardware."""

    def __init__(self, **kwargs):
        self.client = LemonadeClient(keep_alive=True)
        super().__init__(**kwargs)

    def _get_system_prompt(self) -> str:
        return """You are a hardware advisor for running local LLMs.

Use get_hardware_info to check the user's actual system specs.
Always check real hardware - never guess specifications."""

    def _register_tools(self):
        client = self.client

        @tool
        def get_hardware_info() -> Dict[str, Any]:
            """Get system RAM, GPU, and NPU information."""
            try:
                info = client.get_system_info()

                # Parse RAM
                ram_str = info.get("Physical Memory", "0 GB")
                ram_gb = float(ram_str.split()[0]) if ram_str else 0

                # Get NPU info
                devices = info.get("devices", {})
                npu_info = devices.get("npu", {})
                npu_available = npu_info.get("available", False)

                return {
                    "success": True,
                    "ram_gb": ram_gb,
                    "processor": info.get("Processor", "Unknown"),
                    "npu_available": npu_available,
                }
            except Exception as e:
                return {"success": False, "error": str(e)}


# Test it
agent = HardwareDetectorAgent()
result = agent.process_query("How much RAM do I have?")
```
Output example:

```
You: How much RAM do I have?

[Tool: get_hardware_info]
{"success": true, "ram_gb": 32.0, "processor": "AMD Ryzen 9", "npu_available": true}

Agent: You have 32 GB of RAM and an AMD Ryzen 9 processor.
Your system also has an NPU available for accelerated inference!
```
What you have: Real hardware detection via Lemonade Server. The agent can now tell users their actual specs.
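The recommendation tool in the next step calls a list_available_models tool built in an intermediate step not shown here. For reference, a hypothetical sketch of its shape; the catalog call (get_models) is an assumed method name and may differ from the actual LemonadeClient API:

```python
# Inside _register_tools(), alongside get_hardware_info
@tool
def list_available_models() -> Dict[str, Any]:
    """List models from the Lemonade catalog with size estimates."""
    try:
        catalog = client.get_models()  # assumed method name
        models = [
            {"name": m.get("name", "unknown"), "size_gb": float(m.get("size_gb", 0))}
            for m in catalog.get("models", [])
        ]
        return {"success": True, "models": models}
    except Exception as e:
        return {"success": False, "error": str(e)}
```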
Add the final tool that combines hardware info with model sizes to give recommendations.
Implementation (step4_recommend.py):

```python
# Add to existing agent...

@tool
def recommend_models(ram_gb: float, gpu_memory_mb: int = 0) -> Dict[str, Any]:
    """Recommend models based on available memory.

    Args:
        ram_gb: Available system RAM in GB
        gpu_memory_mb: Available GPU memory in MB (0 if no GPU)
    """
    try:
        # Get all models
        models_result = list_available_models()
        if not models_result.get("success"):
            return models_result

        all_models = models_result.get("models", [])

        # Apply 70% rule
        max_model_size_gb = ram_gb * 0.7

        # Filter models that fit
        fitting_models = [
            model for model in all_models
            if model["size_gb"] <= max_model_size_gb and model["size_gb"] > 0
        ]

        # Add runtime estimates
        for model in fitting_models:
            model["estimated_runtime_gb"] = round(model["size_gb"] * 1.3, 2)
            model["fits_in_ram"] = model["estimated_runtime_gb"] <= ram_gb

        return {
            "success": True,
            "recommendations": fitting_models,
            "max_model_size_gb": round(max_model_size_gb, 2),
            "total_fitting": len(fitting_models),
        }
    except Exception as e:
        return {"success": False, "error": str(e)}


# Test it
agent = HardwareDetectorAgent()
result = agent.process_query("What size LLM can I run?")
print(result)
```
Output example:

```
You: What size LLM can I run?

[Tool: get_hardware_info]
{"ram_gb": 32.0, "npu_available": true}

[Tool: recommend_models]
{"max_model_size_gb": 22.4, "recommendations": [...], "total_fitting": 12}

Agent: With your 32 GB RAM, you can safely run models up to 22.4 GB!

Top recommendations:
1. Qwen3-Coder-30B (18.5 GB) - Best for coding tasks
2. Llama-3.1-8B (4.7 GB) - Great general purpose
3. Qwen2.5-0.5B (0.3 GB) - Fast, runs on NPU

Your NPU is available for accelerated inference on smaller models.
```
What you have: Complete hardware advisor! The agent detects hardware, queries models, and provides personalized recommendations.
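To get the interactive CLI described at the top of this tutorial, wrap the finished agent in a simple input loop. A minimal sketch using only the process_query method shown in the steps above:

```python
# Minimal interactive loop around the finished HardwareDetectorAgent
if __name__ == "__main__":
    agent = HardwareDetectorAgent()
    print("Hardware Advisor - ask about your LLM capabilities (Ctrl+C to exit)")
    while True:
        try:
            query = input("You: ").strip()
        except (KeyboardInterrupt, EOFError):
            break
        if query:
            print(f"Agent: {agent.process_query(query)}")
```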