Time to complete: 15-20 minutes
What you’ll build: A hardware analysis agent that recommends which LLMs you can run
What you’ll learn: LemonadeClient APIs, GPU/NPU detection, memory-based recommendations
Platform: Runs locally on AI PCs with Ryzen AI (NPU/iGPU acceleration)
Privacy-First AI: This agent runs entirely on your AI PC. Hardware detection and model recommendations happen locally—no data leaves your machine.
When users ask “What size LLM can I run?”, the answer depends on their actual hardware. Instead of guessing or looking up specifications manually, this agent:
Detects system RAM, GPU, and NPU via Lemonade Server
Queries the available model catalog with size estimates
Calculates which models fit in available memory
Provides personalized recommendations based on real hardware specs
What you’re building: A hardware advisor agent that combines:
LemonadeClient SDK - System info and model catalog APIs
Platform-specific detection - Windows PowerShell / Linux lspci for GPU info
Memory calculations - 70% rule for safe model sizing (see the sketch after this list)
Interactive CLI - Natural language queries about capabilities
Local execution - Runs entirely on your AI PC using Ryzen AI acceleration
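As a quick illustration of the 70% rule (a minimal sketch; the same threshold appears in the recommend_models tool built in Step 4 below):

```python
# 70% rule of thumb: keep model weights under 70% of system RAM,
# leaving ~30% headroom for inference overhead (KV cache, activations).
ram_gb = 32.0                     # hypothetical system
max_model_size_gb = ram_gb * 0.7  # 22.4 GB -> "models up to ~22 GB"
print(f"Max safe model size: {max_model_size_gb:.1f} GB")
```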
Developer Installation: Use this path if you want to modify GAIA source code or run the example from examples/hardware_advisor_agent.py.
2. Start Lemonade Server
```bash
# Start local LLM server with AMD NPU/iGPU acceleration
lemonade-server serve
```
Lemonade Server provides AMD-optimized inference for AI PCs with Ryzen AI. It also exposes system info and model catalog APIs that this agent uses.
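If you want to poke at these APIs before building the agent, here is a minimal sketch using the same LemonadeClient calls the agent relies on below (assumes the server from this step is reachable):

```python
from gaia.llm.lemonade_client import LemonadeClient

client = LemonadeClient(keep_alive=True)

# System info API: OS, processor, RAM, and NPU availability
info = client.get_system_info()
print(info.get("Physical Memory"), "|", info.get("Processor"))

# Model catalog API: every model Lemonade knows about
catalog = client.list_models(show_all=True)
print(f"{len(catalog.get('data', []))} models in catalog")
```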
3. Run the Hardware Advisor
PyPI Installation: Create hardware_advisor.py in your project folder and follow the step-by-step build below, or copy the complete example from the Building It section.

```bash
python hardware_advisor.py
```

Developer Installation: Run the example directly from the repository:

```bash
uv run examples/hardware_advisor_agent.py
```
Try asking:
“What size LLM can I run?”
“Show me my system specs”
“What models are available?”
“Can I run a 30B model?”
4. See it in action
Example interaction:
```text
You: What size LLM can I run?

Agent: Let me check your hardware specs...

[Tool: get_hardware_info]
RAM: 32 GB, GPU: AMD Radeon RX 7900 XTX (24 GB), NPU: Available

[Tool: recommend_models]
Based on your 32 GB RAM, you can safely run models up to ~22 GB.

Agent: Great news! With 32 GB RAM and a 24 GB GPU, you can run:
- 30B parameter models (like Qwen3-Coder-30B)
- Most 7B-14B models comfortably
- NPU acceleration available for smaller models
```
The agent requires Lemonade Server to be running for hardware detection and model catalog queries. GAIA auto-starts it on first use if not running.
GPU detection is platform-specific. On Windows, the agent queries PowerShell for the GPU name and VRAM:

```python
# PowerShell query (wmic deprecated on Windows 11)
ps_command = (
    "Get-WmiObject Win32_VideoController | "
    "Select-Object Name,AdapterRAM | "
    "ConvertTo-Csv -NoTypeInformation"
)
result = subprocess.run(
    ["powershell", "-Command", ps_command],
    capture_output=True,
    text=True,
    timeout=5,
)
# Parse CSV output for GPU name and VRAM
```
On Linux, it parses lspci output instead:

```python
# lspci for VGA devices
result = subprocess.run(
    ["lspci"], capture_output=True, text=True, timeout=5
)
# Parse output for "VGA compatible controller" lines
# Note: Memory not available via lspci
```
Now let’s build this agent incrementally using a single file that grows with each step. We’ll build exactly what’s in examples/hardware_advisor_agent.py by adding functionality progressively.
Building Strategy: You’ll create ONE file called hardware_advisor.py and progressively add features to it. Each step builds on the previous one, and by Step 6 you’ll have code that matches the example exactly.
Start by creating the file with a minimal agent structure—just the class and a basic system prompt. This creates the foundation, but the agent has no tools yet and cannot answer queries.
```python
from typing import Any, Dict  # used by the tool signatures added in later steps

from gaia import Agent
from gaia.llm.lemonade_client import LemonadeClient


class HardwareAdvisorAgent(Agent):
    """Agent that advises on LLM capabilities based on your hardware."""

    def __init__(self, **kwargs):
        self.client = LemonadeClient(keep_alive=True)
        super().__init__(**kwargs)

    def _get_system_prompt(self) -> str:
        return "You are a hardware advisor for running local LLMs on AMD systems."

    def _register_tools(self):
        # Tools will be added in the next steps
        pass


if __name__ == "__main__":
    agent = HardwareAdvisorAgent()
    print("Agent created successfully!")
```
Formatting Note: When pasting code, ensure proper Python indentation (4 spaces for class/function bodies). Use an IDE with Python support to avoid alignment issues.
Don’t try to query this agent yet! It has no tools, so it cannot check hardware or recommend models. Continue to Step 2 to add GPU detection and hardware tools.
Add the _get_gpu_info() helper method and the full get_hardware_info() tool, whose response includes GPU and NPU objects and an OS field. This makes the agent interactive—you can now query it about system specs!
Add the _get_gpu_info() helper after the _get_system_prompt() method:
```python
def _get_gpu_info(self) -> Dict[str, Any]:
    """Detect GPU using OS-native commands."""
    import platform
    import subprocess

    system = platform.system()
    try:
        if system == "Windows":
            # Use PowerShell Get-WmiObject (wmic is deprecated on Windows 11)
            ps_command = (
                "Get-WmiObject Win32_VideoController | "
                "Select-Object Name,AdapterRAM | "
                "ConvertTo-Csv -NoTypeInformation"
            )
            result = subprocess.run(
                ["powershell", "-Command", ps_command],
                capture_output=True,
                text=True,
                timeout=5,
            )
            if result.returncode == 0:
                lines = [
                    l.strip()
                    for l in result.stdout.strip().split("\n")
                    if l.strip()
                ]
                # CSV format: "Name","AdapterRAM"
                for line in lines[1:]:  # Skip header
                    # Remove quotes and split
                    line = line.replace('"', "")
                    parts = line.split(",")
                    if len(parts) >= 2:
                        try:
                            name = parts[0].strip()
                            adapter_ram = (
                                int(parts[1])
                                if parts[1].strip().isdigit()
                                else 0
                            )
                            if name and len(name) > 0:
                                return {
                                    "name": name,
                                    "memory_mb": (
                                        adapter_ram // (1024 * 1024)
                                        if adapter_ram > 0
                                        else 0
                                    ),
                                }
                        except (ValueError, IndexError):
                            continue
        elif system == "Linux":
            # Use lspci to find VGA devices
            result = subprocess.run(
                ["lspci"], capture_output=True, text=True, timeout=5
            )
            if result.returncode == 0:
                for line in result.stdout.split("\n"):
                    if "VGA compatible controller" in line:
                        # Extract GPU name after the colon
                        parts = line.split(":", 2)
                        if len(parts) >= 3:
                            return {
                                "name": parts[2].strip(),
                                "memory_mb": 0,  # Memory not available via lspci
                            }
    except Exception as e:
        # Debug output
        print(f"GPU detection error: {e}")

    return {"name": "Not detected", "memory_mb": 0}
```
Replace the _register_tools() method with the following. It uses GAIA’s @tool decorator; add its import at the top of the file, matching the import used in examples/hardware_advisor_agent.py:
```python
def _register_tools(self):
    client = self.client
    agent = self

    @tool
    def get_hardware_info() -> Dict[str, Any]:
        """Get detailed system hardware information including RAM, GPU, and NPU."""
        try:
            # Use Lemonade Server's system info API for basic info
            info = client.get_system_info()

            # Parse RAM (format: "32.0 GB")
            ram_str = info.get("Physical Memory", "0 GB")
            ram_gb = float(ram_str.split()[0]) if ram_str else 0

            # Detect GPU
            gpu_info = agent._get_gpu_info()
            gpu_name = gpu_info.get("name", "Not detected")
            gpu_available = gpu_name != "Not detected"
            gpu_memory_mb = gpu_info.get("memory_mb", 0)
            gpu_memory_gb = (
                round(gpu_memory_mb / 1024, 2) if gpu_memory_mb > 0 else 0
            )

            # Get NPU information from Lemonade
            devices = info.get("devices", {})
            npu_info = devices.get("npu", {})
            npu_available = npu_info.get("available", False)
            npu_name = (
                npu_info.get("name", "Not detected")
                if npu_available
                else "Not detected"
            )

            return {
                "success": True,
                "os": info.get("OS Version", "Unknown"),
                "processor": info.get("Processor", "Unknown"),
                "ram_gb": ram_gb,
                "gpu": {
                    "name": gpu_name,
                    "memory_mb": gpu_memory_mb,
                    "memory_gb": gpu_memory_gb,
                    "available": gpu_available,
                },
                "npu": {"name": npu_name, "available": npu_available},
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "message": "Failed to get hardware information from Lemonade Server",
            }
```
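A successful get_hardware_info call returns a payload shaped like this (illustrative values; your machine will report its own names and sizes):

```python
sample_response = {
    "success": True,
    "os": "Microsoft Windows 11",      # from the "OS Version" field
    "processor": "AMD Ryzen 9 7950X",  # hypothetical
    "ram_gb": 32.0,
    "gpu": {
        "name": "AMD Radeon RX 7900 XTX",
        "memory_mb": 24576,
        "memory_gb": 24.0,
        "available": True,
    },
    "npu": {"name": "AMD NPU", "available": True},  # name is illustrative
}
```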
Update the __main__ block to enable interactive testing:
```python
if __name__ == "__main__":
    agent = HardwareAdvisorAgent()
    print("Hardware Advisor Agent (Ctrl+C to exit)")
    print("Try: 'Show me my system specs'\n")
    while True:
        try:
            query = input("You: ").strip()
            if query:
                agent.process_query(query)
                print()
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
```
Formatting Note: When pasting code, ensure proper Python indentation. Method bodies should be indented 4 spaces from the class level, and nested blocks should maintain consistent 4-space indentation.
Add the list_available_models() tool, whose response includes a labels field and a summary message. Now the agent can tell you which models are available in the Lemonade catalog.
Add this tool inside the _register_tools() method (after the get_hardware_info function):
```python
@tool
def list_available_models() -> Dict[str, Any]:
    """List all models available in the catalog with their sizes and download status."""
    try:
        # Fetch model catalog from Lemonade Server
        response = client.list_models(show_all=True)
        models_data = response.get("data", [])

        # Enrich each model with size information
        enriched_models = []
        for model in models_data:
            model_id = model.get("id", "")

            # Get size estimate for this model
            model_info = client.get_model_info(model_id)
            size_gb = model_info.get("size_gb", 0)

            enriched_models.append(
                {
                    "id": model_id,
                    "name": model.get("name", model_id),
                    "size_gb": size_gb,
                    "downloaded": model.get("downloaded", False),
                    "labels": model.get("labels", []),
                }
            )

        # Sort by size (largest first)
        enriched_models.sort(key=lambda m: m["size_gb"], reverse=True)

        return {
            "success": True,
            "models": enriched_models,
            "count": len(enriched_models),
            "message": f"Found {len(enriched_models)} models in catalog",
        }
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "message": "Failed to fetch models from Lemonade Server",
        }
```
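Each enriched catalog entry comes out shaped like this (illustrative values; the ids, sizes, and labels come from the Lemonade catalog):

```python
sample_entry = {
    "id": "Qwen3-Coder-30B",   # illustrative id
    "name": "Qwen3-Coder-30B",
    "size_gb": 18.6,           # hypothetical size estimate
    "downloaded": False,
    "labels": ["coding"],      # hypothetical label
}
```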
The __main__ block stays the same (already interactive from Step 2).
Formatting Note: When pasting the tool code inside _register_tools(), maintain proper indentation. The @tool decorator and function definition should be indented 4 spaces, with the function body indented 8 spaces.
Add the complete recommend_models() tool, whose response includes estimated_runtime_gb, fits_in_ram, and fits_in_gpu fields plus a constraints object. The agent can now calculate which models fit in your system’s memory!
Add this tool inside the _register_tools() method (after the list_available_models function):
```python
@tool
def recommend_models(ram_gb: float, gpu_memory_mb: int = 0) -> Dict[str, Any]:
    """Recommend models based on available system memory.

    Args:
        ram_gb: Available system RAM in GB
        gpu_memory_mb: Available GPU memory in MB (0 if no GPU)

    Returns:
        Dictionary with model recommendations that fit in available memory
    """
    try:
        # Get all available models
        models_result = list_available_models()
        if not models_result.get("success"):
            return models_result  # Propagate error

        all_models = models_result.get("models", [])

        # Calculate maximum safe model size
        # Rule: Model size should be < 70% of available RAM (30% overhead for inference)
        max_model_size_gb = ram_gb * 0.7

        # Filter models that fit in memory
        fitting_models = [
            model
            for model in all_models
            if model["size_gb"] <= max_model_size_gb and model["size_gb"] > 0
        ]

        # Add recommendation metadata
        for model in fitting_models:
            # Estimate actual runtime memory needed (model size + ~30% overhead)
            model["estimated_runtime_gb"] = round(model["size_gb"] * 1.3, 2)
            model["fits_in_ram"] = model["estimated_runtime_gb"] <= ram_gb

            # Check GPU fit if GPU available
            if gpu_memory_mb > 0:
                gpu_memory_gb = gpu_memory_mb / 1024
                model["fits_in_gpu"] = model["size_gb"] <= (gpu_memory_gb * 0.9)

        # Sort by size (largest = most capable)
        fitting_models.sort(key=lambda m: m["size_gb"], reverse=True)

        return {
            "success": True,
            "recommendations": fitting_models,
            "total_fitting_models": len(fitting_models),
            "constraints": {
                "available_ram_gb": ram_gb,
                "available_gpu_mb": gpu_memory_mb,
                "max_model_size_gb": round(max_model_size_gb, 2),
                "safety_margin_percent": 30,
            },
        }
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "message": "Failed to generate model recommendations",
        }
```
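To sanity-check the arithmetic, here is the rule applied to the 32 GB RAM / 24 GB GPU system from the earlier example, with a hypothetical 18 GB model:

```python
ram_gb, gpu_memory_mb = 32.0, 24576          # 32 GB RAM, 24 GB VRAM
max_model_size_gb = ram_gb * 0.7             # 22.4 GB cap -> an 18 GB model passes
estimated_runtime_gb = round(18.0 * 1.3, 2)  # 23.4 GB with inference overhead
fits_in_ram = estimated_runtime_gb <= ram_gb        # True (23.4 <= 32)
fits_in_gpu = 18.0 <= (gpu_memory_mb / 1024) * 0.9  # True (18 <= 21.6)
```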
The __main__ block stays the same (already interactive from Step 2).
Formatting Note: When pasting the tool code inside _register_tools(), maintain proper indentation. The @tool decorator and function definition should be indented 4 spaces, with the function body indented 8 spaces.
Replace the simple __main__ block with a full interactive CLI function. This adds a professional banner, better error handling, and graceful exit options.
Replace the entire if __name__ == "__main__": block at the end of the file with:
```python
def main():
    """Run the Hardware Advisor Agent interactively."""
    print("=" * 60)
    print("Hardware Advisor Agent")
    print("=" * 60)
    print("\nHi! I can help you figure out what size LLM your system can run.")
    print("\nTry asking:")
    print(" - 'What size LLM can I run?'")
    print(" - 'Show me my system specs'")
    print(" - 'What models are available?'")
    print(" - 'Can I run a 30B model?'")
    print("\nType 'quit', 'exit', or 'q' to stop.\n")

    # Create agent (uses local Lemonade server by default)
    try:
        agent = HardwareAdvisorAgent()
        print("Agent ready!\n")
    except Exception as e:
        print(f"Error initializing agent: {e}")
        print("\nMake sure Lemonade server is running.")
        print("GAIA will start it automatically on first use.")
        return

    # Interactive loop
    while True:
        try:
            user_input = input("You: ").strip()
            if not user_input:
                continue
            if user_input.lower() in ("quit", "exit", "q"):
                print("Goodbye!")
                break

            # Process the query (agent prints the output)
            agent.process_query(user_input)
            print()  # Add spacing
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"\nError: {e}\n")


if __name__ == "__main__":
    main()
```
Formatting Note: When pasting the main() function, ensure the function body is indented 4 spaces, and nested blocks (try/except/while) maintain consistent 4-space indentation increments.
You’ll now get a full interactive session. Expected output:
```text
============================================================
Hardware Advisor Agent
============================================================

Hi! I can help you figure out what size LLM your system can run.

Try asking:
 - 'What size LLM can I run?'
 - 'Show me my system specs'
 - 'What models are available?'
 - 'Can I run a 30B model?'

Type 'quit', 'exit', or 'q' to stop.

Agent ready!

You: What size LLM can I run?
[Agent processes and responds...]

You: quit
Goodbye!
```
Congratulations! You’ve built a fully functional hardware advisor agent! Your implementation includes OS-native GPU detection, smart memory-based recommendations, and an interactive CLI—all running locally on AMD hardware with Ryzen AI acceleration.