MemoryMixin must come before Agent in the class declaration. This is required because MemoryMixin overrides process_query and _execute_tool, both of which call super() to reach the Agent base class. If Agent is listed first, Agent's own methods win in the method resolution order and both overrides are silently shadowed.
```python
from gaia.agents.base.agent import Agent
from gaia.agents.base.memory import MemoryMixin


# Correct -- MemoryMixin before Agent
class MyAgent(MemoryMixin, Agent):
    def __init__(self, **kwargs):
        self.init_memory()  # Before super().__init__()
        super().__init__(**kwargs)

    def _register_tools(self):
        super()._register_tools()
        self.register_memory_tools()  # Exposes 5 tools to the LLM
```
If you put Agent before MemoryMixin in the class declaration, tool logging and dynamic context injection will silently fail. Python's MRO resolves methods left to right across the listed base classes, so the mixin must appear first for its overrides to run.
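The effect of base-class order can be seen with two toy classes. This is only an illustration: the `Agent` and `MemoryMixin` below are hypothetical stand-ins that mimic the super() chaining described above, not the real gaia classes.

```python
class Agent:
    """Stand-in for the real base class (hypothetical, for illustration only)."""

    def process_query(self, user_input):
        return f"agent saw: {user_input}"


class MemoryMixin:
    """Stand-in mixin: augments the input, then defers to the next class in the MRO."""

    def process_query(self, user_input):
        return super().process_query(f"[ctx] {user_input}")


class Correct(MemoryMixin, Agent):
    pass


class Wrong(Agent, MemoryMixin):
    pass


# Mixin first: the override runs, then super() reaches Agent.
print(Correct().process_query("hi"))
# Agent first: Agent.process_query is found first, so the mixin override never runs.
print(Wrong().process_query("hi"))
# The resolution order can be inspected directly.
print([cls.__name__ for cls in Correct.__mro__])
```

No exception is raised in the `Wrong` case, which is why the failure is silent.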
Validate Lemonade embedding service connectivity — raises RuntimeError if unreachable
Backfill embeddings for items missing them (up to 100 per startup)
Rebuild FAISS index from stored embeddings
apply_confidence_decay() — 30-day decay
reconcile_memory() — Hindsight-inspired, max 20 pairs
consolidate_old_sessions() — max 5 sessions
prune() — 90-day hard delete
Generate session UUID
Embedding is a hard requirement in v2. If the Lemonade embedding service is unavailable, init_memory() raises RuntimeError("Lemonade embedding service required for memory system"). There is no silent degradation to keyword-only search.
Returns the stable frozen prefix for the system prompt. Always includes proactive usage instructions for the LLM, plus any stored preferences, facts, skills, and error patterns. Nothing time-sensitive. This method is called automatically by Agent._get_mixin_prompts().
def get_memory_system_prompt(self) -> str
The output stays frozen for the entire session so the LLM inference engine can reuse its KV cache across turns. Always returns a non-empty string — even with zero stored memories, the instructions block is included so the LLM knows it has persistent memory tools. Example output (with stored memories):
```text
=== MEMORY (Persistent Second Brain) ===
You have persistent memory across sessions. USE IT PROACTIVELY:
- When the user states a fact, preference, or commitment → call `remember` immediately
- When the user asks what you know, what was discussed, or about a person/project → call `recall`
- When information changes or is corrected → call `recall` to find the old item, then `update_memory`
- When the user mentions a deadline or reminder → call `remember` with due_at (ISO 8601)
- When the user wants to forget something → call `recall` to find it, then `forget`
- BIAS TOWARD REMEMBERING: if in doubt, store it. It's better to remember too much than too little.
- Every fact, preference, name, project detail, deadline, or observation is worth storing.
Preferences:
  - tone: professional but friendly
  - code_style: black formatter, 88 char lines
Known facts:
  - User's name is Alex, role is tech lead (confidence: 0.95)
  - Project uses React 19 with app router (confidence: 0.82)
Skills:
  - Deploy workflow: test → build → push → verify (confidence: 0.88)
  - Docker compose: always use --build flag on first run (confidence: 0.72)
  - Git bisect: use binary search for regression hunting (confidence: 0.65)
Known errors to avoid:
  - execute_code: "import torch" fails -- torch not installed on this machine
  - pip install: always use --index-url for PyTorch packages
```
Example output (zero memories stored):
```text
=== MEMORY (Persistent Second Brain) ===
You have persistent memory across sessions. USE IT PROACTIVELY:
[... same instructions ...]
No memories stored yet. Start building your knowledge base by remembering what the user tells you.
```
Filters applied to the knowledge sections:
Includes items from global context + active context
Excludes items where sensitive=1
Excludes items where superseded_by IS NOT NULL (only current/active items)
Sorted by confidence descending
Hard limits: max 10 preferences, 5 facts, 3 skills, 5 errors
Hard cap on total output: 4000 chars (truncated with ... (memory truncated) if exceeded)
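The filters and limits above can be sketched as a small selection function. This is a minimal sketch, not the actual implementation: the dict keys, the category names used as bucket keys, and the choice to count the truncation marker toward the 4000-character cap are all assumptions.

```python
from typing import Dict, List

# Per-category limits and output cap as listed above.
LIMITS = {"preference": 10, "fact": 5, "skill": 3, "error": 5}
MAX_CHARS = 4000
TRUNCATION_MARK = "... (memory truncated)"


def select_items(items: List[Dict], active_context: str) -> Dict[str, List[Dict]]:
    """Apply the context, sensitivity and lineage filters, then the per-category limits."""
    visible = [
        item for item in items
        if item["context"] in ("global", active_context)
        and not item["sensitive"]
        and item["superseded_by"] is None
    ]
    visible.sort(key=lambda item: item["confidence"], reverse=True)

    selected: Dict[str, List[Dict]] = {category: [] for category in LIMITS}
    for item in visible:
        bucket = selected.get(item["category"])
        if bucket is not None and len(bucket) < LIMITS[item["category"]]:
            bucket.append(item)
    return selected


def cap_output(text: str) -> str:
    """Truncate to MAX_CHARS; assumes the marker itself counts toward the cap."""
    if len(text) <= MAX_CHARS:
        return text
    return text[: MAX_CHARS - len(TRUNCATION_MARK)] + TRUNCATION_MARK
```

Sorting before bucketing means each category keeps its highest-confidence items when a limit is hit.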
Returns the per-turn dynamic context that is prepended to the user message each turn. Contains the current time and upcoming/overdue items.
def get_memory_dynamic_context(self) -> str
This is injected into the user message (not the system prompt) so the frozen prefix is preserved for KV-cache reuse. Example output:
```text
[GAIA Memory Context]
Current time: 2026-03-25T10:30:00-0700 (Wednesday)
Upcoming/overdue:
  - [DUE Mar 27] Online course starts next week
  - [OVERDUE Mar 24] Follow up on deployment review
After mentioning a time-sensitive item, call update_memory to set reminded_at so you don't repeat yourself.
```
Always returns at least the current time. The upcoming/overdue section is included only when time-sensitive items are active. Returns an empty string only if init_memory() has not been called.
These methods handle the vector embedding pipeline for hybrid search. All are internal (_-prefixed) — you do not call them directly.
def _get_embedder(self) -> Any
Lazy-initializes a LemonadeProvider for embedding. Cached for the process lifetime. Raises RuntimeError if Lemonade is unreachable.
def _embed_text(self, text: str) -> np.ndarray
Embeds a single text string into a 768-dimensional vector via nomic-embed-text-v2-moe-GGUF. Returns a normalized numpy array suitable for cosine similarity via FAISS IndexFlatIP.
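Normalization matters because an inner-product index returns cosine similarity only when every vector has unit length. The pure-Python sketch below shows that relationship on tiny vectors; the helper names are illustrative and are not part of MemoryMixin.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
# For unit vectors the inner product is the cosine of the angle between them.
print(round(inner_product(a, b), 6))
```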
def _backfill_embeddings(self, limit: int = 100) -> int
Embeds knowledge items that are missing embeddings (e.g., after a v1 → v2 migration). Called automatically during init_memory() startup. Returns the number of items backfilled.
Combines vector similarity (FAISS) and keyword matching (FTS5 BM25) via Reciprocal Rank Fusion (RRF), then reranks with a cross-encoder. The full pipeline:
Embed query via Lemonade (nomic-embed-text-v2, 768-dim)
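Reciprocal Rank Fusion itself is short enough to sketch. The version below is a generic formulation, not the code used by _hybrid_search(): the constant k=60 is the value commonly used with RRF and is an assumption here, as is breaking ties by ID.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; each appearance adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


vector_hits = ["a", "b", "c"]   # e.g. FAISS results, best first
keyword_hits = ["b", "d", "a"]  # e.g. FTS5 BM25 results, best first
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Items found by both retrievers rise to the top because they collect a score contribution from each list.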
NOOP: information already captured; not included in output.
The extraction fetches top-10 relevant existing items first via _hybrid_search(), so the LLM can see what already exists and decide whether to add, update, delete, or do nothing. This replaces v1’s regex-based heuristic extraction.

Error handling: Invalid JSON → logged error, skip this turn. Timeout (3s) → logged warning, skip. Individual operation failure → logged, continue with remaining operations. No fallback to regex heuristics.
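The error-handling rules above can be sketched as follows. The JSON shape (a list of objects with an "op" key) and the handler mapping are assumptions made for illustration; the source does not specify the wire format, and the 3-second timeout is not modelled here.

```python
import json
import logging
from typing import Callable, Dict

logger = logging.getLogger("memory_extraction_sketch")


def apply_operations(raw_response: str, handlers: Dict[str, Callable[[dict], None]]) -> int:
    """Apply ADD/UPDATE/DELETE operations; return how many succeeded."""
    try:
        operations = json.loads(raw_response)
    except json.JSONDecodeError:
        logger.error("Extraction returned invalid JSON; skipping this turn")
        return 0
    if not isinstance(operations, list):
        logger.error("Extraction did not return a list; skipping this turn")
        return 0

    applied = 0
    for op in operations:
        try:
            kind = op["op"]
            if kind == "NOOP":
                continue  # nothing to store
            handlers[kind](op)
            applied += 1
        except Exception:
            # One bad operation must not block the remaining ones.
            logger.exception("Operation failed; continuing with the rest: %r", op)
    return applied
```

Each operation is isolated in its own try block, so a failed DELETE does not prevent a later ADD from being applied.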
def consolidate_old_sessions(self, max_sessions: int = 5) -> Dict:
Distills old conversation sessions into durable knowledge before they age out at the 90-day prune boundary. Called automatically during init_memory() startup.

Returns: {"consolidated": int, "extracted_items": int}

Criteria for consolidation:
def reconcile_memory(self, max_pairs: int = 20) -> Dict:
Background reconciliation of high-similarity knowledge pairs. Detects and resolves contradictory, reinforcing, or weakening facts that were never co-retrieved during extraction. Called on startup after decay, before consolidation.

Returns: {"pairs_checked": int, "reinforced": int, "contradicted": int, "weakened": int, "neutral": int}

Process:
For each context, compute pairwise embedding similarity among active items
Flag pairs with cosine similarity > 0.85
For each flagged pair, a single LLM call classifies the relationship:
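The pair-flagging step can be sketched in pure Python. This is an illustration under stated assumptions: the real code presumably works on the stored embeddings of one context at a time, and taking the most similar pairs first when the max_pairs budget is exceeded is a guess, not documented behaviour.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_pairs(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.85,
    max_pairs: int = 20,
) -> List[Tuple[str, str]]:
    """Return up to max_pairs ID pairs whose cosine similarity exceeds the threshold."""
    scored = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        similarity = cosine(vec_a, vec_b)
        if similarity > threshold:
            scored.append((similarity, id_a, id_b))
    scored.sort(reverse=True)  # assumption: most similar pairs are checked first
    return [(id_a, id_b) for _, id_a, id_b in scored[:max_pairs]]


items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(flag_pairs(items))
```

Each returned pair would then go to the single LLM classification call described in the last step.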
MemoryMixin hooks into the Agent lifecycle at 3 points. These are automatic; you do not call them directly.

Hook 1: process_query() override
Prepends per-turn dynamic context (time + upcoming items) to the user message. Saves the original user input so _after_process_query can store the clean version without the context prefix.

Hook 2: _execute_tool() override
Wraps every non-memory tool call to auto-log it to tool_history. If a tool fails, the error is automatically stored as knowledge (category="error") for future avoidance. Memory tools (remember, recall, etc.) are excluded from logging to avoid noise and recursion.

Hook 3: _after_process_query() callback
Called after process_query() completes. Stores both conversation turns (user + assistant) in the conversations table and runs Mem0-style LLM extraction (ADD/UPDATE/DELETE/NOOP operations against existing memory). For turns ≥ 20 words, the extraction pipeline fetches top-10 relevant existing items via _hybrid_search(), then asks the LLM to decide what operations to perform. There is no regex heuristic fallback.
The system prompt is deliberately split into two parts:
| Part | Method | Where injected | Changes between turns? |
|---|---|---|---|
| Stable prefix | get_memory_system_prompt() | System prompt via _get_mixin_prompts() | No (frozen for KV-cache reuse) |
| Dynamic context | get_memory_dynamic_context() | Prepended to user message each turn | Yes (current time, upcoming items) |
This design allows LLM inference engines (like Lemonade Server) to cache the attention computations for the system prompt and reuse them across conversation turns. Only the small dynamic section (typically 2-5 lines) changes per turn.
```python
# Simplified flow inside MemoryMixin.process_query():
def process_query(self, user_input, **kwargs):
    self._original_user_input = user_input  # Save clean version
    dynamic = self.get_memory_dynamic_context()  # Time + upcoming
    augmented = f"{dynamic}\n\n{user_input}" if dynamic else user_input
    return super().process_query(augmented, **kwargs)  # System prompt stays frozen
```
Three tables in a single SQLite file. Schema version 2 adds vector embedding support and fact lineage tracking.
conversations — every conversation turn, persistent across sessions, with FTS5 index. v2 adds consolidated_at TEXT column for tracking which turns have been distilled to knowledge.
knowledge — persistent facts, preferences, errors, skills with FTS5 index, confidence scoring, context scoping, entity linking, temporal fields (due_at, reminded_at). v2 adds embedding BLOB (768-dim float32 vector) and superseded_by TEXT (fact lineage — ID of newer item that replaced this one).
tool_history — every tool call the agent makes, auto-logged with success/failure, duration, error messages
Schema migrations run automatically in MemoryStore.__init__(). v1 → v2 adds:
Deduplication: If a new entry has >80% word overlap (Szymkiewicz-Simpson coefficient) with an existing entry in the same category + context + entity scope, the existing entry is updated with the newer content. The newer fact is assumed to be more current.

Validation: content must be non-empty (raises ValueError otherwise). Content longer than 2000 characters is silently truncated. due_at, if provided, is normalized to timezone-aware ISO 8601.

Embedding: After storage, MemoryMixin immediately embeds the new item via _embed_text() and writes the embedding BLOB back via store_embedding(). The FAISS index is incrementally updated.
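The Szymkiewicz-Simpson (overlap) coefficient is the size of the intersection divided by the size of the smaller set. A minimal sketch of the duplicate check follows; the tokenization (lowercased \w+ words) is an assumption, since the source does not say how words are split.

```python
import re


def overlap_coefficient(a: str, b: str) -> float:
    """|A ∩ B| / min(|A|, |B|) over the sets of words in each string."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def is_duplicate(new: str, existing: str, threshold: float = 0.8) -> bool:
    return overlap_coefficient(new, existing) > threshold


# A longer restatement of the same fact fully contains the shorter one.
print(is_duplicate("Project uses React 19", "The project uses React 19 with app router"))
# Two different preferences share only part of their words.
print(is_duplicate("User likes tea", "User likes coffee"))
```

Because the denominator is the smaller set, a short entry that is fully contained in a longer one scores 1.0, which is what lets an expanded restatement replace the original.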
```python
def search(
    self,
    query: str,                      # FTS5 search query
    category: str = None,            # Filter by category
    context: str = None,             # Filter by context
    entity: str = None,              # Filter by entity
    include_sensitive: bool = False, # Include sensitive items
    top_k: int = 5,                  # Max results
    time_from: str = None,           # ISO 8601 lower bound on created_at
    time_to: str = None,             # ISO 8601 upper bound on created_at
) -> List[Dict]:
```
Pure FTS5 keyword search with BM25 ranking. Uses AND semantics by default; if zero results, falls back to OR. Bumps confidence +0.02 on each recalled item. Filters on superseded_by IS NULL to return only current/active items.

The time_from and time_to parameters add temporal filtering on created_at, narrowing results before BM25 ranking.
This is the keyword component of search. For full hybrid search (vector + BM25 + RRF + cross-encoder reranking), use MemoryMixin._hybrid_search(), which calls this method internally as one of its two retrieval signals.
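The AND-then-OR fallback used by the keyword search can be sketched independently of the database. In this sketch `run_query` is a hypothetical callable standing in for the actual FTS5 MATCH execution, and quoting each term is one reasonable way to keep user text from being parsed as FTS5 syntax, not necessarily what MemoryStore does.

```python
from typing import Callable, List


def build_fts_query(text: str, operator: str) -> str:
    """Quote each term and join with AND or OR (FTS5 boolean operators)."""
    terms = ['"' + term.replace('"', '""') + '"' for term in text.split()]
    return f" {operator} ".join(terms)


def search_with_fallback(text: str, run_query: Callable[[str], List[dict]]) -> List[dict]:
    """Try the strict AND query first; retry with OR only if nothing matched."""
    results = run_query(build_fts_query(text, "AND"))
    if not results:
        results = run_query(build_fts_query(text, "OR"))
    return results


print(build_fts_query("deploy kubectl", "AND"))
```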
Returns time-sensitive items due within N days or overdue. Filters out items that have already been reminded about (unless the due date has passed since the last reminder). Filters on superseded_by IS NULL to return only current/active items.
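One way to read the reminder rule is shown below. The function name and signature are hypothetical, and the interpretation of "the due date has passed since the last reminder" (the item was reminded before it was due and has since become overdue) is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due_for_reminder(
    due_at: datetime,
    reminded_at: Optional[datetime],
    now: datetime,
    within_days: int,
) -> bool:
    if due_at > now + timedelta(days=within_days):
        return False  # too far in the future
    if reminded_at is None:
        return True  # due soon or overdue, and never mentioned
    # Already mentioned: surface again only if it became overdue after that reminder.
    return reminded_at < due_at <= now


now = datetime(2026, 3, 25, 10, 30, tzinfo=timezone.utc)
day = timedelta(days=1)
print(is_due_for_reminder(now + 2 * day, None, now, within_days=7))
print(is_due_for_reminder(now + 2 * day, now - day, now, within_days=7))
```

All datetimes in this sketch are timezone-aware; mixing naive and aware values would raise a TypeError on comparison.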
```python
def update(
    self,
    knowledge_id: str,
    content: str = None,
    category: str = None,
    domain: str = None,
    metadata: dict = None,
    context: str = None,
    sensitive: bool = None,
    entity: str = None,
    due_at: str = None,
    reminded_at: str = None,
    superseded_by: str = None,  # ID of newer item that replaces this one
) -> bool:  # False if ID not found
```
Only provided fields are changed. Sets updated_at to the current time. When content is updated, the stored embedding is cleared (embedding = NULL) to force re-embedding. The superseded_by parameter is used by the LLM extraction pipeline to mark old items as replaced by newer versions while preserving fact lineage.
Adjust confidence by delta, clamped to [0.0, 1.0]. Used internally by reconciliation (+0.05 reinforce, +0.1 contradict newer, -0.1 weaken) and hybrid search (+0.02 per recall for vector-only results).
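The clamping behaviour is a one-liner. The function below is an illustrative stand-alone version that takes the current value as an argument; the real method presumably reads and writes the stored confidence itself.

```python
def adjust_confidence(current: float, delta: float) -> float:
    """Add delta to the current confidence and clamp the result to [0.0, 1.0]."""
    return max(0.0, min(1.0, current + delta))


print(adjust_confidence(0.98, 0.05))  # reinforcement near the ceiling
print(adjust_confidence(0.05, -0.1))  # weakening near the floor
```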
Delete all knowledge entries with a given source (e.g., "discovery"). Returns the number of entries deleted. Used by gaia memory bootstrap --reset to clear discovery items.
```python
def get_stats(self) -> Dict
    # Returns counts by category, context, entity, conversations, tools, temporal

def get_all_knowledge(self, category=None, context=None, entity=None,
                      sensitive=None, search=None, sort_by="updated_at",
                      order="desc", offset=0, limit=50,
                      include_superseded=False) -> Dict
    # Returns: {"items": [...], "total": 142, "offset": 0, "limit": 50}
    # When include_superseded=False (default), filters on superseded_by IS NULL

def get_tool_summary(self) -> List[Dict]
    # Per-tool stats: total_calls, success_rate, avg_duration_ms, last_error

def get_activity_timeline(self, days: int = 30) -> List[Dict]
    # Daily activity counts for the last N days

def get_recent_errors(self, limit: int = 20) -> List[Dict]

def prune(self, days: int = 90) -> Dict
    # Delete tool_history and conversations older than N days.
    # Also prunes low-confidence knowledge (confidence < 0.1) last used > N days ago.
    # Returns: {"tool_history_deleted": N, "conversations_deleted": N, "knowledge_deleted": N}
    # Called automatically on agent startup via init_memory().

def rebuild_fts(self) -> None
    # Rebuild all FTS5 indexes from source tables.
    # Use if search results seem wrong or incomplete.
    # Also available via POST /api/memory/rebuild-fts
```
```python
def store_embedding(self, knowledge_id: str, embedding: bytes) -> bool
    # Store a float32 embedding BLOB for a knowledge item.
    # Called by MemoryMixin after store() to persist the vector.
    # Returns False if knowledge_id not found.

def get_items_with_embeddings(
    self,
    category: str = None,
    context: str = None,
    entity: str = None,
    include_sensitive: bool = False,
    top_k: int = 100,
    time_from: str = None,  # ISO 8601 lower bound on created_at
    time_to: str = None,    # ISO 8601 upper bound on created_at
) -> List[Dict]
    # Returns knowledge items that have embeddings (embedding IS NOT NULL,
    # superseded_by IS NULL). Includes the embedding BLOB in each dict.
    # Used to build/rebuild the FAISS index and for filtered vector retrieval.

def get_items_without_embeddings(self, limit: int = 100) -> List[Dict]
    # Returns knowledge items missing embeddings (embedding IS NULL).
    # Used by _backfill_embeddings() during startup.
```
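A 768-dimensional float32 vector occupies 768 × 4 = 3072 bytes. The sketch below shows one way to produce and read back such a BLOB with the standard library; the helper names are hypothetical, and the use of native byte order is an assumption (the source only states float32).

```python
from array import array
from typing import List, Sequence

DIM = 768  # embedding dimensionality stated above


def to_blob(vector: Sequence[float]) -> bytes:
    """Pack a vector as consecutive float32 values (native byte order)."""
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} values, got {len(vector)}")
    return array("f", vector).tobytes()


def from_blob(blob: bytes) -> List[float]:
    """Unpack a float32 BLOB back into a list of Python floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


blob = to_blob([0.5] * DIM)
print(len(blob))
```

The resulting bytes object is what would be passed as the embedding argument of store_embedding().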
```python
def get_unconsolidated_sessions(
    self,
    older_than_days: int = 14,
    min_turns: int = 5,
    limit: int = 5,
) -> List[str]
    # Returns session_ids eligible for consolidation.
    # Criteria: all turns > older_than_days old, >= min_turns, at least one
    # turn with consolidated_at IS NULL.

def mark_turns_consolidated(self, turn_ids: List[int]) -> int
    # Sets consolidated_at = now for the given conversation turn IDs.
    # Returns count of turns marked. Turns remain until 90-day prune;
    # consolidated_at prevents re-processing.
```
```python
def get_items_for_reconciliation(
    self,
    context: str = None,
    limit: int = 100,
) -> List[Dict]
    # Returns active knowledge items (superseded_by IS NULL) with embeddings,
    # suitable for pairwise similarity comparison during reconciliation.

def get_sessions(self, limit: int = 20) -> List[Dict]
    # List conversation sessions with turn counts and preview text.
    # Used by the Memory Dashboard.

def get_entities(self, limit: int = 100) -> List[Dict]
    # List all unique entities with knowledge counts and last update time.
    # Returns: [{"entity": "person:sarah_chen", "count": 5, "last_updated": "..."}, ...]

def get_contexts(self, limit: int = 100) -> List[Dict]
    # List all contexts with knowledge counts.
    # Returns: [{"context": "work", "count": 42}, ...]
```
Local system scanner for day-zero bootstrap. Returns lists of discovered facts for user review — nothing is stored directly.
```python
from gaia.agents.base.discovery import SystemDiscovery

discovery = SystemDiscovery()
results = discovery.scan_all()
# results is a Dict[str, List[Dict]]: source name → list of discovered facts
# Example: {"file_system": [...], "git_repos": [...], "installed_apps": [...]}

# To iterate all findings:
findings = []
for source_name, items in results.items():
    findings.extend(items)

# Each item dict:
# {content, category, context, entity, sensitive, confidence, source, approved}
```
| Method | Reads | Produces |
|---|---|---|
|  | Folder names + file extensions in project directories | Project names, languages used |
| scan_git_repos(paths) | .git/config files: remotes, branch names | Repo names, languages, remote URLs |
| scan_installed_apps() | Windows registry, Start Menu shortcuts | App inventory |
| scan_browser_bookmarks() | Chrome/Edge/Firefox bookmark files | Categorized sites and interests |
| scan_browser_history(days) | Browser history DBs (URLs only, no page content) | Top domains (all flagged sensitive) |
| scan_email_accounts() | Windows credential store, addresses only | Email addresses (all flagged sensitive) |
Each method returns dicts like:
```python
{
    "content": "Project 'gaia' -- Python/TypeScript, github.com/amd/gaia",
    "category": "fact",
    "context": "work",
    "entity": "project:gaia",
    "sensitive": False,
    "confidence": 0.4,   # Lower than user-stated (inferred)
    "source": "discovery",
    "approved": None,    # Set by user review: True/False
}
```
Discovery never reads file contents, email content, or browser page content. It reads names, extensions, URLs, and metadata only. All browser history and email items are auto-flagged as sensitive.
```python
from gaia.agents.base.agent import Agent
from gaia.agents.base.memory import MemoryMixin


class RememberBot(MemoryMixin, Agent):
    """A simple agent that remembers everything."""

    def __init__(self):
        self.init_memory(context="global")
        super().__init__(
            name="RememberBot",
            system_prompt="You are a helpful assistant with persistent memory.",
        )

    def _register_tools(self):
        super()._register_tools()
        self.register_memory_tools()


# Usage
agent = RememberBot()
result = agent.process_query("My name is Alex and I prefer concise answers")
# Memory auto-extracts: fact("My name is Alex"), preference("prefer concise answers")
# Next session, system prompt includes these automatically
```
```python
from gaia.agents.base.agent import Agent
from gaia.agents.base.memory import MemoryMixin


class WorkPersonalAgent(MemoryMixin, Agent):
    def __init__(self, context="work"):
        self.init_memory(context=context)
        super().__init__(name="DualContext")

    def _register_tools(self):
        super()._register_tools()
        self.register_memory_tools()


agent = WorkPersonalAgent(context="work")
agent.process_query("Remember: deploy with kubectl apply -f prod.yaml")
# Stored in 'work' context

agent.set_memory_context("personal")
agent.process_query("Remember: dentist appointment Thursday at 2pm")
# Stored in 'personal' context

agent.set_memory_context("work")
# System prompt now shows work items only (plus global)
# Personal items are invisible until context switches back
```
These 5 tools are registered by register_memory_tools() and exposed to the LLM:
| Tool | Description |
|---|---|
| remember | Store a fact, preference, error, skill, note, or reminder. Supports category, domain, due_at, context, sensitive, entity. |
| recall | Search memory by query (hybrid: vector + BM25 + cross-encoder), category, context, entity, or time range. Returns results with IDs for use with update/forget. |
| update_memory | Modify an existing entry by ID. Only non-empty fields change. Use reminded_at="now" after mentioning time-sensitive items. |
| forget | Delete a specific memory entry by ID. |
| search_past_conversations | Search conversation history by keywords, time range, or both. Returns matching turns with timestamps and session IDs. |
These map to CRUD operations: remember = create, recall = read, update_memory = update, forget = delete, plus search_past_conversations for history.
| Source | Description | Initial confidence |
|---|---|---|
| llm_extract | Auto-extracted by LLM from conversation, Mem0-style | 0.4 |
| error_auto | Auto-stored from tool failure | 0.5 |
| user | Manually created via dashboard | 0.8 |
| discovery | System scan during bootstrap | 0.4 |
| consolidation | Distilled from old conversation sessions | 0.5 |
v2 replaces the v1 heuristic source (regex-based) with llm_extract (Mem0-style LLM extraction). The LLM sees both the conversation and existing memory, then decides what operations to perform (ADD/UPDATE/DELETE/NOOP). This produces higher-quality extractions with proper deduplication and contradiction resolution.