Memento-Skills Agent Framework That Rewrite Its Own Code

Memento-Skills: The Self-Evolving Agent Framework That Rewrites Its Own Code

What if your AI agent didn't just fail silently—but actually got smarter every time it messed up?

Most developers have been there. You deploy an LLM-powered agent, watch it handle a few tasks gracefully, then witness it collapse on something slightly novel. The typical response? Patch the prompt, add another tool, cross your fingers, and redeploy. It's a frustrating cycle of manual intervention that scales poorly and learns nothing from experience.

But here's the uncomfortable truth: we've been building agents that forget everything. They don't accumulate wisdom. They don't repair their own broken reasoning paths. They certainly don't redesign their own capabilities when faced with unfamiliar challenges. Every deployment is essentially Groundhog Day for your agent—same mistakes, same blind spots, same brittle behavior.

Enter Memento-Skills, the radical open-source framework that flips this paradigm entirely. Born from the Memento research team and freshly released as v0.3.0, this isn't another thin wrapper around someone else's API. It's a fully self-developed agent architecture built around one obsession: making agents learn from deployment experience, reflect on their failures, and rewrite their own skill code and prompts. No fine-tuning. No retraining. Zero parameter updates. Just pure, relentless self-evolution through external memory.

If you're tired of babysitting agents that plateau on day one, this framework might just be the secret weapon you've been hunting for.

What Is Memento-Skills?

Memento-Skills is a fully self-developed agent framework where skills are first-class citizens—retrievable, executable, persistent, and most critically, evolvable. Created by the Memento research team (Huichi Zhou, Siyuan Guo, and collaborators), it represents a fundamental departure from conventional agent architectures that treat tools as static function collections.

The framework operates on deployment-time learning: instead of updating model parameters through expensive pre-training or fine-tuning, Memento-Skills keeps the underlying LLM frozen and accumulates experience in an external skill memory. This enables continual adaptation from live interactions at zero retraining cost—a crucial advantage when working with proprietary APIs or resource-constrained environments.

What makes Memento-Skills genuinely trending in developer circles right now? Three forces are converging:

The open-source LLM explosion — With ecosystems like Kimi/Moonshot, MiniMax, and GLM/Zhipu maturing, developers need frameworks designed specifically for these platforms rather than OpenAI-centric assumptions.
Benchmark fatigue — The community is hungry for systems that demonstrate measurable learning curves on rigorous evaluations like HLE (Humanity's Last Exam) and GAIA, not just flashy demos.
Production deployment reality — CLI tools, desktop GUIs, IM integrations, and local sandbox execution matter more than ever for agents that need to survive outside Jupyter notebooks.

Memento-Skills delivers on all three fronts. It's not a research prototype duct-taped together for a paper—it's engineered for real deployment with persistent state, multi-platform IM bridges, and a reflection loop that runs continuously.

Key Features That Separate It From the Pack

Let's dissect what makes this framework technically distinctive:

Fully Self-Developed Agent Stack

Unlike frameworks that are essentially orchestration layers over LangChain or LlamaIndex, Memento-Skills ships its own complete runtime: orchestration, skill routing, execution engine, reflection system, storage backends, CLI, and GUI. This vertical integration enables the deep optimizations necessary for self-evolution.

4-Stage ReAct^{↗ Bright Coding Blog} Architecture with Finalize Phase

The reasoning pipeline follows Intent → Planning → Execution → Reflection, with a dedicated Finalize phase for structured result summarization. The Execution phase itself runs as a multi-step ReAct loop, enabling complex tool chaining and iterative problem-solving.

Read-Write Reflective Learning Loop

This is the architectural heart. When a user submits a task:

Read: The skill router retrieves candidate skills from local library or generates new ones
Execute: Skills run through tool calling in a local uv sandbox
Reflect: The system evaluates success/failure, updates utility scores, attributes issues to specific skills
Write: Weak skills get optimized, broken ones rewritten, new ones created when gaps exist

BM25 + Semantic Vector Hybrid Retrieval

As skill libraries grow, finding the right capability becomes critical. Memento-Skills combines traditional BM25 text search (via jieba tokenization) with sqlite-vec semantic search for robust skill routing even at scale.

Configuration v2: Three-Layer Isolation

System Config (read-only defaults) / User Config (persistent customization) / Runtime Config (merged at startup). Auto-migration preserves user values when templates update, with x-managed-by: user markers protecting customizations.

Multi-Platform IM Gateway

Real-time messaging across Feishu (WebSocket long-connection), DingTalk (webhook + event subscription), WeCom (enterprise integration), and WeChat (iLink API with QR binding)—all with per-user persistent sessions.

Skill Market with Cloud Catalogue

Search, download, and auto-install validated skills from a shared repository. The system moves toward deduplicated, quality-controlled reusable skills rather than chaotic plugin accumulation.

Real-World Use Cases Where Memento-Skills Dominates

1. Autonomous Research Assistant

Imagine an agent tasked with deep research across academic papers, web sources, and internal documents. Traditional agents would repeatedly fail on novel query formulations or unfamiliar PDF structures. Memento-Skills identifies which retrieval or parsing skill failed, optimizes it through reflection, and writes the improved version back to its library. Over weeks of deployment, the research assistant becomes measurably more capable without any developer intervention.

2. Enterprise IM Operations Bot

Deployed across Feishu, DingTalk, WeCom, and WeChat simultaneously, the agent handles diverse requests: scheduling, document generation, data queries, cross-platform notifications. When a new enterprise system joins the stack, the agent doesn't break—it creates new integration skills through its skill-creator capability, validates them in sandbox, and adds them to its repertoire. The IM-platform skill (new in v0.2.0) enables this natively.

3. Personal Productivity Agent with Long-Term Memory

The v0.3.0 Agent Profile system (core/agent_profile/) maintains persistent soul and user profiles. The background soul_evolver and user_evolver refine these from conversation history, while the dream/daemon consolidates experiences between sessions. Your agent genuinely learns your preferences, communication style, and recurring workflows—not through prompt engineering, but through structured profile evolution.

4. Benchmark-Driven Capability Development

For researchers and competitive developers, Memento-Skills demonstrates measurable improvement curves on HLE and GAIA benchmarks. The skill library grows from atomic primitives into semantically clustered, task-specific capabilities. This isn't skill accumulation—it's skill learning through task experience, with evaluation data to prove it.

Step-by-Step Installation & Setup Guide

Quick Start (One Line)

python^{↗ Bright Coding Blog} -m venv .venv && source .venv/bin/activate && pip install -e . && memento doctor && memento agent

This single command creates your environment, installs dependencies, validates your setup, and launches an interactive agent session.

Developer Installation

# Clone the repository
git clone https://github.com/Memento-Teams/Memento-Skills.git
cd Memento-Skills

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install in editable mode
pip install -e .

On first launch, the system auto-creates ~/memento_s/config.json. You'll need to configure your LLM profile:

{
  "llm": {
    "active_profile": "default",
    "profiles": {
      "default": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key",
        "base_url": "https://api.openai.com/v1",
        "max_tokens": 8192,
        "temperature": 0.7,
        "timeout": 120
      }
    }
  },
  "env": {
    "TAVILY_API_KEY": "your-search-api-key"
  }
}

Critical configuration notes:

The model field uses provider/model format: anthropic/claude-3.5-sonnet, ollama/llama3, or open-source endpoints like kimi/kimi-latest
TAVILY_API_KEY enables web search; without it, web-search skill fails
For open-source LLM ecosystems (Kimi, MiniMax, GLM), set appropriate base_url in your profile

Verification & Launch

memento doctor        # Comprehensive environment diagnostic
memento agent         # Interactive terminal session
memento agent -m "..." # Single-message batch mode
memento-gui           # Desktop GUI (Flet-based)

One-Click GUI Install (No Python Required)

Platform	Download
macOS (Apple Silicon)	Memento-S-0.3.1-arm64.dmg
Windows (x64)	Memento-S-win-x64-0.3.1.zip

Post-install: open settings, paste your LLM API key, done.

REAL Code Examples from the Repository

Example 1: Core Agent Initialization (bootstrap.py)

The v0.3.0 architecture introduces bootstrap.py as the centralized application initialization entry:

# bootstrap.py — Centralized application initialization
# This entry point wires together all v0.3.0 components:
# - infra/ layer for memory, context, and compaction
# - tools/ registry for unified tool discovery
# - core/agent_profile/ for persistent identity
# - daemon/ services for background evolution

from infra.service import InfraService
from tools.registry import get_registry
from core.agent_profile.manager import ProfileManager

def bootstrap():
    """
    Initialize the complete Memento-Skills runtime.
    
    The v0.3.0 redesign separates infrastructure from core logic,
    enabling independent evolution of agent capabilities and platform code.
    """
    # Initialize infrastructure services (memory, context, compaction)
    infra = InfraService()
    
    # Load unified tool registry with atomic tools and MCP integrations
    tool_registry = get_registry()
    
    # Load or create persistent agent profile
    profile_manager = ProfileManager()
    
    # Wire components and return configured agent runtime
    return AgentRuntime(infra=infra, tools=tool_registry, profile=profile_manager)

What this reveals: The v0.3.0 architecture explicitly decouples infrastructure from business logic. The InfraService entry point (infra/service.py) connects memory, context providers, and compaction pipelines without polluting core agent code. This separation is what enables the framework to evolve its storage backends, context strategies, and memory implementations without touching skill execution logic.

Example 2: Configuration v2 with Auto-Migration

The three-layer configuration system demonstrates production-grade configuration management:

// ~/memento_s/config.json — User Config layer (read-write)
{
  "llm": {
    "active_profile": "default",
    "profiles": {
      "default": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key",
        "base_url": "https://api.openai.com/v1",
        "max_tokens": 8192,
        "temperature": 0.7,
        "timeout": 120
        // x-managed-by: user marks protect this from auto-migration
      }
    }
  },
  "env": {
    "TAVILY_API_KEY": "your-search-api-key"
  }
}

Behind the scenes: When the codebase's system_config.json template updates (new fields, changed defaults), the system automatically merges changes while preserving values marked with x-managed-by: user. This Pydantic-validated, JSON Schema-backed configuration enables IDE auto-completion and prevents the silent configuration drift that plagues long-running agent deployments.

Example 3: Agent Profile Evolution (daemon/agent_profile/orchestrator.py)

The v0.3.0 background evolution system:

# daemon/agent_profile/orchestrator.py
# Background orchestrator for persistent agent identity evolution

from daemon.agent_profile.soul_evolver import SoulEvolver
from daemon.agent_profile.user_evolver import UserEvolver

class ProfileOrchestrator:
    """
    Triggers periodic refinement of agent soul and user profiles.
    
    The soul maintains long-term identity, traits, and policies.
    The user profile tracks per-user preferences and interaction history.
    Both evolve from recent conversations and outcomes without human intervention.
    """
    
    def __init__(self):
        self.soul_evolver = SoulEvolver()
        self.user_evolver = UserEvolver()
    
    async def run_evolution_cycle(self):
        """
        Scheduled by daemon loop. Refines profiles from accumulated experience.
        This is analogous to skill reflection, but applied to agent identity.
        """
        # Evolve agent's core personality and decision policies
        await self.soul_evolver.evolve_from_recent_sessions()
        
        # Adapt to specific user's communication patterns and preferences
        await self.user_evolver.evolve_from_interaction_history()

Why this matters: Most agent frameworks have no concept of persistent identity. Memento-Skills treats agent personality and user relationships as learnable, evolvable structures. The dream/daemon consolidation loop (similar to how reflection updates skills) runs between sessions, ensuring your agent doesn't reset to a blank slate.

Example 4: Skill Verification Pipeline

# Verify a skill: download, audit, and execute validation
memento verify

# This triggers the complete validation pipeline:
# 1. Static review of skill structure and dependencies
# 2. Sandbox execution with isolated uv environment
# 3. Output validation against expected schema
# 4. Security policy checks (path validation, argument sanitization)

The core/skill/downloader/ pipeline (new in v0.3.0) and tools/atomics/ execution layer ensure that skills from the cloud marketplace are validated before integration. The shared/security/ primitives enforce path and argument security across all tool invocations.

Example 5: Runtime Dependency Auto-Resolution

# utils/runtime_requirements/ — Auto-installer for missing dependencies
# Resolves packages on first use without manual pip install

from utils.runtime_requirements.checker import RequirementChecker

checker = RequirementChecker()
checker.ensure("pandas")  # Auto-installs if missing in sandbox environment
checker.ensure("openpyxl")  # Required for xlsx skill

This eliminates the "dependency hell" common in agent deployments where each skill demands different packages. The runtime checks and auto-installs, keeping the base environment minimal.

Advanced Usage & Best Practices

Optimize Your Reflection Loop

The reflection phase is computationally expensive. For production deployments, configure context compaction (infra/compact/) to prevent unbounded growth. The compaction pipeline includes multiple strategies for long-conversation summarization without losing critical learning signals.

Leverage MCP Integration

The tools/mcp/ module wraps external MCP (Model Context Protocol) servers as first-class tools. This lets you extend capabilities without writing custom skill code—just configure the MCP endpoint and let the registry handle discovery.

Profile-Based LLM Routing

Use multiple LLM profiles for different task types: a fast, cheap model for simple skill routing; a powerful model for reflection and skill generation; specialized endpoints for specific domains. The profile system makes this trivial to configure.

Monitor Skill Utility Scores

Skills carry utility scores updated through reflection. Periodically audit low-scoring skills with memento verify—they're candidates for regeneration or removal. A bloated skill library degrades retrieval accuracy.

Enable Dream Daemon for Long-Running Deployments

The daemon/dream/consolidator.py runs between sessions to merge short-term experiences into long-term memory and skill candidates. Essential for agents with multi-day or multi-week deployment lifecycles.

Comparison with Alternatives

Dimension	Memento-Skills	OpenClaw	AutoGPT	LangChain Agents
Core Focus	Self-evolution through reflection	Real-world assistant deployment	Autonomous goal pursuit	General orchestration
Skill Learning	Native read-write reflective loop	External plugin-based growth	Manual tool addition	Static tool definitions
Skill Retrieval	BM25 + vector hybrid, optimized for large libraries	Context-dependent hit-rate	Flat tool list	Depends on implementation
Failure Handling	Skill-level attribution and repair	Retry + human intervention	Loop detection, limited repair	Exception-based
Configuration	Three-layer with auto-migration	Standard config files	Environment variables	Code-based
IM Integration	Native 4-platform gateway	Platform-specific bridges	Limited	Requires custom build
Open-Source LLM Focus	Kimi, MiniMax, GLM optimized	General compatibility	OpenAI-centric	Provider-agnostic
Benchmark Validation	HLE + GAIA learning curves	Usability emphasis	Limited formal eval	Framework-dependent
Execution Isolation	uv sandbox	Varies	Docker^{↗ Bright Coding Blog}-based	None native
GUI/CLI	Both native	CLI primary	Web UI	Requires external

The verdict: Choose Memento-Skills when you need agents that improve autonomously over deployment time. Choose alternatives when you need maximum ecosystem breadth or simpler, static capability stacks.

FAQ: What Developers Actually Ask

Q: Does Memento-Skills require fine-tuning my LLM? A: Absolutely not. The entire learning mechanism operates through external skill memory—no parameter updates, no GPU clusters, no training data curation. Your model weights stay frozen.

Q: Which LLM providers work best? A: The profile system is especially optimized for Kimi/Moonshot, MiniMax, GLM/Zhipu, and other OpenAI-compatible endpoints. Anthropic Claude, OpenAI GPT, Ollama, and self-hosted vLLM/SGLang also work via standard configuration.

Q: How does skill evolution handle security? A: All skill execution runs in a uv sandbox with path validation (shared/security/), argument sanitization, and execution policies (tool_gate, path_validator, pre_execute). Generated skills undergo the same validation as human-written ones.

Q: Can I migrate from v0.2.0 to v0.3.0 easily? A: Yes, but update your imports. builtin.tools.* → tools.*; core/shared/* (compact, memory, context) → infra/*; core/manager/ → shared/chat/. The configuration auto-migrates.

Q: What's the difference between reflection and the dream daemon? A: Reflection happens immediately after task execution, updating skill utility and repairing failures. Dream daemon runs asynchronously between sessions, consolidating experiences into long-term memory and proposing new skill candidates.

Q: Is the GUI required, or can I run headless? A: Fully headless-capable. memento agent provides interactive CLI; memento agent -m "..." enables single-message scripting; IM gateways run as background services.

Q: How large can the skill library grow before retrieval degrades? A: The BM25 + vector hybrid search is designed for scalability. Semantic clustering (visible in benchmark results) naturally organizes skills into meaningful groups. Periodic verification and low-utility skill pruning maintain performance.

Conclusion: Deploy an Agent That Actually Learns

Memento-Skills represents a genuine paradigm shift in how we build autonomous systems. Instead of accepting agents that peak on deployment day and decay thereafter, this framework embraces deployment-time learning as a first-class concern. The read-write reflective loop, persistent agent profiles, and background dream consolidation create something rare: an agent that compounds capability through experience.

The v0.3.0 architecture—with its clean infrastructure separation, unified tool registry, and expanded shared layer—demonstrates this isn't experimental code. It's engineered for production: CLI, GUI, four-platform IM integration, local sandbox execution, and rigorous benchmark validation.

My take? If you're building agents for long-running deployment, personal assistance, or any domain where tasks evolve and failures teach, Memento-Skills deserves your immediate attention. The alternative is manually patching prompts forever while your agent forgets everything it ever learned.

Stop building amnesiac agents. Start building agents that evolve.

👉 Get started now: github.com/Memento-Teams/Memento-Skills

Join the Discord community, explore the project site, and watch your agent design itself.

Citation: Zhou, H., Guo, S., Liu, A., et al. (2026). Memento-Skills: Let Agents Design Agents. arXiv:2603.18743.