Stop Building AI Agents From Scratch! Use Deep Agents Instead

What if I told you that every AI agent project you've built from scratch was a waste of time? Not the learning—that was invaluable. But the boilerplate. The endless plumbing. The context management nightmares that hit at 2 AM when your agent forgets everything from three messages ago. The sub-agent orchestration that turns into spaghetti code. The filesystem operations that somehow break between sandboxed and local environments.

Sound familiar? You're not alone. The AI agent space has exploded, yet most developers are still cobbling together the same infrastructure pieces—tool calling, memory persistence, human approval loops, context summarization—like it's 2023 all over again. Meanwhile, a quiet revolution is happening at github.com/langchain-ai/deepagents. LangChain's team didn't just build another agent framework. They built something dangerously close to Claude Code's secret sauce, then made it open-source, model-agnostic, and production-ready out of the box.

This is Deep Agents. And if you're still hand-rolling agent harnesses, you're working too hard.

What Is Deep Agents?

Deep Agents is an open-source agent harness—an opinionated, batteries-included system for building long-horizon, multi-step AI agents that actually work in production. Created by the LangChain team and inspired by Claude Code's architecture, it represents a deliberate evolution in how we think about agent infrastructure.

The repository's tagline says everything: "The batteries-included agent harness." But what does that actually mean? Unlike LangGraph (the graph runtime) or LangChain's create_agent (a minimal harness), Deep Agents sits at the highest abstraction layer of the LangChain ecosystem—providing pre-built, production-hardened implementations of the patterns every serious agent needs.

Here's the critical insight: Deep Agents isn't trying to replace LangGraph or LangChain. It composes with them. You can drop down to LangGraph for custom graph shapes, use LangChain's create_agent for lighter use cases, or deploy Deep Agents when you need the full harness—filesystem operations, sub-agent delegation, context management, and persistent memory—all working together from day one.

The project is trending now because the agent development community has hit an inflection point. We've moved past "can we build agents?" to "can we build agents that don't fall apart in production?" Deep Agents answers that question with a resounding yes, backed by LangGraph's streaming, persistence, and checkpointing infrastructure, plus first-class integration with LangSmith for tracing and evaluation.

Key Features That Eliminate Agent Development Pain

Deep Agents bundles capabilities that typically consume weeks of development time into a single, extensible package. Let's dissect what makes this harness genuinely powerful:

Sub-agents with Isolated Context Windows Complex tasks demand delegation. Deep Agents lets you spawn sub-agents with completely isolated contexts, preventing context pollution while enabling sophisticated multi-agent workflows. Each sub-agent operates independently—yet reports back through a unified orchestration layer.

Pluggable Filesystem Operations Read, write, edit, and search across local, sandboxed, or remote backends. The filesystem abstraction means your agent can manipulate codebases, generate reports, or persist intermediate results without hardcoding environment assumptions. Swap from local development to cloud sandboxing with configuration changes, not code rewrites.

Intelligent Context Management Long-horizon tasks break most agents. Deep Agents implements automatic summarization of lengthy conversation threads and can offload tool outputs to disk when context windows approach capacity. This isn't naive truncation—it's strategic memory management that preserves task coherence.

Sandboxed Shell Access Execute commands in your sandbox of choice. Whether you need Docker isolation, cloud execution environments, or controlled local access, the shell integration respects your security boundaries while giving agents genuine computational power.

Persistent Memory with Pluggable Backends Cross-session recall isn't bolted on—it's architected in. State and store backends are fully pluggable, enabling everything from Redis-backed ephemeral storage to durable database persistence for long-running agent identities.

Human-in-the-Loop Controls Production agents need guardrails. Deep Agents supports approval, editing, and rejection of tool calls before execution. This isn't all-or-nothing—configure granular intervention points based on tool risk, operation type, or custom heuristics.

Dynamic Skill Loading Reusable behaviors load on demand. Instead of bloating every agent invocation with every possible capability, skills activate contextually—keeping latency low and relevance high.

Universal Tool Integration Bring your own functions. Connect any MCP server. Deep Agents doesn't lock you into predefined tool ecosystems—it provides the harness, you provide the capabilities.

Model Agnosticism Frontier APIs (OpenAI, Anthropic, Google), open-weight models via Baseten or Fireworks, or fully local deployments through Ollama, vLLM, or llama.cpp—if it supports tool calling, it works.

Where Deep Agents Transforms Real-World Development

Autonomous Research and Report Generation

Imagine an agent that receives a topic, plans a research strategy, delegates web searches to sub-agents, synthesizes findings into structured documents, and persists everything to version-controlled files. Deep Agents handles the planning, delegation, filesystem operations, and context management automatically. You provide the search tools—it orchestrates the entire pipeline.

Codebase Analysis and Refactoring at Scale

Need to modernize a 100,000-line legacy codebase? Deep Agents can spawn isolated sub-agents for each module, execute refactoring scripts in sandboxed environments, maintain cross-module dependency awareness through persistent memory, and present changes for human approval before filesystem writes. The context management prevents the "lost in the codebase" failure mode that breaks simpler agents.

Multi-Step Data Processing Pipelines

ETL workflows with human validation gates become trivial. An agent ingests data, applies transformations via shell commands, validates outputs against schemas, routes anomalies for human review, and persists checkpointed state for resumability after interruptions. LangGraph's streaming and checkpointing infrastructure makes this reliable, not theoretical.

Continuous Learning Customer Support

Deploy an agent that maintains persistent memory of customer interactions across sessions, loads specialized skills for different product domains, delegates complex technical issues to sub-agents with access to internal documentation, and escalates to humans with full context preserved. The pluggable memory backends mean conversation history survives deployments and scaling events.

Step-by-Step Installation & Setup Guide

Getting started with Deep Agents takes under 60 seconds. The project uses modern Python packaging through uv, though standard pip works equally well.

Installation

# Using uv (recommended for speed and dependency resolution)
uv add deepagents

# Alternative with pip
pip install deepagents

Environment Configuration

Deep Agents requires API keys for your chosen LLM provider. Configure these as environment variables:

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Or any other LangChain-compatible provider
export GOOGLE_API_KEY="..."

For local models, ensure your Ollama, vLLM, or llama.cpp server is running and accessible:

# Ollama example
ollama pull llama3.1
ollama serve

Basic Agent Creation

from deepagents import create_deep_agent

# Minimal configuration—just a model and a task
agent = create_deep_agent(
    model="openai:gpt-4o",  # Provider:model format
)

result = agent.invoke({"messages": "Summarize the key features of LangGraph"})
print(result)

Production-Ready Configuration

from deepagents import create_deep_agent
from langchain_community.tools import DuckDuckGoSearchRun

# Custom tool integration
search_tool = DuckDuckGoSearchRun()

# Full harness with all capabilities enabled
agent = create_deep_agent(
    model="anthropic:claude-3-5-sonnet-20241022",
    tools=[search_tool],  # Your custom tools + defaults
    system_prompt="""You are a senior research analyst. 
    Plan thoroughly, delegate searches to sub-agents, 
    and write structured reports to the filesystem.""",
    # Sub-agents, filesystem, context management, 
    # memory, and human-in-the-loop auto-configured
)

# Long-horizon research task
result = agent.invoke({
    "messages": "Research emerging vector database technologies, "
                "compare their architectures, and write a report to /output/report.md"
})

LangSmith Integration for Production Monitoring

export LANGSMITH_API_KEY="ls-..."
export LANGSMITH_TRACING="true"
export LANGSMITH_PROJECT="deep-agents-production"

With these variables set, all agent executions automatically trace to LangSmith—enabling debugging, evaluation, and performance monitoring without code changes.

REAL Code Examples from Deep Agents

The README provides a deceptively simple quickstart that conceals enormous power. Let's unpack it with production context:

Example 1: The Minimal Harness

from deepagents import create_deep_agent

# This single call instantiates the FULL harness:
# - Planning and reasoning loop
# - Filesystem read/write capabilities
# - Context summarization for long conversations
# - Sub-agent delegation ready
# - Persistent memory configured
agent = create_deep_agent(
    model="openai:gpt-5.5",  # Any LangChain chat model identifier
    tools=[my_custom_tool],  # Your functions + default toolset
    system_prompt="You are a research assistant.",  # Behavioral steering
)

# The invoke call triggers the complete agent lifecycle:
# 1. Plan formation based on the user message
# 2. Tool selection and execution (with optional human approval)
# 3. Context management as conversation grows
# 4. Sub-agent delegation for parallelizable subtasks
# 5. Result synthesis and return
result = agent.invoke({"messages": "Research LangGraph and write a summary"})

What's happening under the hood? The create_deep_agent factory doesn't just wrap an LLM call. It constructs a LangGraph CompiledStateGraph with pre-configured nodes for planning, tool execution, context summarization, and delegation. The invoke method executes this graph with full checkpointing—if interrupted, resume from the last persisted state.

Example 2: Deep Agents Code — Terminal Power

The repository reveals a separate, pre-built coding agent that demonstrates the harness's flexibility:

# One-line installation of the coding agent variant
curl -LsSf https://langch.in/dcode | bash

This installs dcode, a terminal-based coding agent comparable to Claude Code or Cursor—but powered by any LLM you choose. The same harness that runs research agents runs code generation, with filesystem operations mapped to your actual codebase, shell access sandboxed appropriately, and context management preventing the "lost in the files" problem.

Example 3: Custom Tool Integration Pattern

While the README shows tools=[my_custom_tool], the real power emerges with MCP servers:

from deepagents import create_deep_agent
from langchain_mcp_adapters import load_mcp_tools

# Load tools from any Model Context Protocol server
mcp_tools = load_mcp_tools("http://localhost:3000/sse")

agent = create_deep_agent(
    model="openai:gpt-4o",
    tools=mcp_tools,  # External tool ecosystem integrated seamlessly
)

# Agent now has access to all MCP server capabilities
# with full harness benefits: approval gates, logging, resumability

Example 4: Sub-Agent Composition

The FAQ reveals a critical architectural strength—LangGraph graphs plug directly into Deep Agents:

from deepagents import create_deep_agent
from langgraph.graph import StateGraph, END

# Your custom LangGraph for specialized processing
custom_graph = StateGraph(dict)
# ... build your custom nodes and edges ...
compiled_custom = custom_graph.compile()

# Use as sub-agent within the harness
agent = create_deep_agent(
    model="anthropic:claude-3-5-sonnet",
    sub_agents={
        "specialized_processor": compiled_custom,
        # Deep Agent delegates to this graph when task matches
    }
)

# The harness orchestrates, your custom graph executes
result = agent.invoke({"messages": "Process this with specialized pipeline"})

This layered composition—LangGraph → LangChain create_agent → Deep Agents—isn't accidental. It's deliberate architectural progression. Start with the harness, drop down when you need control, compose back up when you don't.

Advanced Usage & Best Practices

Optimize Context Windows Strategically Don't let conversations grow unbounded. Configure summarization triggers based on token counts or message thresholds. Offload large tool outputs (code files, search results) to filesystem storage, referencing paths in conversation instead of embedding full content.

Implement Tiered Human Approval Not all tools need equal scrutiny. Configure approval requirements based on operation risk: filesystem writes and shell commands require human review; read-only operations execute autonomously. Use LangSmith traces to identify where your approval configuration creates bottlenecks.

Design Skill Boundaries Carefully Skills load dynamically—design them as focused, composable units rather than monolithic capabilities. A "web_research" skill, "data_analysis" skill, and "report_writing" skill compose more flexibly than a single "do_everything" skill.

Leverage Persistent Memory for Agent Identity Store user preferences, project conventions, and accumulated knowledge in pluggable memory backends. An agent that remembers "we use snake_case in this codebase" or "this user prefers concise summaries" delivers dramatically better experiences.

Monitor Sub-Agent Delegation Patterns LangSmith traces reveal when your agent over-delegates or under-delegates. Optimize delegation thresholds based on task complexity estimates. The goal is parallelization without fragmentation—sub-agents should handle genuinely independent subtasks.

Deep Agents vs. Alternatives: The Honest Comparison

Capability	Deep Agents	LangChain `create_agent`	Raw LangGraph	AutoGPT/BabyAGI
Setup Time	Minutes	Minutes	Hours-Days	Hours
Filesystem Operations	Built-in, pluggable	Manual implementation	Manual implementation	Basic, often broken
Sub-Agent Delegation	Native with isolation	Manual orchestration	Possible, complex	Limited
Context Management	Automatic summarization + offload	Manual	Manual	Basic truncation
Production Streaming	Native (LangGraph)	Native	Native	Unreliable
Human-in-the-Loop	Configurable, granular	Manual implementation	Possible, complex	Rarely implemented
Persistent Memory	Pluggable backends	Manual	Manual	File-based, fragile
Model Flexibility	Any tool-calling LLM	Any tool-calling LLM	Any tool-calling LLM	Often OpenAI-locked
Tracing/Evaluation	LangSmith native	LangSmith native	LangSmith native	Ad-hoc
Abstraction Level	High (full harness)	Medium (minimal agent)	Low (graph runtime)	Medium (loop pattern)

The verdict? Choose Deep Agents when you need production agent capabilities without rebuilding infrastructure. Choose LangChain's create_agent for lighter use cases where bundled middleware adds overhead. Choose LangGraph directly when your problem demands custom graph topologies that don't fit the agent loop pattern. Avoid legacy autonomous agent frameworks unless you specifically need their particular architectural patterns—they typically lack the production hardening that LangGraph provides.

FAQ: What Developers Actually Ask

Does Deep Agents lock me into LangChain's ecosystem? No. While it leverages LangGraph and integrates optionally with LangSmith, the core harness is open-source MIT-licensed. Your tools, your models, your infrastructure. The composition pattern even lets you extract custom LangGraph graphs and run them independently.

How does this compare to building on OpenAI's Assistants API? Assistants API provides managed threads and tools—but only with OpenAI models, only in their environment, with limited observability. Deep Agents gives you equivalent capabilities with any model, full deployment flexibility, and production-grade tracing through LangSmith.

Can I use Deep Agents with my existing LangGraph projects? Absolutely. The FAQ explicitly confirms that any CompiledStateGraph can function as a sub-agent. Your existing graphs become composable components within the larger harness.

What about security? The agent can execute shell commands! Deep Agents follows a "trust the LLM" model with sandbox-level enforcement. The agent can only do what its tools allow—and you control tool availability, sandbox boundaries, and human approval gates. Don't expect the model to self-police; enforce boundaries at the infrastructure level.

Is this suitable for multi-tenant SaaS applications? Yes, with architectural attention. Use isolated memory backends per tenant, configure separate sandbox environments, and leverage LangSmith's project-based organization for trace separation. The pluggable backend design supports multi-tenancy patterns.

How do I migrate from my custom agent implementation? Start by identifying which Deep Agents features replace your custom code—likely filesystem operations, context management, and sub-agent orchestration. Migrate incrementally: first swap your agent loop for create_deep_agent, then migrate tools, then activate advanced features like persistent memory.

What's the performance overhead versus minimal implementations? The harness adds orchestration logic, but this is typically dwarfed by LLM API latency. Context summarization actually improves performance for long conversations by reducing token counts. Profile with LangSmith to identify actual bottlenecks in your specific workload.

Conclusion: The Agent Infrastructure You Should Have Built Yesterday

Deep Agents represents something rare in the AI tooling space: genuine architectural maturity. It doesn't chase novelty—it solves the boring, hard problems that separate demo agents from production systems. Context management that actually works. Sub-agent delegation that doesn't create chaos. Filesystem operations that adapt to your environment. Memory that persists across sessions. Human oversight that doesn't block autonomy.

The LangChain team looked at Claude Code, identified what makes it genuinely powerful, then open-sourced that DNA and made it model-agnostic. The result is a harness that eliminates weeks of infrastructure development without sacrificing the flexibility to customize at any layer.

If you're building agents in 2025 and still hand-rolling these capabilities, you're investing engineering time in solved problems. Worse, you're likely solving them less robustly than a production-hardened implementation that's already integrated with the best tracing and evaluation tooling available.

My recommendation? Stop building agent harnesses. Start building agent capabilities. Let github.com/langchain-ai/deepagents handle the infrastructure while you focus on what matters: the tools, the prompts, the skills, and the user experiences that differentiate your application.

Install it today. Deploy it tomorrow. Thank yourself next month when your agent handles a 50-step task without losing context—and you have the LangSmith traces to prove it.