Stop Scraping SEC Data Manually! This MCP Server Changes Everything

Bright Coding

Author

What if your AI assistant could pull exact financial figures from any public company's SEC filings—instantly, accurately, and without you writing a single line of scraping code? No more wrestling with XBRL parsers. No more brittle HTML scrapers breaking when EDGAR updates its interface. No more copy-pasting balance sheet numbers into spreadsheets at 2 AM before earnings calls.

Here's the painful reality: financial analysts, quant developers, and AI engineers waste hours every week just accessing structured data that is technically public. The SEC's EDGAR system contains the most valuable financial dataset on earth—over 20 million filings from every U.S. public company—but it's locked behind archaic interfaces, cryptic CIK numbers, and XML schemas that seem designed to repel humans.

What if I told you there's a bridge that connects your AI assistant directly to this goldmine? Not through hand-rolled API clients and brittle scrapers, but through a protocol designed specifically for AI tool use. Enter sec-edgar-mcp, a Model Context Protocol server that transforms how AI systems interact with financial regulatory data. Built by Stefano Amorelli and powered by the edgartools library, this open-source project is already turning heads in the fintech and AI engineering communities. And yes, it's verified on MseeP, carries a DOI for academic citation, and runs automated evaluation suites to check reliability.

Ready to stop fighting EDGAR and start leveraging it? Let's dive deep.

What is sec-edgar-mcp?

sec-edgar-mcp is a specialized Model Context Protocol (MCP) server that exposes SEC EDGAR filing data as structured tools for AI assistants. Created by Stefano Amorelli, this Python-based server translates between the MCP standard—pioneered by Anthropic to standardize how AI models discover and invoke external capabilities—and the complex reality of SEC regulatory filings.

The Model Context Protocol represents a seismic shift in AI architecture. Instead of hardcoding API integrations into every application, MCP creates a universal plug-and-play ecosystem where AI assistants dynamically discover available tools. Think of it like USB-C for AI capabilities: one standard, infinite peripherals. The sec-edgar-mcp server plugs into this ecosystem, offering AI models immediate access to company fundamentals, financial statements, insider trading records, and full filing text extraction.

Why is this trending now? Three converging forces: the explosive adoption of Claude Desktop and other MCP-compatible AI assistants, the growing frustration with traditional financial data APIs (a Bloomberg Terminal runs roughly $24,000 per year; free alternatives break constantly), and the AI industry's hunger for grounded, verifiable data sources. The repository's badges tell the story: PyPI distribution, Conda availability, a modern Python 3.11+ codebase, AGPL-3.0 open-source licensing, and that coveted MseeP verification mark signaling an independent security review.

Amorelli built this on edgartools, the most mature Python library for EDGAR interaction. This isn't a fragile wrapper around web scraping—it's a robust, XBRL-parsing, section-extracting powerhouse that handles the SEC's complex filing structures with precision. The project even includes academic citation support through Zenodo DOI registration, reflecting its serious research utility.

Key Features That Make sec-edgar-mcp Essential

Let's dissect what makes this server indispensable for anyone serious about financial AI:

Exact Numeric Precision with XBRL Parsing

Financial data demands accuracy. The server doesn't just grab text; it parses XBRL (eXtensible Business Reporting Language) tags to extract structured financial statements. Your AI gets numbers, not approximate text extractions. Balance sheets, income statements, and cash flow statements arrive as precise, computable data structures.
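To make "computable data structures" concrete, here is a minimal sketch of working with XBRL-tagged values once they are parsed. It assumes data shaped like the SEC's public company-facts JSON (taxonomy, then concept, then units, then a list of reported facts). The sample numbers are made up, and `annual_values` is a hypothetical helper, not part of sec-edgar-mcp or edgartools.

```python
from decimal import Decimal

# Made-up sample shaped like SEC company-facts JSON:
# facts -> taxonomy -> concept -> units -> list of reported values.
SAMPLE_FACTS = {
    "facts": {
        "us-gaap": {
            "Revenues": {
                "units": {
                    "USD": [
                        {"end": "2022-12-31", "val": 1000000, "fy": 2022, "fp": "FY", "form": "10-K"},
                        {"end": "2023-09-30", "val": 300000, "fy": 2023, "fp": "Q3", "form": "10-Q"},
                        {"end": "2023-12-31", "val": 1250000, "fy": 2023, "fp": "FY", "form": "10-K"},
                    ]
                }
            }
        }
    }
}


def annual_values(facts, concept, unit="USD"):
    """Return {fiscal_year: Decimal value} for full-year facts reported on a 10-K."""
    entries = facts["facts"]["us-gaap"][concept]["units"][unit]
    return {
        entry["fy"]: Decimal(str(entry["val"]))
        for entry in entries
        if entry["form"] == "10-K" and entry["fp"] == "FY"
    }


revenue = annual_values(SAMPLE_FACTS, "Revenues")
# Decimal arithmetic keeps the ratio exact instead of a binary float approximation.
growth = (revenue[2023] - revenue[2022]) / revenue[2022]
print(revenue, growth)
```

Real filings repeat prior-period values as comparatives, so production code would also need to deduplicate by period end date; this sketch skips that.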

Comprehensive Tool Coverage Across Four Critical Categories

The server exposes four distinct tool categories, each solving real research workflows:

  • Company Intelligence: CIK lookup resolves company names to SEC identifiers; company info retrieves metadata; company facts access the entire XBRL-tagged historical dataset
  • Filing Retrieval: Full 10-K annual reports, 10-Q quarterly filings, 8-K current reports—with intelligent section extraction so your AI reads only relevant portions
  • Financial Statement Analysis: Direct access to parsed balance sheets, income statements, and cash flow statements without manual XBRL navigation
  • Insider Trading Surveillance: Form 3 (initial ownership), Form 4 (changes), and Form 5 (annual changes) transaction data for detecting executive trading patterns
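CIK lookup is the step everything else hangs on, so a small sketch may help. EDGAR's JSON endpoints on data.sec.gov expect the CIK zero-padded to ten digits. The helper names below are hypothetical and only illustrate that formatting rule; the server's own lookup tool handles this for you.

```python
from typing import Union


def format_cik(cik: Union[int, str]) -> str:
    """Zero-pad a CIK to the 10-digit form used in EDGAR data URLs."""
    digits = str(cik).strip().lstrip("0") or "0"
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def submissions_url(cik: Union[int, str]) -> str:
    """Build the URL of a company's filing-history JSON on data.sec.gov."""
    return f"https://data.sec.gov/submissions/CIK{format_cik(cik)}.json"


print(format_cik(320193))            # Apple's CIK, padded
print(submissions_url("0000320193"))  # already-padded input is accepted too
```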

Multiple Transport Protocols for Any Architecture

Whether you're running Claude Desktop locally or orchestrating cloud AI pipelines, sec-edgar-mcp adapts. The default stdio transport works seamlessly with MCP-native clients. Need HTTP for platform integration? The streamable HTTP transport connects to tools like Dify without protocol translation layers.

Verified Security and Continuous Evaluation

That MseeP badge isn't decoration; it's independent security verification. The automated evaluation suite using Promptfoo runs continuously via GitHub Actions, testing tool invocation accuracy against real EDGAR data. This isn't "hope it works"; it's "prove it works, repeatedly."

Zero Authentication Complexity

The server uses SEC EDGAR's public access model. No API keys to expire, no credit cards to maintain. Just provide a user agent string that identifies you (an SEC requirement) and access the full public dataset. The SEC's fair-access policy still caps automated traffic at 10 requests per second, so throttle bulk jobs accordingly.
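As a rough illustration of the user agent requirement, the sketch below builds the header in the same "Name (email)" shape used in this article's configuration examples. `edgar_headers` is a hypothetical helper, and the email check is deliberately loose; it only catches obvious mistakes before a request goes out.

```python
import re

# Loose sanity check: something@something.tld with no spaces.
_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def edgar_headers(name: str, email: str) -> dict:
    """Return HTTP headers carrying a self-identifying User-Agent for EDGAR."""
    name = name.strip()
    email = email.strip()
    if not name:
        raise ValueError("name must not be empty")
    if not _EMAIL_PATTERN.fullmatch(email):
        raise ValueError(f"does not look like an email address: {email!r}")
    return {"User-Agent": f"{name} ({email})"}


print(edgar_headers("Jane Analyst", "jane@example.com"))
```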

Real-World Use Cases Where sec-edgar-mcp Dominates

Scenario 1: Automated Earnings Analysis at Scale

A hedge fund analyst needs to compare Q3 revenue guidance across 50 semiconductor companies. Traditional approach: manually download 50 10-Q filings, locate MD&A sections, extract guidance language. With sec-edgar-mcp, their AI assistant iterates through CIK lookups, retrieves latest 10-Qs, extracts guidance sections via structured tools, and compiles comparative analysis in minutes, not days.

Scenario 2: Insider Trading Pattern Detection

A compliance officer monitors executive trading for suspicious patterns. Form 4 filings arrive irregularly; manual monitoring is impossible. The MCP server enables their AI to poll recent Form 4 submissions, aggregate transaction types (purchases, sales, option exercises), correlate with stock price movements, and flag anomalous patterns for human review.
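The aggregation step in this scenario can be sketched in a few lines. The transaction codes are the standard Form 4 ones (P for open-market purchase, S for open-market sale, M for exercise of a derivative security). The records and their field names are invented for illustration and do not reflect the server's actual output format.

```python
from collections import defaultdict

# Subset of the standard Form 4 transaction codes.
CODE_LABELS = {
    "P": "open-market purchase",
    "S": "open-market sale",
    "M": "option exercise",
}

# Illustrative records only; field names are hypothetical.
TRANSACTIONS = [
    {"insider": "CEO", "code": "S", "shares": 10_000, "price": 50.0},
    {"insider": "CFO", "code": "P", "shares": 2_000, "price": 48.0},
    {"insider": "CEO", "code": "S", "shares": 5_000, "price": 52.0},
    {"insider": "CTO", "code": "M", "shares": 1_000, "price": 20.0},
]


def summarize(transactions):
    """Group transactions by type and total their count, shares, and dollar value."""
    totals = defaultdict(lambda: {"count": 0, "shares": 0, "value": 0.0})
    for txn in transactions:
        bucket = totals[CODE_LABELS.get(txn["code"], "other")]
        bucket["count"] += 1
        bucket["shares"] += txn["shares"]
        bucket["value"] += txn["shares"] * txn["price"]
    return dict(totals)


summary = summarize(TRANSACTIONS)
for label, stats in summary.items():
    print(label, stats)
```

A real monitor would layer thresholds or price correlation on top of these totals before flagging anything for review.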

Scenario 3: Academic Financial Research Reproducibility

A PhD candidate studies the relationship between R&D capitalization and future stock returns. They need consistent financial statement data across 20 years for 2,000 companies. sec-edgar-mcp provides structured, citable data extraction (complete with DOI for methodology citation) that other researchers can replicate exactly, with no black-box data vendor required.

Scenario 4: AI-Powered Investment Memo Generation

A venture capitalist evaluates public market comparables for a private company. Their AI assistant uses sec-edgar-mcp to pull relevant 10-K financials, calculate valuation multiples, summarize business risk factors from Item 1A, and draft a comprehensive comparable analysis, all within their existing Claude Desktop workflow.

Step-by-Step Installation & Setup Guide

Getting sec-edgar-mcp running takes under five minutes. Choose your deployment path:

Docker (Recommended for Most Users)

The fastest path to production-ready operation. Ensure Docker is installed, then configure your MCP client:

{
  "mcpServers": {
    "sec-edgar-mcp": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "SEC_EDGAR_USER_AGENT=Your Name (your@email.com)",
        "stefanoamorelli/sec-edgar-mcp:latest"
      ]
    }
  }
}

Critical detail: The -i flag keeps the container's stdin open, which MCP needs for its JSON-RPC communication over stdin/stdout. Without it, the server sees a closed input stream and the session cannot proceed. The --rm flag removes the container after each session.

Replace Your Name (your@email.com) with your actual identity—the SEC requires this user agent for all EDGAR access.
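If you maintain this configuration for several machines or teammates, generating it avoids hand-editing JSON. The sketch below simply reproduces the structure shown above; `mcp_docker_config` is a hypothetical helper, and where the output file lives depends on your MCP client.

```python
import json


def mcp_docker_config(user_agent: str,
                      image: str = "stefanoamorelli/sec-edgar-mcp:latest") -> dict:
    """Build the MCP client entry that launches the server via `docker run`."""
    return {
        "mcpServers": {
            "sec-edgar-mcp": {
                "command": "docker",
                "args": [
                    "run", "-i", "--rm",  # -i keeps stdin open; --rm cleans up
                    "-e", f"SEC_EDGAR_USER_AGENT={user_agent}",
                    image,
                ],
            }
        }
    }


config = mcp_docker_config("Jane Analyst (jane@example.com)")
print(json.dumps(config, indent=2))
```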

Python Package Installation

For custom environments or development:

# Using pip
pip install sec-edgar-mcp

# Using conda
conda install -c stefanoamorelli sec-edgar-mcp

# Using uv (fastest)
uv pip install sec-edgar-mcp

After installation, run directly:

python -m sec_edgar_mcp.server

HTTP Transport for Platform Integration

Modern AI orchestration platforms like Dify, LangChain, or custom microservices need HTTP endpoints. Launch the server with streamable HTTP transport:

python -m sec_edgar_mcp.server --transport streamable-http --port 9870

Security warning: No authentication is built-in. Deploy only on private networks or behind your own auth layer. This design prioritizes simplicity for trusted environments.
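For a sense of what a client sends to this endpoint, here is a sketch that builds (but does not send) an MCP `initialize` request using only the standard library. The `/mcp` path and the loopback address are assumptions about a typical local deployment, not something this article's source confirms; the `Accept` header listing both JSON and event-stream follows the Streamable HTTP transport rules.

```python
import json
import urllib.request


def build_initialize_request(url: str) -> urllib.request.Request:
    """Construct the first JSON-RPC request of an MCP session over HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The client must accept either a JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Assumed endpoint: loopback only, path "/mcp" (check your server's actual path).
request = build_initialize_request("http://127.0.0.1:9870/mcp")
print(request.get_method(), request.full_url)
# With the server running: urllib.request.urlopen(request)
```

Pointing the client at 127.0.0.1 rather than a public address is in keeping with the warning above: nothing here authenticates the caller.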

Verification and Evaluation

Confirm your installation with the automated test suite:

git clone https://github.com/stefanoamorelli/sec-edgar-mcp.git
cd sec-edgar-mcp/evals
npm install
npm run eval

These Promptfoo-based evaluations verify tool accuracy against live EDGAR data, ensuring your deployment functions correctly.

REAL Code Examples from the Repository

Let's examine actual implementation patterns from the sec-edgar-mcp codebase and documentation.

Example 1: MCP Client Configuration (Docker Deployment)

The repository's quickstart demonstrates the canonical Docker-based configuration:

{
  "mcpServers": {
    "sec-edgar-mcp": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "SEC_EDGAR_USER_AGENT=Your Name (your@email.com)",
        "stefanoamorelli/sec-edgar-mcp:latest"
      ]
    }
  }
}

What's happening here? This JSON configures an MCP client (like Claude Desktop) to spawn the SEC EDGAR server as a subprocess. The command specifies Docker as the runtime. The args array constructs the full docker run invocation. The -i flag is non-negotiable—it keeps stdin open for the JSON-RPC message protocol that MCP uses for tool discovery and invocation. The environment variable SEC_EDGAR_USER_AGENT satisfies SEC requirements; without a valid user agent, EDGAR blocks requests. The --rm flag ensures Docker removes the container after the MCP session ends, preventing resource leaks. This pattern exemplifies MCP's "server as subprocess" architecture—clean isolation, no persistent services, automatic cleanup.
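The JSON-RPC exchange over stdin/stdout is simple enough to sketch. On the stdio transport each message is one line of JSON terminated by a newline. `tools/list` and `tools/call` are standard MCP methods; the tool name and arguments below are hypothetical placeholders, so check the server's real tool list before relying on them.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without indent never emits raw newlines, so one message = one line.
    line = json.dumps(message, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def decode_message(line: bytes) -> dict:
    """Parse one line read from the server's stdout."""
    return json.loads(line.decode("utf-8"))


# Tool discovery: ask the server which tools it offers.
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Tool invocation: the name and arguments here are hypothetical examples.
call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_cik_by_ticker", "arguments": {"ticker": "AAPL"}},
}

for message in (list_tools, call_tool):
    print(encode_message(message))
```

An MCP client such as Claude Desktop does this framing for you; the sketch only shows why an open stdin matters.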

Example 2: HTTP Transport for Cloud Platforms

For integration with modern AI platforms, the streamable HTTP transport eliminates subprocess complexity:

python -m sec_edgar_mcp.server --transport streamable-http --port 9870

Deep dive: This launches the same MCP server but exposes it over HTTP instead of stdio. The --transport parameter switches protocol handlers; --port sets the TCP port the server listens on. Streamable HTTP, introduced in the 2025-03-26 revision of the MCP specification as the successor to the older HTTP+SSE transport, lets the server answer with either a plain JSON response or a server-sent event stream, which helps with long-running EDGAR queries that might take seconds. Why no authentication? The server leaves access control to the deployment. In production, deploy behind nginx with OAuth2, or within a Kubernetes service mesh with mTLS. The repository's documentation emphasizes private network deployment. Heed this warning: anyone who can reach the port can invoke the tools, and every resulting EDGAR request goes out under your user agent and IP address.

Example 3: Running Automated Evaluations

The evaluation suite ensures your deployment works correctly:

cd evals && npm install && npm run eval

Technical explanation: This three-command sequence validates tool implementations against real SEC data. cd evals enters the evaluation directory containing Promptfoo configuration. npm install fetches the Promptfoo testing framework and test definitions. npm run eval executes the full suite—likely testing CIK resolution accuracy, filing retrieval correctness, financial statement parsing precision, and insider trading data completeness. The GitHub Actions badge (https://github.com/stefanoamorelli/sec-edgar-mcp/actions/workflows/evals.yml/badge.svg) shows these run automatically on every commit. For researchers, this evaluability is crucial: you can verify the exact behavior of the tools your AI invokes, ensuring reproducible financial analysis.

Example 4: Academic Citation

For research use, the repository provides precise citation metadata:

@software{amorelli_sec_edgar_mcp_2025,
  title = {{SEC EDGAR MCP (Model Context Protocol) Server}},
  author = {Amorelli, Stefano},
  version = {1.0.6},
  year = {2025},
  month = {9},
  url = {https://doi.org/10.5281/zenodo.17123166},
  doi = {10.5281/zenodo.17123166}
}

Why this matters: The Zenodo DOI (10.5281/zenodo.17123166) provides permanent, versioned citation. Unlike GitHub URLs that shift with repository transfers, DOIs persist. The version = {1.0.6} field enables precise reproducibility—cite exactly the version used in your analysis. The month = {9} indicates September 2025 release, helping readers contextualize against EDGAR system changes. For financial research where regulatory data access methods evolve, this citation precision prevents "works on my machine" replication failures.

Advanced Usage & Best Practices

Optimize CIK Lookup Caching: Company identifiers rarely change. Cache CIK lookups at your application layer to avoid repeated resolution calls. Redundant lookups add latency to multi-company analyses and count against the SEC's limit of 10 requests per second.

Leverage Section Extraction for Token Efficiency: Full 10-K filings often exceed 100,000 tokens. Use the server's section extraction tools (Item 1A Risk Factors, Item 7 MD&A) to retrieve only relevant portions. This slashes API costs and improves AI focus.

Batch Financial Statement Requests: When analyzing historical trends, request multiple periods' financials in parallel. The XBRL parsing is CPU-intensive but stateless—perfect for concurrent processing.
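One way to run requests concurrently while staying polite to EDGAR is a thread pool with a shared throttle. This is a sketch under stated assumptions: `fetch_statement` is a hypothetical placeholder for the real call, and the default rate of 10 per second mirrors the SEC's published fair-access limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Space out calls so at most `rate` of them start per second across threads."""

    def __init__(self, rate: float):
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def fetch_statement(period: str) -> dict:
    """Hypothetical stand-in for a real financial-statement request."""
    return {"period": period, "status": "ok"}


def fetch_all(periods, rate: float = 10.0, workers: int = 4) -> list:
    """Fetch every period concurrently, in input order, under a shared throttle."""
    throttle = Throttle(rate)

    def task(period):
        throttle.wait()
        return fetch_statement(period)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, periods))  # map preserves input order


results = fetch_all(["2021", "2022", "2023"])
print(results)
```

Threads suit the network-bound part of the work; if the parsing itself dominates, a process pool would be the natural variation.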

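Monitor EDGAR Filing Delays: Filings do not appear the moment an event happens. Insiders have up to two business days after a transaction to file Form 4, and EDGAR's own processing time varies with form type and load, so 10-K amendments in particular can lag. Build appropriate polling intervals into your AI workflows, and treat a missing filing as "not yet known" rather than "no activity."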

Container Resource Limits: XBRL parsing large filings can spike memory. When deploying via Docker, set memory limits (--memory=2g) to prevent container OOM kills during complex financial statement extractions.

Version Pinning for Reproducibility: The latest tag is convenient but dangerous for research. Pin to specific versions (stefanoamorelli/sec-edgar-mcp:1.0.6) and update deliberately, verifying evals pass before migration.

Comparison with Alternatives

| Capability | sec-edgar-mcp | Manual EDGAR Scraping | Commercial APIs (Bloomberg, Refinitiv) | SEC.gov Direct |
| --- | --- | --- | --- | --- |
| Cost | Free (AGPL-3.0) | Free (labor-intensive) | $15,000-$50,000/year | Free |
| AI Integration | Native MCP protocol | Requires custom middleware | Proprietary SDKs | None |
| XBRL Parsing | Built-in, exact precision | Roll your own or use fragile libraries | Often limited or extra cost | Raw XBRL only |
| Setup Time | 5 minutes | Days to weeks | Weeks (contracts, onboarding) | Immediate but manual |
| Data Coverage | Full EDGAR public dataset | Full EDGAR public dataset | Broader but selective | Full EDGAR public dataset |
| Rate Limits | SEC standard (generous) | SEC standard | Varies by contract | SEC standard |
| Reproducibility | DOI-cited, versioned | Ad-hoc | Black-box | Manual process |
| Maintenance Burden | Community-maintained | Entirely yours | Vendor-managed | SEC-managed interface |
| Insider Trading Data | Structured Form 3/4/5 tools | Manual Form 4 parsing | Often premium tier | Raw HTML/txt |

The verdict: sec-edgar-mcp occupies a unique position—free as in freedom, structured for AI, and precise for finance. Commercial APIs offer broader datasets but at crushing cost with opaque methodologies. Manual scraping gives control but consumes engineering resources better spent on analysis. The MCP server delivers the precision of commercial tools with the openness of public data, wrapped in a protocol designed for the AI era.

FAQ: Common Developer Concerns

Is sec-edgar-mcp legally compliant for commercial use? Yes, with conditions. The AGPL-3.0 license requires that derivative works you distribute, or make available to users over a network, be released under the same license with their source code. For proprietary commercial applications, contact Stefano Amorelli at stefano@amorelli.tech for alternative licensing. The filings themselves are public records that the SEC makes freely available.

What Python version do I need? Python 3.11 or newer. This requirement enables modern asyncio patterns and type hinting that the server relies on for MCP protocol compliance.

Can I use this with Claude Desktop? Absolutely—this is the primary use case. Add the Docker configuration to your Claude Desktop config, restart, and Claude discovers the tools automatically. The demo video in the repository shows this exact workflow.

How does this differ from just using edgartools directly? edgartools is a Python library for human programmers. sec-edgar-mcp exposes edgartools' capabilities through the Model Context Protocol, enabling AI assistants to invoke these functions autonomously. You get the same data precision with conversational AI interfaces.

What if the SEC changes EDGAR's structure? The underlying edgartools library maintains active compatibility. The MCP server's evaluation suite catches breaking changes. As an open-source project with community contributions, updates propagate faster than commercial alternatives with slower release cycles.

Is my user agent data private? The SEC logs access patterns but doesn't publish individual user agent data. The user agent requirement exists for abuse prevention. Use a professional email—avoid personal addresses if concerned about exposure.

Can I run multiple MCP servers alongside sec-edgar-mcp? Yes—MCP is designed for multi-server composition. Your AI assistant can simultaneously access sec-edgar-mcp for financial data, a database MCP server for internal records, and a web search MCP server for news context.

Conclusion: The Future of Financial AI is Open and Connected

The sec-edgar-mcp server represents something bigger than convenient filing access—it embodies how AI infrastructure should evolve: open protocols, public data, verifiable precision. In an era where AI capabilities are increasingly gatekept by proprietary APIs and opaque data vendors, Stefano Amorelli's creation proves that the most valuable datasets can remain accessible while becoming powerfully usable.

For financial analysts, this means AI assistants that actually understand your data sources. For engineers, it means replacing brittle integration code with protocol-standard tool definitions. For researchers, it means reproducible, citable financial analysis workflows that don't depend on vendor whims.

The installation takes minutes. The evaluation suite proves reliability. The AGPL license ensures community ownership. And the MCP protocol means this isn't a siloed tool—it's a node in an emerging ecosystem of AI-capable services.

Stop scraping. Stop paying crushing data fees. Stop accepting approximate financial data.

Get started now: Clone the repository at github.com/stefanoamorelli/sec-edgar-mcp, run the Docker quickstart, and connect your AI assistant to the world's most important financial dataset. Your future self—reviewing AI-generated 10-K analyses at 9 AM instead of manually extracting data at midnight—will thank you.

The code is waiting. The data is public. The protocol is ready. What's your next move?
