Stop Wasting Tokens on Dumb RAG! EdgeQuake's Graph Reasoning Exposed

Your vector database is lying to you. Yes, that shiny Pinecone or Weaviate instance you've been feeding chunks into? It's giving you semantic similarity theater: matching passages that sound alike without understanding a single relationship between them. Ask it "How does Tesla's battery supply chain impact European EV regulations through Chinese lithium mining?" and watch it crumble. Traditional RAG systems retrieve document chunks using vector similarity alone. That works for simple lookups but breaks down on multi-hop reasoning: the "How does X relate to Y through Z?" questions that actually matter in production.

The dirty secret? Vectors capture semantic similarity but annihilate structural relationships between concepts. Your chunks are floating in isolation, divorced from the very connections that give them meaning. Thematic questions? Relationship queries? Forget it. You're essentially running a glorified search engine with amnesia.

But what if your documents became a living knowledge graph — entities, relationships, communities — all queryable at Rust-powered speed? Enter EdgeQuake, the high-performance GraphRAG framework that transforms passive document storage into active intelligence networks. Inspired by the groundbreaking LightRAG algorithm and forged in Rust's zero-cost abstractions, EdgeQuake doesn't just retrieve information. It reasons across it. Ready to see what you've been missing?

What is EdgeQuake?

EdgeQuake is a high-performance Graph-RAG (Retrieval-Augmented Generation) framework written in Rust, created by Raphaël MANSUY — a Hong Kong-based developer building the future of intelligent document retrieval. Born from the academic insights of the LightRAG paper (Guo et al., 2024), EdgeQuake takes the theoretical promise of knowledge graph-enhanced retrieval and engineers it into a production-ready, blazing-fast system.

The core philosophy is radical in its simplicity: don't just chunk and embed documents — decompose them into structured knowledge. During ingestion, Large Language Models extract entities (people, organizations, technologies, concepts) and map their relationships. This graph structure is stored alongside vector embeddings in PostgreSQL with Apache AGE and pgvector extensions. At query time, EdgeQuake traverses both the vector space and the graph topology, combining the speed of similarity search with the reasoning power of graph traversal.

Why Rust? Python's Global Interpreter Lock lets only one thread execute Python bytecode at a time, which caps CPU-bound work such as parsing and graph construction within a single process. EdgeQuake's Tokio-based async runtime multiplexes thousands of concurrent requests over a small thread pool, and Rust's ownership model avoids needless copies while ruling out whole classes of memory bugs at compile time. The project reports 5x faster hybrid queries, 10x more concurrent users, and 4x lower memory per document compared to traditional RAG stacks.

Currently at v0.11.3, EdgeQuake has rapidly evolved from experimental prototype to enterprise-ready platform — adding Mistral La Plateforme as a first-class citizen, production-hardened PDF vision processing, knowledge injection for domain glossaries, and MCP (Model Context Protocol) integration for AI agent interoperability. It's not just trending on GitHub; it's trending because it works.

Key Features That Destroy Traditional RAG

🚀 Rust-Powered Performance Architecture

EdgeQuake's technical foundation is deliberately engineered for speed and safety:

Async-First Tokio Runtime: Every I/O operation — LLM calls, database queries, file processing — is non-blocking. Thousands of concurrent document ingestions don't create thread explosion.
Zero-Copy Memory Management: Rust's ownership model eliminates unnecessary data cloning. Documents flow through the pipeline with minimal allocation overhead.
Parallel Entity Extraction: Entity and relationship extraction runs as concurrent LLM calls across document chunks, capped by EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS (default: 16).
SQL Pre-Filtering with GIN + B-Tree Indexes: Metadata filters (tenant, workspace, document) are pushed to PostgreSQL WHERE clauses before vector search — reducing wasted vector scans by up to 90% at scale.
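The concurrency cap described above can be sketched with Python's asyncio. This is a minimal illustration, not EdgeQuake's Rust code: extract_entities is a hypothetical stand-in for an LLM call, and the default cap of 16 simply mirrors the EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS default listed in the configuration table later in this article.

```python
import asyncio

async def extract_entities(chunk: str) -> list[str]:
    """Hypothetical stand-in for one LLM extraction call on a chunk."""
    await asyncio.sleep(0.01)  # simulate provider latency
    # Pretend every capitalized word is an entity.
    return [word.upper() for word in chunk.split() if word.istitle()]

async def extract_all(chunks: list[str], max_concurrent: int = 16) -> tuple[list[list[str]], int]:
    """Extract every chunk concurrently, with at most `max_concurrent` calls in flight.

    Returns per-chunk results (in input order) and the peak number of calls
    observed in flight, so the cap can be checked.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(chunk: str) -> list[str]:
        nonlocal in_flight, peak
        async with semaphore:  # wait here while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await extract_entities(chunk)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(c) for c in chunks))
    return list(results), peak
```

The semaphore keeps at most max_concurrent calls in flight, while asyncio.gather preserves input order so results line up with their chunks.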

💉 Knowledge Injection (v0.8.0+)

Domain expertise shouldn't require retraining. EdgeQuake's Knowledge Injection system lets you:

Inject acronym definitions and synonym mappings that automatically expand query terms
Create invisible citations — enrichment entries that improve graph quality without cluttering source attribution
Upload .txt or .md glossary files via full CRUD API with background processing and status polling
See real-time entity counts in a dedicated /knowledge UI

🏷️ Custom Entity Configuration (v0.9.0+)

One-size-fits-all entity types are a recipe for generic extraction. EdgeQuake offers:

6 domain presets: General, Manufacturing, Healthcare, Legal, Research, Finance
Up to 50 custom entity types per workspace — define BEARING_TYPE, VIBRATION_ANOMALY, or any UPPERCASE_UNDERSCORED domain concept
Auto-normalization with live UI selector and backward compatibility for existing workspaces
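A minimal sketch of what auto-normalization and the 50-type cap might look like, assuming names are upper-cased with runs of non-alphanumeric characters turned into underscores. The helper names are hypothetical; EdgeQuake's actual normalization rules are not documented here.

```python
import re

MAX_CUSTOM_TYPES = 50  # per-workspace limit stated in the article
_VALID_TYPE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")  # UPPERCASE_UNDERSCORED

def normalize_entity_type(raw: str) -> str:
    """'bearing type' -> 'BEARING_TYPE', 'Vibration-Anomaly' -> 'VIBRATION_ANOMALY'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", raw.strip()).strip("_").upper()
    if not _VALID_TYPE.match(cleaned):
        raise ValueError(f"cannot normalize entity type: {raw!r}")
    return cleaned

def build_entity_types(raw_types: list[str]) -> list[str]:
    """Normalize, drop duplicates (first occurrence wins), and enforce the cap."""
    unique: dict[str, None] = {}
    for raw in raw_types:
        unique.setdefault(normalize_entity_type(raw), None)
    if len(unique) > MAX_CUSTOM_TYPES:
        raise ValueError(f"at most {MAX_CUSTOM_TYPES} custom entity types per workspace")
    return list(unique)
```

Deduplicating after normalization means "supplier batch" and "SUPPLIER_BATCH" count once toward the limit.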

📄 Production-Ready PDF Processing (v0.4.0+)

PDFs are where RAG systems go to die. EdgeQuake ships embedded pdfium (zero external config) with dual-mode extraction:

Text Mode: Fast pdfium-based extraction for standard PDFs
Vision Mode: GPT-4o, Claude 3.5+, or Gemini 2.5 reads each page as an image — handling scanned documents, complex tables, and multi-column layouts
Automatic Fallback: Vision failures gracefully degrade to text extraction (error code BR1010)
Safe Large-PDF Guardrails: Adaptive DPI/concurrency limits prevent memory spikes
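The fallback behavior can be sketched as a per-page loop. The vision and text callables are hypothetical placeholders for the vision LLM and pdfium extraction, and applying the fallback per page (rather than per document) is an assumption; only the BR1010 code comes from the article.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FALLBACK_CODE = "BR1010"  # error code the article associates with vision -> text fallback

@dataclass
class PageResult:
    page: int
    text: str
    mode: str                         # "vision" or "text"
    error_code: Optional[str] = None  # set only when vision failed and text was used

def extract_pages(
    pages: list[bytes],
    vision: Callable[[bytes], str],
    text: Callable[[bytes], str],
    use_vision: bool = True,
) -> list[PageResult]:
    """Try vision extraction per page; on any failure, degrade to text extraction."""
    results = []
    for i, page in enumerate(pages):
        if use_vision:
            try:
                results.append(PageResult(i, vision(page), "vision"))
            except Exception:
                results.append(PageResult(i, text(page), "text", FALLBACK_CODE))
        else:
            results.append(PageResult(i, text(page), "text"))
    return results
```

Recording the fallback code per page makes it easy to spot documents where the vision model struggled.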

🔍 Six Query Modes for Every Question Type

Mode	Latency	Best For
Naive	~100-300ms	Simple keyword-like lookups
Local	~200-500ms	Specific entity relationships
Global	~300-800ms	Thematic/high-level questions
Hybrid (default)	~400-1000ms	Balanced, comprehensive results
Mix	Variable	Weighted blend of vector + graph
Bypass	Fastest	Direct LLM without retrieval

🌐 Enterprise API & Frontend

OpenAPI 3.0 REST API with SSE streaming for real-time token generation
Kubernetes-ready health checks (/health, /ready, /live)
Fail-closed multi-tenant workspace isolation — invalid workspace selectors are rejected, not silently remapped
React 19 frontend with interactive Sigma.js graph visualizations
MCP (Model Context Protocol) integration — expose EdgeQuake capabilities to Claude, Cursor, and other AI agents

Real-World Use Cases Where EdgeQuake Dominates

1. Multi-Hop Legal Discovery

Law firms need to trace "How did Contract A's force majeure clause influence Settlement B's negotiation through Precedent C's interpretation?" Traditional RAG retrieves chunks mentioning each term separately. EdgeQuake's Local + Hybrid modes traverse the entity graph: CONTRACT_A → CONTAINS_CLAUSE → FORCE_MAJEURE → INFLUENCED → SETTLEMENT_B, with PRECEDENT_C as community context. The difference? Actual reasoning versus keyword coincidence.
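A plain breadth-first search over relation triples shows the kind of path the graph makes recoverable. The triples and relation names below are made up for illustration (this toy path routes through PRECEDENT_C, matching the question), and EdgeQuake's own traversal is not shown here.

```python
from collections import deque

# Toy directed graph of (source, relation, target) triples; names are illustrative only.
TRIPLES = [
    ("CONTRACT_A", "CONTAINS_CLAUSE", "FORCE_MAJEURE"),
    ("FORCE_MAJEURE", "INTERPRETED_IN", "PRECEDENT_C"),
    ("PRECEDENT_C", "INFLUENCED", "SETTLEMENT_B"),
    ("CONTRACT_A", "SIGNED_BY", "ACME_CORP"),
]

def find_path(triples, start, goal, max_hops=3):
    """Breadth-first search returning the shortest list of triples from start to goal, or None."""
    adjacency = {}
    for src, rel, dst in triples:
        adjacency.setdefault(src, []).append((rel, dst))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # do not extend paths beyond the hop limit
        for rel, nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None
```

A three-hop question needs a three-edge path; with max_hops=2 the search correctly returns None.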

2. Manufacturing Root Cause Analysis

A factory line fails. The question isn't "What mentions bearing failure?" — it's "Which supplier batch, maintenance schedule deviation, and operator training gap combined to cause this vibration anomaly?" With custom entity types like BEARING_TYPE, VIBRATION_ANOMALY, SUPPLIER_BATCH, EdgeQuake's graph reveals causal chains that vector similarity cannot reconstruct.

3. Pharmaceutical Research Synthesis

Researchers ask: "What drug interactions between Compound X and biological pathway Y have been observed in populations with genetic marker Z?" EdgeQuake's Global mode with community detection clusters related research, while Hybrid mode grounds specific mechanism claims in source documents. Knowledge Injection ensures domain acronyms (CYP450, IC50) are properly expanded.

4. Financial Compliance Monitoring

Regulatory queries demand precision: "Show me all transactions where Entity A indirectly benefited from Entity B through shell companies established after Sanction C was imposed." SQL pre-filtering on metadata such as date ranges and jurisdiction removes out-of-scope records before vector search begins, so EdgeQuake's graph traversal only reasons over candidates that can actually match.

5. Technical Documentation Intelligence

Developer platforms need answers like "Which API version introduced the deprecation that broke the integration pattern used by our largest enterprise customer?" With MCP integration, AI agents can programmatically explore the knowledge graph, upload new documentation, and trace deprecation impact chains.

Step-by-Step Installation & Setup Guide

⚡ Option 1: One-Command Docker Deploy (~30 seconds)

Only Docker (with Docker Compose) and curl are required. No Rust, no Node.js, no build tools.

# Download and run the interactive setup wizard
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/quickstart.sh | sh

The wizard handles everything:

Provider selection — OpenAI or Ollama (explicitly chosen, never guessed)
Model selection — curated menu with pricing visibility
API key validation — live check before starting
Stack startup — pulls images, starts services, health polling for 90 seconds
Re-run intelligence — detects existing containers, offers "Update & Reconfigure" or safe "Fresh Start"

Alternative direct compose methods:

# Pipe directly to docker compose
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/docker-compose.quickstart.yml \
  | docker compose -f - up -d

# Or download first, then start
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/docker-compose.quickstart.yml \
  -o docker-compose.quickstart.yml
docker compose -f docker-compose.quickstart.yml up -d

Access points:

Service	URL
Web UI	http://localhost:3000
API	http://localhost:8080
Swagger	http://localhost:8080/swagger-ui
Health	http://localhost:8080/health

Headless / CI deployment (no interactive terminal):

# OpenAI provider
EDGEQUAKE_LLM_PROVIDER=openai \
  OPENAI_API_KEY=sk-... \
  docker compose -f docker-compose.quickstart.yml up -d

# Mistral La Plateforme (v0.11.0+)
EDGEQUAKE_LLM_PROVIDER=mistral \
  MISTRAL_API_KEY=... \
  docker compose -f docker-compose.quickstart.yml up -d

Management commands:

docker compose -f docker-compose.quickstart.yml logs -f    # tail logs
docker compose -f docker-compose.quickstart.yml ps         # check status
docker compose -f docker-compose.quickstart.yml down       # stop

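Pin a version for reproducibility by setting EDGEQUAKE_VERSION, either on a downloaded copy of the script (EDGEQUAKE_VERSION=0.10.8 sh quickstart.sh) or on the piped one (curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/quickstart.sh | EDGEQUAKE_VERSION=0.10.8 sh). Mistral support requires v0.11.0 or later.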

🛠️ Option 2: Full Development Setup (5 minutes)

Prerequisites:

Rust 1.95+ (rustup.rs)
Node.js 18+ or Bun 1.0+ (nodejs.org)
Docker (docker.com)
Ollama (optional, for local LLMs — ollama.ai)

# 1. Clone the repository
git clone https://github.com/raphaelmansuy/edgequake.git
cd edgequake

# 2. Install all dependencies (Rust crates + Node packages)
make install

# 3. Configure frontend environment
cp edgequake_webui/.env.local.example edgequake_webui/.env.local

# 4. Start full stack (no authentication, for local development)
make dev

# Optional: start with authentication enabled
make dev-auth

Services available:

Backend: http://localhost:8080
Frontend: http://localhost:3000 (auto-selects next free port if busy)
Swagger UI: http://localhost:8080/swagger-ui

Environment Configuration for Production

Create edgequake/docker/.env from .env.example:

Variable	Purpose
`DATABASE_URL`	PostgreSQL connection (required for API-only deploy)
`EDGEQUAKE_LLM_PROVIDER`	`openai`, `anthropic`, `gemini`, `mistral`, `azure`, `vertexai`, `ollama`
`OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `MISTRAL_API_KEY` / etc.	Provider credentials
`OLLAMA_HOST`	Default: `http://host.docker.internal:11434`
`EDGEQUAKE_CHUNK_TIMEOUT_SECS`	Per-chunk LLM timeout (default: 180s)
`EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS`	Parallel extraction limit (default: 16)

Slow local LLM tuning (Ollama/LM Studio):

export EDGEQUAKE_CHUNK_TIMEOUT_SECS=600       # 10 min per chunk
export EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS=4  # reduce parallelism
export EDGEQUAKE_LLM_TIMEOUT_SECS=3600        # 1 hour HTTP safety timeout
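To see what these knobs imply, here is a back-of-the-envelope upper bound on extraction time for one document. It assumes chunks are dispatched in order as slots free up, every LLM call runs to its timeout, and there are no retries or gleaning passes; the function names are hypothetical, while the variable names and defaults come from the configuration table above.

```python
import math
import os

def load_extraction_settings(env=None) -> tuple[int, int]:
    """Read the two knobs, falling back to the documented defaults (180 s, 16 slots)."""
    env = os.environ if env is None else env
    timeout = int(env.get("EDGEQUAKE_CHUNK_TIMEOUT_SECS", "180"))
    concurrency = int(env.get("EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS", "16"))
    return timeout, concurrency

def worst_case_extraction_secs(num_chunks: int, timeout: int, concurrency: int) -> int:
    """Upper bound: ceil(chunks / slots) waves, each lasting at most one timeout."""
    waves = math.ceil(num_chunks / concurrency)
    return waves * timeout
```

With the defaults, 40 chunks are bounded by 3 waves × 180 s = 540 s; with the slow-LLM settings above, 10 waves × 600 s = 6000 s. The bound is only reached if every call times out.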

REAL Code Examples from the Repository

Example 1: First Document Upload via REST API

EdgeQuake's document ingestion pipeline transforms files into knowledge graphs automatically. Here's the exact API call from the README:

# Upload a file (PDF, TXT, MD, etc.)
curl -X POST http://localhost:8080/api/v1/documents/upload \
  -F "file=@your-document.pdf"

Expected response:

{
  "id": "doc-123",
  "status": "completed",
  "chunk_count": 15,
  "entity_count": 12,
  "relationship_count": 8,
  "processing_time_ms": 2500
}
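The same upload from Python, using only the standard library. This sketch assumes the multipart field is named file (as in the curl call) and sends each file as application/octet-stream; the X-Use-Vision header is the per-request switch mentioned in the PDF strategy section. The function names are hypothetical.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8080"  # default API address from the setup section

def build_upload_request(filename: str, content: bytes, use_vision: bool = False) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/v1/documents/upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body = head + content + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if use_vision:
        headers["X-Use-Vision"] = "true"  # per-request vision switch named in the article
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/documents/upload", data=body, headers=headers, method="POST"
    )

def upload(path: str, use_vision: bool = False) -> dict:
    """Send a local file and decode the JSON reply (needs a running EdgeQuake API)."""
    with open(path, "rb") as f:
        request = build_upload_request(os.path.basename(path), f.read(), use_vision)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running stack, upload("your-document.pdf") should return the JSON shown above, decoded into a dict.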

What's happening under the hood? The edgequake-pipeline crate splits your document into ~1200-token chunks with 100-token overlap, then calls the configured LLM to extract (entity, type, description) and (source, target, keywords, description) tuples. An optional gleaning step (a second extraction pass) recovers entities that a single pass misses; the project cites 15-25% more entities. Entities are deduplicated through case normalization and description merging (the project reports a 36-40% reduction in duplicates), then stored as a property graph in PostgreSQL via Apache AGE, with pgvector holding the chunk embeddings.
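The chunking and deduplication steps can be sketched as follows. Whitespace-split words stand in for real tokenizer tokens, and merging on an upper-cased, whitespace-collapsed name is an assumed simplification of EdgeQuake's normalization; the 1200/100 defaults come from the paragraph above, and the helper names are hypothetical.

```python
def chunk_tokens(tokens: list[str], size: int = 1200, overlap: int = 100) -> list[list[str]]:
    """Sliding window: each chunk starts `size - overlap` tokens after the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than the chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

def merge_entities(raw: list[tuple[str, str, str]]) -> dict[str, dict]:
    """Merge (name, type, description) tuples whose names match after normalization.

    The first-seen type wins; distinct descriptions are kept in order.
    """
    merged: dict[str, dict] = {}
    for name, etype, description in raw:
        key = " ".join(name.split()).upper()
        entry = merged.setdefault(key, {"type": etype, "descriptions": []})
        if description not in entry["descriptions"]:
            entry["descriptions"].append(description)
    return merged
```

A 2500-token document yields chunks of 1200, 1200, and 300 tokens, each sharing 100 tokens with its predecessor.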

Example 2: First Query with Hybrid Mode

This is where EdgeQuake's intelligence shines — combining vector similarity with graph traversal:

# Query the knowledge graph
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main concepts?",
    "mode": "hybrid"
  }'

Response structure:

{
  "answer": "The main concepts are: knowledge graphs, entity extraction, and hybrid retrieval...",
  "sources": [
    { "chunk_id": "chunk-1", "similarity": 0.92 },
    { "chunk_id": "chunk-5", "similarity": 0.87 }
  ],
  "entities": ["KNOWLEDGE_GRAPH", "ENTITY_EXTRACTION"],
  "relationships": [
    {
      "source": "KNOWLEDGE_GRAPH",
      "target": "ENTITY_EXTRACTION",
      "type": "ENABLES"
    }
  ]
}
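A small Python client for the query endpoint, plus a helper that prints sources and relationship edges from a response shaped like the one above. The field names are taken from that sample response; real responses may carry additional fields, and query and summarize are hypothetical helper names.

```python
import json
import urllib.request

def query(question: str, mode: str = "hybrid", base_url: str = "http://localhost:8080") -> dict:
    """POST the JSON body shown above to /api/v1/query (needs a running server)."""
    payload = json.dumps({"query": question, "mode": mode}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/v1/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def summarize(response: dict) -> str:
    """Render the answer, its sources (best first), and relationship edges."""
    lines = [response["answer"]]
    for src in sorted(response.get("sources", []), key=lambda s: -s["similarity"]):
        lines.append(f"  source {src['chunk_id']} (similarity {src['similarity']:.2f})")
    for rel in response.get("relationships", []):
        lines.append(f"  {rel['source']} -[{rel['type']}]-> {rel['target']}")
    return "\n".join(lines)
```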

The Hybrid mode algorithm: First, vector search finds semantically similar chunks and entities. Then, for the top-k entities, EdgeQuake traverses their local graph neighborhood (1-2 hops) to capture relationships. Simultaneously, Louvain community detection identifies thematic clusters for global context. The LLM receives: (1) relevant chunk texts, (2) local subgraph context, (3) community summaries, and (4) relationship metadata — enabling genuine multi-hop reasoning.
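The retrieval half of that description can be sketched in a few functions: cosine top-k over chunk and entity vectors, then a breadth-first expansion of up to two hops around the seed entities. Everything here is a toy illustration with hand-made vectors; community detection and summaries are omitted, and none of the names are EdgeQuake APIs.

```python
import math
from collections import deque

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Vector-search step: ids of the k vectors most similar to the query."""
    return sorted(vectors, key=lambda key: cosine(query_vec, vectors[key]), reverse=True)[:k]

def neighborhood(edges: list[tuple[str, str, str]], seeds: list[str], hops: int = 2) -> list[tuple[str, str, str]]:
    """Edges on some path of at most `hops` edges from a seed (direction ignored)."""
    adjacency: dict[str, list] = {}
    for edge in edges:
        src, _, dst = edge
        adjacency.setdefault(src, []).append((edge, dst))
        adjacency.setdefault(dst, []).append((edge, src))
    visited = set(seeds)
    found: list[tuple[str, str, str]] = []
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand past the hop limit
        for edge, other in adjacency.get(node, []):
            if edge not in found:
                found.append(edge)
            if other not in visited:
                visited.add(other)
                frontier.append((other, depth + 1))
    return found

def build_context(query_vec, chunk_vecs, chunk_text, entity_vecs, edges, k=2, hops=2) -> str:
    """Assemble top chunks plus the local subgraph into one LLM context string."""
    chunks = [chunk_text[c] for c in top_k(query_vec, chunk_vecs, k)]
    seeds = top_k(query_vec, entity_vecs, k)
    rels = [f"{s} -[{r}]-> {d}" for s, r, d in neighborhood(edges, seeds, hops)]
    return "\n".join(["# Chunks", *chunks, "# Relationships", *rels])
```

Raising hops widens the subgraph quickly, which is one reason a 1-2 hop limit keeps the prompt manageable.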

Example 3: Production Docker Deployment with Prebuilt Images

For teams needing full-stack deployment without build tools:

cd edgequake/docker
cp .env.example .env        # configure EDGEQUAKE_LLM_PROVIDER and API key

# Start API + frontend + PostgreSQL from GHCR images
docker compose -f docker-compose.prebuilt.yml up -d

Services started:

Service	Port	Image
`edgequake` API	8080	`ghcr.io/raphaelmansuy/edgequake:latest`
`frontend`	3000	`ghcr.io/raphaelmansuy/edgequake-frontend:latest`
`postgres`	5432	`ghcr.io/raphaelmansuy/edgequake-postgres:latest`

Pin to specific version:

EDGEQUAKE_VERSION=0.10.8 docker compose -f docker-compose.prebuilt.yml up -d

Health verification:

curl http://localhost:8080/health

Why this matters: The prebuilt images support linux/amd64 and linux/arm64 natively — no QEMU emulation. This means Apple Silicon Macs, x86 servers, and AWS Graviton instances all run identical containers. The embedded pdfium (via pdfium-auto) eliminates external shared library dependencies, making this truly zero-config for PDF processing.

Example 4: API-Only Deployment (Bring Your Own PostgreSQL)

For teams with existing database infrastructure:

# One-liner deployment
docker run -d \
  --name edgequake \
  -p 8080:8080 \
  -e DATABASE_URL="postgres://user:password@your-db-host:5432/edgequake" \
  -e EDGEQUAKE_LLM_PROVIDER=openai \
  -e OPENAI_API_KEY="sk-..." \
  ghcr.io/raphaelmansuy/edgequake:latest

# Verify deployment
curl http://localhost:8080/health

Requirements for your PostgreSQL: PostgreSQL 15+ with the pgvector and Apache AGE extensions installed (their extension names are vector and age). The edgequake-storage crate uses SQLx for type-safe, compile-time checked queries against this backend.

Example 5: Knowledge Injection API (v0.8.0+)

Programmatically enrich the graph with domain expertise:

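# Upload a glossary file for workspace expansion
# (POST /api/v1/workspaces/:id/injection/upload, multipart/form-data)
WORKSPACE_ID=ws-abc123   # replace with your workspace id
curl -X POST "http://localhost:8080/api/v1/workspaces/${WORKSPACE_ID}/injection/upload" \
  -F "file=@manufacturing-glossary.txt"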

Glossary file format (plain text or markdown):

OEE: Overall Equipment Effectiveness, calculated as Availability × Performance × Quality
NLP: Natural Language Processing; synonym: computational linguistics, text analytics
ML: Machine Learning; synonym: statistical learning, predictive modeling

Processing flow: The injection system parses definitions, creates invisible knowledge graph nodes (never shown as citations), and automatically expands future queries using injected synonyms. Status polling tracks processing → completed or failed with entity count metrics.
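A minimal parser for the glossary format above, with a naive query-expansion step that appends definitions and synonyms when a term appears as a whole word. This is an assumption about how expansion could work, not EdgeQuake's implementation, which the article does not describe in detail; parse_glossary and expand_query are hypothetical names.

```python
import re

def parse_glossary(text: str) -> dict[str, dict]:
    """Parse 'TERM: definition; synonym: a, b' lines into {term: {definition, synonyms}}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and lines that are not definitions
        term, rest = (part.strip() for part in line.split(":", 1))
        definition, _, syn_part = rest.partition("; synonym:")
        synonyms = [s.strip() for s in syn_part.split(",") if s.strip()]
        entries[term] = {"definition": definition.strip(), "synonyms": synonyms}
    return entries

def expand_query(query: str, glossary: dict[str, dict]) -> str:
    """Append definitions and synonyms for glossary terms found as whole words (case-insensitive)."""
    extra = []
    for term, entry in glossary.items():
        if re.search(rf"\b{re.escape(term)}\b", query, flags=re.IGNORECASE):
            extra.append(entry["definition"])
            extra.extend(entry["synonyms"])
    return f"{query} ({'; '.join(extra)})" if extra else query
```

Matching on word boundaries keeps "ML" from firing inside words like "html".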

Advanced Usage & Best Practices

Query Mode Selection Strategy

Don't default to Hybrid for everything. Match mode to question type:

Naive: User knows exact terminology, needs fast lookup
Local: "Who reported to whom in Q3?" — specific entity relationships
Global: "What are our main strategic risks?" — thematic synthesis
Hybrid: Unsure of scope, need comprehensive coverage
Mix: Tune the trade-off between speed and completeness with the naive_weight parameter
Bypass: General knowledge questions, no document grounding needed
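The guidance above can be encoded as a toy router. The keyword lists are illustrative assumptions rather than anything EdgeQuake ships, and Mix is left out because choosing its naive_weight needs tuning against your own data.

```python
def choose_mode(question: str, needs_documents: bool = True, exact_terms: bool = False) -> str:
    """Toy heuristic mapping the selection guidance to a query-mode name."""
    q = question.lower()
    if not needs_documents:
        return "bypass"   # general knowledge, no document grounding needed
    if exact_terms:
        return "naive"    # user already knows the exact terminology
    if any(w in q for w in ("main", "overall", "themes", "strategic", "trends")):
        return "global"   # thematic synthesis
    if any(w in q for w in ("who ", "whom", "reported to", "related to", "between")):
        return "local"    # specific entity relationships
    return "hybrid"       # unsure of scope: broadest coverage
```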

Performance Tuning at Scale

The SQL pre-filtering feature is your secret weapon. Always include metadata filters in queries:

{
  "query": "...",
  "mode": "hybrid",
  "filters": {
    "workspace_id": "ws-abc123",
    "document_type": "contract",
    "date_range": {"from": "2024-01-01", "to": "2024-12-31"}
  }
}

This pushes WHERE workspace_id = 'ws-abc123' AND document_type = 'contract', plus the date-range predicate, down to PostgreSQL before vector search, where GIN + B-tree indexes can answer them; the project reports up to 90% fewer wasted vector scans.
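The pushdown can be sketched as a parameterized SQL builder: whitelisted metadata columns become WHERE predicates, and pgvector's <=> (cosine distance) operator orders the remaining rows. The chunks table and its column names are hypothetical, and the %s placeholders are psycopg-style (EdgeQuake itself uses SQLx in Rust). Whether PostgreSQL applies the predicates before or after an approximate HNSW or IVFFlat index scan depends on the query plan, so check EXPLAIN output on real data.

```python
def build_prefiltered_search(filters: dict, query_embedding: list[float], k: int = 10) -> tuple[str, list]:
    """Return (sql, params) with metadata predicates ahead of the vector ordering.

    Column names come from a fixed whitelist; every user-supplied value is a
    bound parameter, never interpolated into the SQL string.
    """
    clauses: list[str] = []
    params: list = []
    for column in ("workspace_id", "document_type"):  # hypothetical column names
        if column in filters:
            clauses.append(f"{column} = %s")
            params.append(filters[column])
    if "date_range" in filters:
        clauses.append("created_at BETWEEN %s AND %s")
        params += [filters["date_range"]["from"], filters["date_range"]["to"]]
    where = f"WHERE {' AND '.join(clauses)} " if clauses else ""
    # pgvector accepts a '[x,y,...]' text literal cast to vector; <=> is cosine distance.
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    sql = f"SELECT chunk_id, content FROM chunks {where}ORDER BY embedding <=> %s::vector LIMIT %s"
    return sql, params + [vector_literal, k]
```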

PDF Processing Strategy Matrix

Document Type	Recommended Mode	Rationale
Standard text PDFs	Text mode (default)	Fastest, zero-config with embedded pdfium
Scanned documents	Vision mode	OCR-free, LLM reads page images directly
Complex tables	Vision mode	Table reconstruction beats text parser mangling
Multi-column layouts	Vision mode	LLM understands reading order
Mixed content	Auto-fallback	Vision failure → text extraction (BR1010)

Enable vision mode per-request: X-Use-Vision: true header, or set use_vision_llm = true in config.

Security Hardening

Fail-closed workspace isolation: Invalid workspace selectors are rejected, not silently remapped to defaults
Runtime auth hardening: Prebuilt WebUI images read API and auth configuration at runtime; protected routes fail closed when auth is enabled
Multi-tenant query/delete flows: Workspace-scoped operations prevent cross-tenant data leakage
Audit logging: edgequake-audit crate tracks all destructive operations

Comparison with Alternatives

Feature	EdgeQuake	LightRAG (Python)	Microsoft GraphRAG	Traditional RAG
Language	Rust	Python	Python	Any
Performance	10x concurrent users	Baseline	Slower indexing	Varies
Query Latency (hybrid)	< 200ms	~1000ms	~2000ms	~1000ms
Memory per Document	2MB	~8MB	~10MB	~8MB
PDF Vision Processing	✅ Native (GPT-4o, Claude, Gemini)	❌	❌	❌
Production API	✅ OpenAPI 3.0 + SSE	✅ LightRAG Server (REST)	❌	Varies
React Frontend	✅ React 19 + Sigma.js	✅ LightRAG WebUI	❌	Varies
Knowledge Injection	✅ Domain glossaries	❌	❌	❌
Custom Entity Types	✅ 50 per workspace	Limited	❌	❌
MCP Agent Integration	✅	❌	❌	❌
Multi-tenant Isolation	✅ Fail-closed	❌	❌	Varies
SQL Pre-filtering	✅ GIN + B-tree	❌	❌	❌
Embedded pdfium	✅ Zero-config	❌	❌	❌
Docker Multi-arch	✅ amd64 + arm64 native	❌	❌	Varies

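Bottom line: LightRAG proved the algorithm. EdgeQuake productionizes it, with Rust performance, native multimodal PDF processing, enterprise security, and modern developer experience. Note that the < 200ms hybrid latency in this table conflicts with the ~400-1000ms Hybrid range in the query-mode table earlier; the difference is not explained, so measure latency on your own deployment.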

FAQ

What LLM providers does EdgeQuake support?

OpenAI, Anthropic, Mistral, MiniMax, Google Gemini, Azure OpenAI, Vertex AI, xAI, Ollama, and LM Studio. Auto-detection via environment variables. Mistral La Plateforme is a first-class citizen as of v0.11.0 with chat (mistral-small-latest), vision PDF ingestion (pixtral-large-latest), and embeddings (mistral-embed, 1024 dimensions).

Can I use EdgeQuake without Docker?

Yes — make dev starts PostgreSQL via Docker but runs the Rust backend and Node frontend natively. For fully container-free operation, install PostgreSQL 15+ with pgvector and apache_age locally, then cargo run the backend.

How does EdgeQuake handle large PDFs?

Adaptive DPI scaling, concurrency limits, and early byte release prevent memory spikes. For very large files on slow local LLMs, increase EDGEQUAKE_CHUNK_TIMEOUT_SECS and reduce EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS.

Is there a hosted/cloud version?

Currently self-hosted only. The Docker deployment options (especially prebuilt images) minimize operational overhead. Cloud offerings may follow based on community demand.

How do I migrate from LightRAG Python?

See the dedicated migration guide. Core concepts align directly; EdgeQuake extends the algorithm with production features. Re-indexing documents is required due to different storage schemas.

What databases are required?

PostgreSQL 15+ with pgvector (vector storage) and apache_age (property graph) extensions. The full-stack Docker compose includes a pre-configured PostgreSQL image. In-memory storage is available for testing only (make dev-memory).

Can I contribute to EdgeQuake?

The project uses Specification-Driven Development with the edgecode coding agent. For now, contributions flow through Raphaël MANSUY directly via GitHub Issues and Discussions. See CONTRIBUTING.md for guidelines.

Conclusion

Traditional RAG is intentionally simple, and that's exactly why it fails on the questions that matter. EdgeQuake doesn't just incrementally improve retrieval; it fundamentally restructures how documents become knowledge. By implementing the LightRAG algorithm in Rust, adding production-hardened PDF vision processing, and wrapping everything in an enterprise API with a React frontend, Raphaël MANSUY has created something rare: an academic insight engineered into genuine production utility.

The project's reported performance numbers aren't marginal gains: 5x faster hybrid queries, 10x more concurrent users, 4x lower memory per document. And with features like Knowledge Injection, Custom Entity Configuration, and MCP agent integration, EdgeQuake keeps getting smarter about your specific domain.

But here's what haunts me: every day you spend with chunk-and-pray RAG is a day your system confidently returns semantically similar but structurally wrong answers. The knowledge graph isn't a nice-to-have optimization. It's the minimal viable intelligence for multi-hop reasoning.

Stop wasting tokens on dumb retrieval. Star EdgeQuake on GitHub, run curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/quickstart.sh | sh, and watch your documents become a reasoning engine. The graph is waiting. Your questions are getting harder. Match them with intelligence that scales.
