Stop Wasting Tokens on Dumb RAG! EdgeQuake's Graph Reasoning Exposed
Your vector database is lying to you. Yes, that shiny Pinecone or Weaviate instance you've been feeding chunks into? It's giving you semantic similarity theater: it matches passages that sound alike without understanding a single relationship between them. Ask it "How does Tesla's battery supply chain impact European EV regulations through Chinese lithium mining?" and watch it crumble. Traditional RAG systems retrieve document chunks using vector similarity alone. This works for simple lookups but fails badly on multi-hop reasoning, the kind of "How does X relate to Y through Z?" questions that actually matter in production.
The dirty secret? Vectors capture semantic similarity but annihilate structural relationships between concepts. Your chunks are floating in isolation, divorced from the very connections that give them meaning. Thematic questions? Relationship queries? Forget it. You're essentially running a glorified search engine with amnesia.
But what if your documents became a living knowledge graph — entities, relationships, communities — all queryable at Rust-powered speed? Enter EdgeQuake, the high-performance GraphRAG framework that transforms passive document storage into active intelligence networks. Inspired by the groundbreaking LightRAG algorithm and forged in Rust's zero-cost abstractions, EdgeQuake doesn't just retrieve information. It reasons across it. Ready to see what you've been missing?
What is EdgeQuake?
EdgeQuake is a high-performance Graph-RAG (Retrieval-Augmented Generation) framework written in Rust, created by Raphaël MANSUY — a Hong Kong-based developer building the future of intelligent document retrieval. Born from the academic insights of the LightRAG paper (Guo et al., 2024), EdgeQuake takes the theoretical promise of knowledge graph-enhanced retrieval and engineers it into a production-ready, blazing-fast system.
The core philosophy is radical in its simplicity: don't just chunk and embed documents — decompose them into structured knowledge. During ingestion, Large Language Models extract entities (people, organizations, technologies, concepts) and map their relationships. This graph structure is stored alongside vector embeddings in PostgreSQL with Apache AGE and pgvector extensions. At query time, EdgeQuake traverses both the vector space and the graph topology, combining the speed of similarity search with the reasoning power of graph traversal.
Why Rust? Because Python's Global Interpreter Lock limits CPU-bound parallelism across threads, which becomes a bottleneck for concurrent document processing. EdgeQuake's Tokio-based async runtime is built to handle thousands of concurrent requests with zero-copy operations, and Rust's memory safety guarantees eliminate entire classes of production bugs. The claimed result: 5x faster hybrid queries, 10x more concurrent users, and 4x lower memory per document compared to traditional RAG stacks.
Currently at v0.11.3, EdgeQuake has rapidly evolved from experimental prototype to enterprise-ready platform — adding Mistral La Plateforme as a first-class citizen, production-hardened PDF vision processing, knowledge injection for domain glossaries, and MCP (Model Context Protocol) integration for AI agent interoperability. It's not just trending on GitHub; it's trending because it works.
Key Features That Destroy Traditional RAG
🚀 Rust-Powered Performance Architecture
EdgeQuake's technical foundation is deliberately engineered for speed and safety:
- Async-First Tokio Runtime: Every I/O operation — LLM calls, database queries, file processing — is non-blocking. Thousands of concurrent document ingestions don't create thread explosion.
- Zero-Copy Memory Management: Rust's ownership model eliminates unnecessary data cloning. Documents flow through the pipeline with minimal allocation overhead.
- Parallel Entity Extraction: Concurrent LLM calls extract entities and relationships across document chunks, up to a configurable limit (EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS, default: 16).
- SQL Pre-Filtering with GIN + B-Tree Indexes: Metadata filters (tenant, workspace, document) are pushed to PostgreSQL WHERE clauses before vector search — reducing wasted vector scans by up to 90% at scale.
💉 Knowledge Injection (v0.8.0+)
Domain expertise shouldn't require retraining. EdgeQuake's Knowledge Injection system lets you:
- Inject acronym definitions and synonym mappings that automatically expand query terms
- Create invisible citations — enrichment entries that improve graph quality without cluttering source attribution
- Upload .txt or .md glossary files via full CRUD API with background processing and status polling
- See real-time entity counts in a dedicated /knowledge UI
🏷️ Custom Entity Configuration (v0.9.0+)
One-size-fits-all entity types are a recipe for generic extraction. EdgeQuake offers:
- 6 domain presets: General, Manufacturing, Healthcare, Legal, Research, Finance
- Up to 50 custom entity types per workspace: define BEARING_TYPE, VIBRATION_ANOMALY, or any UPPERCASE_UNDERSCORED domain concept
- Auto-normalization with live UI selector and backward compatibility for existing workspaces
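To make the UPPERCASE_UNDERSCORED convention and the 50-type cap concrete, here is a minimal sketch. The function names and the exact normalization rules are assumptions for illustration; EdgeQuake's own auto-normalization may behave differently.

```python
import re

MAX_CUSTOM_TYPES = 50  # per-workspace cap stated in the feature list


def normalize_entity_type(raw: str) -> str:
    """Hypothetical normalizer: 'bearing type' -> 'BEARING_TYPE'."""
    # Replace every run of non-alphanumeric characters with one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", raw.strip())
    name = collapsed.strip("_").upper()
    if not name:
        raise ValueError(f"entity type {raw!r} has no usable characters")
    return name


def normalize_entity_types(raw_types: list) -> list:
    """Normalize, drop duplicates while keeping order, and enforce the cap."""
    seen = {}
    for raw in raw_types:
        seen.setdefault(normalize_entity_type(raw), None)
    if len(seen) > MAX_CUSTOM_TYPES:
        raise ValueError(f"at most {MAX_CUSTOM_TYPES} custom entity types allowed")
    return list(seen)
```

Calling `normalize_entity_types(["Bearing Type", "BEARING_TYPE", "supplier batch"])` collapses the first two entries into one, which is the kind of cleanup a workspace-level type list needs before it is handed to the extraction prompt.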
📄 Production-Ready PDF Processing (v0.4.0+)
PDFs are where RAG systems go to die. EdgeQuake ships embedded pdfium (zero external config) with dual-mode extraction:
- Text Mode: Fast pdfium-based extraction for standard PDFs
- Vision Mode: GPT-4o, Claude 3.5+, or Gemini 2.5 reads each page as an image — handling scanned documents, complex tables, and multi-column layouts
- Automatic Fallback: Vision failures gracefully degrade to text extraction (error code BR1010)
- Safe Large-PDF Guardrails: Adaptive DPI/concurrency limits prevent memory spikes
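The vision-to-text fallback described above follows a common pattern, sketched below. The two extractor callables and the result dictionary are hypothetical stand-ins, not EdgeQuake's internal API; only the BR1010 code comes from the feature list.

```python
from typing import Callable

VISION_FALLBACK_CODE = "BR1010"  # error code named in the feature list


def extract_pdf_text(
    pdf_bytes: bytes,
    vision_extract: Callable[[bytes], str],
    text_extract: Callable[[bytes], str],
    use_vision: bool = False,
) -> dict:
    """Try vision extraction when requested; fall back to text mode on failure."""
    if use_vision:
        try:
            return {"mode": "vision", "text": vision_extract(pdf_bytes), "warning": None}
        except Exception as exc:  # any vision failure triggers the fallback
            warning = f"{VISION_FALLBACK_CODE}: vision failed ({exc}); used text mode"
            return {"mode": "text", "text": text_extract(pdf_bytes), "warning": warning}
    return {"mode": "text", "text": text_extract(pdf_bytes), "warning": None}
```

Returning the warning alongside the text, rather than raising, keeps ingestion moving while still recording that the page images were never read.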
🔍 Six Query Modes for Every Question Type
| Mode | Latency | Best For |
|---|---|---|
| Naive | ~100-300ms | Simple keyword-like lookups |
| Local | ~200-500ms | Specific entity relationships |
| Global | ~300-800ms | Thematic/high-level questions |
| Hybrid (default) | ~400-1000ms | Balanced, comprehensive results |
| Mix | Variable | Weighted blend of vector + graph |
| Bypass | Fastest | Direct LLM without retrieval |
🌐 Enterprise API & Frontend
- OpenAPI 3.0 REST API with SSE streaming for real-time token generation
- Kubernetes-ready health checks (/health, /ready, /live)
- Fail-closed multi-tenant workspace isolation: invalid workspace selectors are rejected, not silently remapped
- React 19 frontend with interactive Sigma.js graph visualizations
- MCP (Model Context Protocol) integration — expose EdgeQuake capabilities to Claude, Cursor, and other AI agents
Real-World Use Cases Where EdgeQuake Dominates
1. Multi-Hop Legal Discovery
Law firms need to trace "How did Contract A's force majeure clause influence Settlement B's negotiation through Precedent C's interpretation?" Traditional RAG retrieves chunks mentioning each term separately. EdgeQuake's Local + Hybrid modes traverse the entity graph: CONTRACT_A → CONTAINS_CLAUSE → FORCE_MAJEURE → INFLUENCED → SETTLEMENT_B, with PRECEDENT_C as community context. The difference? Actual reasoning versus keyword coincidence.
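The traversal in this example can be illustrated with a plain breadth-first search over relationship triples. This is a sketch of the idea only: the edge list is hand-written, the INTERPRETS edge is a hypothetical addition, and EdgeQuake itself runs graph queries in Apache AGE rather than in Python.

```python
from collections import deque

# Directed edges as (source, relation, target); names follow the example above.
EDGES = [
    ("CONTRACT_A", "CONTAINS_CLAUSE", "FORCE_MAJEURE"),
    ("FORCE_MAJEURE", "INFLUENCED", "SETTLEMENT_B"),
    ("PRECEDENT_C", "INTERPRETS", "FORCE_MAJEURE"),  # hypothetical extra edge
]


def find_path(edges, start, goal):
    """Breadth-first search returning the shortest chain of triples, or None."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None
```

`find_path(EDGES, "CONTRACT_A", "SETTLEMENT_B")` returns the two-hop chain through FORCE_MAJEURE, a connection that no single chunk needs to state explicitly.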
2. Manufacturing Root Cause Analysis
A factory line fails. The question isn't "What mentions bearing failure?" — it's "Which supplier batch, maintenance schedule deviation, and operator training gap combined to cause this vibration anomaly?" With custom entity types like BEARING_TYPE, VIBRATION_ANOMALY, SUPPLIER_BATCH, EdgeQuake's graph reveals causal chains that vector similarity cannot reconstruct.
3. Pharmaceutical Research Synthesis
Researchers ask: "What drug interactions between Compound X and biological pathway Y have been observed in populations with genetic marker Z?" EdgeQuake's Global mode with community detection clusters related research, while Hybrid mode grounds specific mechanism claims in source documents. Knowledge Injection ensures domain acronyms (CYP450, IC50) are properly expanded.
4. Financial Compliance Monitoring
Regulatory queries demand precision: "Show me all transactions where Entity A indirectly benefited from Entity B through shell companies established after Sanction C was imposed." EdgeQuake's graph traversal with SQL pre-filtering (date ranges, jurisdiction metadata) eliminates false positives before vector search begins.
5. Technical Documentation Intelligence
Developer platforms need answers like "Which API version introduced the deprecation that broke the integration pattern used by our largest enterprise customer?" With MCP integration, AI agents can programmatically explore the knowledge graph, upload new documentation, and trace deprecation impact chains.
Step-by-Step Installation & Setup Guide
⚡ Option 1: One-Command Docker Deploy (~30 seconds)
Zero prerequisites. No Rust, no Node.js, no build tools.
# Download and run the interactive setup wizard
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/quickstart.sh | sh
The wizard handles everything:
- Provider selection — OpenAI or Ollama (explicitly chosen, never guessed)
- Model selection — curated menu with pricing visibility
- API key validation — live check before starting
- Stack startup — pulls images, starts services, health polling for 90 seconds
- Re-run intelligence — detects existing containers, offers "Update & Reconfigure" or safe "Fresh Start"
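If you script your own startup checks instead of using the wizard, the health-polling step can be approximated as follows. This is a sketch: the 90-second budget mirrors the wizard's description, while the 2-second interval and the function name are assumptions.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 90.0,   # polling budget mentioned for the wizard
    interval_s: float = 2.0,   # assumed interval, not taken from the project
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # service not accepting connections yet; keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A real probe could be `lambda: urllib.request.urlopen("http://localhost:8080/health", timeout=2).status == 200`; connection errors from `urllib` are `OSError` subclasses, so they are treated as "not ready yet".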
Alternative direct compose methods:
# Pipe directly to docker compose
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/docker-compose.quickstart.yml \
| docker compose -f - up -d
# Or download first, then start
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/docker-compose.quickstart.yml \
-o docker-compose.quickstart.yml
docker compose -f docker-compose.quickstart.yml up -d
Access points:
| Service | URL |
|---|---|
| Web UI | http://localhost:3000 |
| API | http://localhost:8080 |
| Swagger | http://localhost:8080/swagger-ui |
| Health | http://localhost:8080/health |
Headless / CI deployment (no interactive terminal):
# OpenAI provider
EDGEQUAKE_LLM_PROVIDER=openai \
OPENAI_API_KEY=sk-... \
docker compose -f docker-compose.quickstart.yml up -d
# Mistral La Plateforme (v0.11.0+)
MISTRAL_API_KEY=... \
docker compose -f docker-compose.quickstart.yml up -d
Management commands:
docker compose -f docker-compose.quickstart.yml logs -f # tail logs
docker compose -f docker-compose.quickstart.yml ps # check status
docker compose -f docker-compose.quickstart.yml down # stop
Pin version for reproducibility:
EDGEQUAKE_VERSION=0.10.8 sh quickstart.sh
🛠️ Option 2: Full Development Setup (5 minutes)
Prerequisites:
- Rust 1.95+ (rustup.rs)
- Node.js 18+ or Bun 1.0+ (nodejs.org)
- Docker (docker.com)
- Ollama (optional, for local LLMs — ollama.ai)
# 1. Clone the repository
git clone https://github.com/raphaelmansuy/edgequake.git
cd edgequake
# 2. Install all dependencies (Rust crates + Node packages)
make install
# 3. Configure frontend environment
cp edgequake_webui/.env.local.example edgequake_webui/.env.local
# 4. Start full stack (no authentication, for local development)
make dev
# Optional: start with authentication enabled
make dev-auth
Services available:
- Backend: http://localhost:8080
- Frontend: http://localhost:3000 (auto-selects next free port if busy)
- Swagger UI: http://localhost:8080/swagger-ui
Environment Configuration for Production
Create edgequake/docker/.env from .env.example:
| Variable | Purpose |
|---|---|
| DATABASE_URL | PostgreSQL connection (required for API-only deploy) |
| EDGEQUAKE_LLM_PROVIDER | openai, anthropic, gemini, mistral, azure, vertexai, ollama |
| OPENAI_API_KEY / ANTHROPIC_API_KEY / MISTRAL_API_KEY / etc. | Provider credentials |
| OLLAMA_HOST | Default: http://host.docker.internal:11434 |
| EDGEQUAKE_CHUNK_TIMEOUT_SECS | Per-chunk LLM timeout (default: 180s) |
| EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS | Parallel extraction limit (default: 16) |
Slow local LLM tuning (Ollama/LM Studio):
export EDGEQUAKE_CHUNK_TIMEOUT_SECS=600 # 10 min per chunk
export EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS=4 # reduce parallelism
export EDGEQUAKE_LLM_TIMEOUT_SECS=3600 # 1 hour HTTP safety timeout
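To see why lowering the concurrency limit and raising the per-chunk timeout help slow local models, consider this sketch of the two knobs. It is a Python asyncio analogue written for illustration; EdgeQuake's actual pipeline runs on Tokio in Rust, and `extract_all` and its parameters are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def extract_all(
    chunks: list,
    extract: Callable[[str], Awaitable[dict]],
    max_concurrent: int = 16,        # mirrors EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS
    chunk_timeout_s: float = 180.0,  # mirrors EDGEQUAKE_CHUNK_TIMEOUT_SECS
) -> list:
    """Run extract() over chunks with bounded parallelism and a per-chunk timeout."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(chunk: str) -> Optional[dict]:
        async with semaphore:  # at most max_concurrent calls in flight
            try:
                return await asyncio.wait_for(extract(chunk), timeout=chunk_timeout_s)
            except asyncio.TimeoutError:
                return None  # timed-out chunk; the caller decides whether to retry

    return await asyncio.gather(*(run_one(chunk) for chunk in chunks))
```

The timeout clock only starts once a chunk holds a semaphore slot, so time spent queuing behind other chunks does not count against it. With a slow model, fewer slots means each in-flight request gets more of the machine and is less likely to hit the limit.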
REAL Code Examples from the Repository
Example 1: First Document Upload via REST API
EdgeQuake's document ingestion pipeline transforms files into knowledge graphs automatically. Here's the exact API call from the README:
# Upload a file (PDF, TXT, MD, etc.)
curl -X POST http://localhost:8080/api/v1/documents/upload \
-F "file=@your-document.pdf"
Expected response:
{
  "id": "doc-123",
  "status": "completed",
  "chunk_count": 15,
  "entity_count": 12,
  "relationship_count": 8,
  "processing_time_ms": 2500
}
What's happening under the hood? The edgequake-pipeline crate splits your document into ~1200-token chunks with 100-token overlap, then calls the configured LLM to extract (entity, type, description) and (source, target, keywords, description) tuples. The gleaning step (an optional second pass) catches an additional 15-25% of entities that single-pass extraction misses. Entities are deduplicated via case normalization and description merging (36-40% duplicate reduction), then stored in PostgreSQL as a property graph using Apache AGE, with pgvector embeddings for the chunks.
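Two of the steps above, overlapping chunking and case-normalized deduplication, are easy to picture in code. The sketch below assumes a pre-tokenized list (any tokenizer will do) and simple uppercase keys; the function names are illustrative, and the real crate's tokenizer and merge logic are not shown in this article.

```python
def chunk_tokens(tokens: list, size: int = 1200, overlap: int = 100) -> list:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


def dedupe_entities(entities: list) -> dict:
    """Merge (name, type, description) tuples whose names match after case normalization."""
    merged = {}
    for name, entity_type, description in entities:
        key = name.strip().upper()
        entry = merged.setdefault(key, {"type": entity_type, "descriptions": []})
        if description not in entry["descriptions"]:
            entry["descriptions"].append(description)
    return merged
```

With the default settings, a 2,500-token document yields three chunks, and the last 100 tokens of each chunk reappear at the start of the next, so an entity mentioned at a boundary is seen in full by at least one extraction call.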
Example 2: First Query with Hybrid Mode
This is where EdgeQuake's intelligence shines — combining vector similarity with graph traversal:
# Query the knowledge graph
curl -X POST http://localhost:8080/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"query": "What are the main concepts?",
"mode": "hybrid"
}'
Response structure:
{
  "answer": "The main concepts are: knowledge graphs, entity extraction, and hybrid retrieval...",
  "sources": [
    { "chunk_id": "chunk-1", "similarity": 0.92 },
    { "chunk_id": "chunk-5", "similarity": 0.87 }
  ],
  "entities": ["KNOWLEDGE_GRAPH", "ENTITY_EXTRACTION"],
  "relationships": [
    {
      "source": "KNOWLEDGE_GRAPH",
      "target": "ENTITY_EXTRACTION",
      "type": "ENABLES"
    }
  ]
}
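For callers who prefer Python over curl, the same request and response can be handled with the standard library alone. This is a sketch based only on the request and response shapes shown above; the lowercase mode strings other than "hybrid" are an assumption derived from the mode table, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed wire values: lowercase versions of the six mode names.
QUERY_MODES = {"naive", "local", "global", "hybrid", "mix", "bypass"}


def build_query_request(base_url: str, query: str, mode: str = "hybrid") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    if mode not in QUERY_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(QUERY_MODES)}")
    body = json.dumps({"query": query, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def relationship_triples(response: dict) -> list:
    """Render the 'relationships' array as readable 'A -[TYPE]-> B' strings."""
    return [
        f"{rel['source']} -[{rel['type']}]-> {rel['target']}"
        for rel in response.get("relationships", [])
    ]
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`. Applied to the sample response, `relationship_triples` yields `KNOWLEDGE_GRAPH -[ENABLES]-> ENTITY_EXTRACTION`.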
The Hybrid mode algorithm: First, vector search finds semantically similar chunks and entities. Then, for the top-k entities, EdgeQuake traverses their local graph neighborhood (1-2 hops) to capture relationships. Simultaneously, Louvain community detection identifies thematic clusters for global context. The LLM receives: (1) relevant chunk texts, (2) local subgraph context, (3) community summaries, and (4) relationship metadata — enabling genuine multi-hop reasoning.
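The first two stages of that description, similarity ranking followed by neighborhood expansion, can be sketched in a few lines. This is a toy in-memory illustration with hypothetical function names; it omits the community detection stage entirely and is not how EdgeQuake implements retrieval in PostgreSQL.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_entities(query_vec: list, entity_vecs: dict, k: int = 2) -> list:
    """Rank entity names by cosine similarity to the query embedding."""
    ranked = sorted(entity_vecs, key=lambda name: cosine(query_vec, entity_vecs[name]), reverse=True)
    return ranked[:k]


def neighborhood(edges: list, seeds: list, hops: int = 2) -> set:
    """Collect every triple reachable within `hops` undirected steps of the seeds."""
    frontier, seen_nodes, triples = set(seeds), set(seeds), set()
    for _ in range(hops):
        next_frontier = set()
        for source, relation, target in edges:
            if source in frontier or target in frontier:
                triples.add((source, relation, target))
                for node in (source, target):
                    if node not in seen_nodes:
                        seen_nodes.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return triples
```

The triples returned by `neighborhood` are what would be serialized into the prompt as local subgraph context, next to the chunk texts found by the vector search.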
Example 3: Production Docker Deployment with Prebuilt Images
For teams needing full-stack deployment without build tools:
cd edgequake/docker
cp .env.example .env # configure EDGEQUAKE_LLM_PROVIDER and API key
# Start API + frontend + PostgreSQL from GHCR images
docker compose -f docker-compose.prebuilt.yml up -d
Services started:
| Service | Port | Image |
|---|---|---|
| edgequake API | 8080 | ghcr.io/raphaelmansuy/edgequake:latest |
| frontend | 3000 | ghcr.io/raphaelmansuy/edgequake-frontend:latest |
| postgres | 5432 | ghcr.io/raphaelmansuy/edgequake-postgres:latest |
Pin to specific version:
EDGEQUAKE_VERSION=0.10.8 docker compose -f docker-compose.prebuilt.yml up -d
Health verification:
curl http://localhost:8080/health
Why this matters: The prebuilt images support linux/amd64 and linux/arm64 natively — no QEMU emulation. This means Apple Silicon Macs, x86 servers, and AWS Graviton instances all run identical containers. The embedded pdfium (via pdfium-auto) eliminates external shared library dependencies, making this truly zero-config for PDF processing.
Example 4: API-Only Deployment (Bring Your Own PostgreSQL)
For teams with existing database infrastructure:
# One-liner deployment
docker run -d \
--name edgequake \
-p 8080:8080 \
-e DATABASE_URL="postgres://user:password@your-db-host:5432/edgequake" \
-e EDGEQUAKE_LLM_PROVIDER=openai \
-e OPENAI_API_KEY="sk-..." \
ghcr.io/raphaelmansuy/edgequake:latest
# Verify deployment
curl http://localhost:8080/health
Requirements for your PostgreSQL: PostgreSQL 15+ with pgvector and apache_age extensions installed. The edgequake-storage crate uses SQLx for type-safe, compile-time checked queries against this backend.
Example 5: Knowledge Injection API (v0.8.0+)
Programmatically enrich the graph with domain expertise:
# Upload a glossary file for workspace expansion
# (multipart/form-data; replace :id with your workspace ID)
curl -X POST "http://localhost:8080/api/v1/workspaces/:id/injection/upload" \
-F "file=@manufacturing-glossary.txt"
Glossary file format (plain text or markdown):
OEE: Overall Equipment Effectiveness, calculated as Availability × Performance × Quality
NLP: Natural Language Processing; synonym: computational linguistics, text analytics
ML: Machine Learning; synonym: statistical learning, predictive modeling
Processing flow: The injection system parses definitions, creates invisible knowledge graph nodes (never shown as citations), and automatically expands future queries using injected synonyms. Status polling tracks processing → completed or failed with entity count metrics.
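The parse-then-expand flow can be illustrated against the glossary format shown above. The sketch assumes the "TERM: definition; synonym: a, b" layout of the sample lines and uses hypothetical function names; EdgeQuake's actual parser and expansion strategy may differ.

```python
def parse_glossary_line(line: str) -> tuple:
    """Parse 'TERM: definition; synonym: a, b' into (term, definition, synonyms)."""
    term, _, rest = line.partition(":")
    definition, _, synonym_part = rest.partition("; synonym:")
    synonyms = [s.strip() for s in synonym_part.split(",") if s.strip()]
    return term.strip(), definition.strip(), synonyms


def expand_query(query: str, glossary: dict) -> str:
    """Append the definition and synonyms of each glossary term found in the query."""
    extras = []
    for word in query.replace("?", " ").split():
        if word in glossary:
            definition, synonyms = glossary[word]
            extras.extend([definition, *synonyms])
    return query if not extras else f"{query} ({'; '.join(extras)})"
```

With the ML entry loaded, the query "How is ML used?" becomes "How is ML used? (Machine Learning; statistical learning; predictive modeling)", giving the retriever more surface area to match against.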
Advanced Usage & Best Practices
Query Mode Selection Strategy
Don't default to Hybrid for everything. Match mode to question type:
- Naive: User knows exact terminology, needs fast lookup
- Local: "Who reported to whom in Q3?" — specific entity relationships
- Global: "What are our main strategic risks?" — thematic synthesis
- Hybrid: Unsure of scope, need comprehensive coverage
- Mix: Tuning trade-off between speed and completeness with naive_weight parameter
- Bypass: General knowledge questions, no document grounding needed
Performance Tuning at Scale
The SQL pre-filtering feature is your secret weapon. Always include metadata filters in queries:
{
  "query": "...",
  "mode": "hybrid",
  "filters": {
    "workspace_id": "ws-abc123",
    "document_type": "contract",
    "date_range": {"from": "2024-01-01", "to": "2024-12-31"}
  }
}
This pushes WHERE workspace_id = 'ws-abc123' AND document_type = 'contract' to PostgreSQL before vector search, leveraging GIN + B-tree indexes for up to 90% fewer wasted vector scans.
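To show what "pushing filters into WHERE" looks like, here is a sketch that turns the filters object into a parameterized query. The table name `chunks` and the columns `chunk_id`, `embedding` and `created_at` are hypothetical; `<=>` is pgvector's cosine-distance operator, and the `$n` placeholders follow PostgreSQL's positional parameter style.

```python
def build_filtered_vector_query(query_embedding: list, filters: dict, top_k: int = 10) -> tuple:
    """Return (sql, params) with metadata predicates placed ahead of vector ranking."""
    params = [query_embedding]  # $1 is always the query vector
    clauses = []
    for column in ("workspace_id", "document_type"):  # fixed allow-list of equality filters
        if column in filters:
            params.append(filters[column])
            clauses.append(f"{column} = ${len(params)}")
    date_range = filters.get("date_range")
    if date_range:
        params.extend([date_range["from"], date_range["to"]])
        clauses.append(f"created_at BETWEEN ${len(params) - 1} AND ${len(params)}")
    where = f"WHERE {' AND '.join(clauses)} " if clauses else ""
    params.append(top_k)
    sql = (
        "SELECT chunk_id, embedding <=> $1 AS distance FROM chunks "
        f"{where}"
        f"ORDER BY distance LIMIT ${len(params)}"
    )
    return sql, params
```

Column names come from a fixed allow-list and all values travel as bound parameters, so user-supplied filter values never reach the SQL text. Whether the planner actually applies the predicates before the vector scan depends on the indexes available, which is where the GIN and B-tree indexes come in.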
PDF Processing Strategy Matrix
| Document Type | Recommended Mode | Rationale |
|---|---|---|
| Standard text PDFs | Text mode (default) | Fastest, zero-config with embedded pdfium |
| Scanned documents | Vision mode | OCR-free, LLM reads page images directly |
| Complex tables | Vision mode | Table reconstruction beats text parser mangling |
| Multi-column layouts | Vision mode | LLM understands reading order |
| Mixed content | Auto-fallback | Vision failure → text extraction (BR1010) |
Enable vision mode per-request: X-Use-Vision: true header, or set use_vision_llm = true in config.
Security Hardening
- Fail-closed workspace isolation: Invalid workspace selectors are rejected, not silently remapped to defaults
- Runtime auth hardening: Prebuilt WebUI images consume runtime API/auth config; protected routes fail closed when auth enabled
- Multi-tenant query/delete flows: Workspace-scoped operations prevent cross-tenant data leakage
- Audit logging: edgequake-audit crate tracks all destructive operations
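The difference between fail-closed and fail-open workspace handling fits in a few lines. This is a sketch of the principle with hypothetical names, not EdgeQuake's implementation.

```python
from typing import Optional


class UnknownWorkspaceError(LookupError):
    """Raised when a workspace selector does not match any known workspace."""


def resolve_workspace(selector: Optional[str], known_workspaces: set) -> str:
    """Fail closed: return the selector only if it names a known workspace."""
    if not selector or selector not in known_workspaces:
        # A fail-open design would fall back to a default workspace here,
        # which is exactly how cross-tenant reads and deletes slip through.
        raise UnknownWorkspaceError(f"workspace {selector!r} rejected")
    return selector
```

Rejecting a missing or mistyped selector turns a potential data leak into a visible client error, which is the safer failure mode for multi-tenant query and delete flows.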
Comparison with Alternatives
| Feature | EdgeQuake | LightRAG (Python) | Microsoft GraphRAG | Traditional RAG |
|---|---|---|---|---|
| Language | Rust | Python | Python | Any |
| Performance | 10x concurrent users | Baseline | Slower indexing | Varies |
| Query Latency (hybrid) | < 200ms | ~1000ms | ~2000ms | ~1000ms |
| Memory per Document | 2MB | ~8MB | ~10MB | ~8MB |
| PDF Vision Processing | ✅ Native (GPT-4o, Claude, Gemini) | ❌ | ❌ | ❌ |
| Production API | ✅ OpenAPI 3.0 + SSE | ❌ | ❌ | Varies |
| React Frontend | ✅ React 19 + Sigma.js | ❌ | ❌ | Varies |
| Knowledge Injection | ✅ Domain glossaries | ❌ | ❌ | ❌ |
| Custom Entity Types | ✅ 50 per workspace | ❌ Limited | ❌ | ❌ |
| MCP Agent Integration | ✅ | ❌ | ❌ | ❌ |
| Multi-tenant Isolation | ✅ Fail-closed | ❌ | ❌ | Varies |
| SQL Pre-filtering | ✅ GIN + B-tree | ❌ | ❌ | ❌ |
| Embedded pdfium | ✅ Zero-config | ❌ | ❌ | ❌ |
| Docker Multi-arch | ✅ amd64 + arm64 native | ❌ | ❌ | Varies |
Bottom line: LightRAG proved the algorithm. EdgeQuake productionizes it — with Rust performance, native multimodal PDF processing, enterprise security, and modern developer experience.
FAQ
What LLM providers does EdgeQuake support?
OpenAI, Anthropic, Mistral, MiniMax, Google Gemini, Azure OpenAI, Vertex AI, xAI, Ollama, and LM Studio. Auto-detection via environment variables. Mistral La Plateforme is a first-class citizen as of v0.11.0 with chat (mistral-small-latest), vision PDF ingestion (pixtral-large-latest), and embeddings (mistral-embed, 1024 dimensions).
Can I use EdgeQuake without Docker?
Yes — make dev starts PostgreSQL via Docker but runs the Rust backend and Node frontend natively. For fully container-free operation, install PostgreSQL 15+ with pgvector and apache_age locally, then cargo run the backend.
How does EdgeQuake handle large PDFs?
Adaptive DPI scaling, concurrency limits, and early byte release prevent memory spikes. For very large files on slow local LLMs, increase EDGEQUAKE_CHUNK_TIMEOUT_SECS and reduce EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS.
Is there a hosted/cloud version?
Currently self-hosted only. The Docker deployment options (especially prebuilt images) minimize operational overhead. Cloud offerings may follow based on community demand.
How do I migrate from LightRAG Python?
See the dedicated migration guide. Core concepts align directly; EdgeQuake extends the algorithm with production features. Re-indexing documents is required due to different storage schemas.
What databases are required?
PostgreSQL 15+ with pgvector (vector storage) and apache_age (property graph) extensions. The full-stack Docker compose includes a pre-configured PostgreSQL image. In-memory storage is available for testing only (make dev-memory).
Can I contribute to EdgeQuake?
The project uses Specification-Driven Development with the edgecode coding agent. For now, contributions flow through Raphaël MANSUY directly via GitHub Issues and Discussions. See CONTRIBUTING.md for guidelines.
Conclusion
Traditional RAG is intentionally simple — and that's exactly why it fails on the questions that matter. EdgeQuake doesn't just incrementally improve retrieval; it fundamentally restructures how documents become knowledge. By implementing the LightRAG algorithm in Rust's zero-cost abstraction layer, adding production-hardened PDF vision processing, and wrapping everything in an enterprise API with React frontend, Raphaël MANSUY has created something rare: an academic insight engineered into genuine production utility.
The performance numbers aren't marginal gains — they're order-of-magnitude transformations. 5x faster hybrid queries. 10x concurrent users. 4x memory efficiency. And with features like Knowledge Injection, Custom Entity Configuration, and MCP agent integration, EdgeQuake keeps getting smarter about your specific domain.
But here's what haunts me: every day you spend with chunk-and-pray RAG is a day your system confidently returns semantically similar but structurally wrong answers. The knowledge graph isn't a nice-to-have optimization. It's the minimum viable intelligence for multi-hop reasoning.
Stop wasting tokens on dumb retrieval. Star EdgeQuake on GitHub, run curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/quickstart.sh | sh, and watch your documents become a reasoning engine. The graph is waiting. Your questions are getting harder. Match them with intelligence that scales.