VoidLLM: The Privacy-First LLM Proxy Teams Are Secretly Switching To
What if every prompt your team sent to ChatGPT was being logged, analyzed, and potentially leaked by the proxy sitting in between? Here's the uncomfortable truth: most LLM proxies and AI gateways on the market today store your prompts by default. They call it "debugging." They call it "analytics." What they don't call it is what it actually is—a massive privacy liability waiting to explode.
If you're a CTO, platform engineer, or security-conscious developer, you've felt this tension. Your team needs OpenAI's smarts, Anthropic's safety, Azure's enterprise guarantees, and Ollama's cost efficiency. But stitching them together means either sharing raw API keys in Slack (disaster) or trusting some SaaS middleman with your most sensitive data (bigger disaster). Every healthcare startup, every legal tech platform, every fintech handling customer data: they're all one proxy log away from a headline they can't recover from.
Enter VoidLLM.
This isn't another "we take privacy seriously" footnote in a Terms of Service. VoidLLM is architected as a zero-knowledge LLM proxy from the ground up. No prompt storage. No response logging. No "enable_content_logging" toggle that shouldn't exist in the first place. Just pure, self-hosted infrastructure that routes, balances, governs, and protects—while remaining utterly blind to what your users are actually asking. One Go binary. Sub-2ms overhead. Total control. And it's why engineering teams are quietly abandoning centralized AI gateways for this open-source alternative.
What Is VoidLLM?
VoidLLM is a self-hosted LLM proxy and AI gateway built by VoidMind that sits between your applications and any LLM provider—OpenAI, Anthropic, Azure, Ollama, vLLM, or custom endpoints. Think of it as the intelligent traffic controller for your organization's AI consumption, but one that fundamentally cannot read your mail.
Born from the frustration of teams sharing API keys like office WiFi passwords, VoidLLM delivers organization-wide access control, virtual API key management, granular usage tracking, rate limiting, and multi-deployment load balancing—all without ever persisting prompt or response content. The architecture is deliberate: content passes through memory only, while metadata (who, what model, how many tokens, how long) gets tracked for governance.
The project is written in Go (1.23+), ships as a single static binary, and offers a modern React-based web UI for administration. It's gaining serious traction because it addresses a tension that has plagued AI infrastructure: how do you get enterprise governance without enterprise bloat, and privacy without sacrificing observability?
VoidLLM answers with a clean separation: governance through metadata, privacy through architecture. Not through promises. Through code that makes content logging physically impossible.
Key Features That Separate VoidLLM from the Herd
Zero-Knowledge Proxy Architecture
This isn't marketing fluff. VoidLLM's core design eliminates the very possibility of prompt logging:
- No request body in logs, database, or persistent storage
- No response body anywhere durable
- No prompt caching to disk—content lives in memory only during transit
- No enable_content_logging option exists in the codebase
The only data that persists: who made the request (key/team/org), which model, token counts, cost estimates, and duration. This makes GDPR compliance dramatically simpler—no personal data from prompts enters your database.
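To make the metadata-only idea concrete, here is a minimal Python sketch of what such a usage record could look like. The `UsageRecord` class, its field names, and `record_from_exchange` are hypothetical illustrations, not VoidLLM's actual schema; the only thing taken from a real API is the `usage` object with `prompt_tokens` and `completion_tokens` that OpenAI-style responses carry.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical metadata-only record: there is no prompt or response field."""
    key_id: str
    team: str
    org: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    duration_ms: int


def record_from_exchange(identity: dict, model: str, response: dict,
                         cost_usd: float, duration_ms: int) -> UsageRecord:
    # Read only the token counters from the provider response. The message
    # content under response["choices"] is never copied into the record.
    usage = response.get("usage", {})
    return UsageRecord(
        key_id=identity["key_id"],
        team=identity["team"],
        org=identity["org"],
        model=model,
        prompt_tokens=int(usage.get("prompt_tokens", 0)),
        completion_tokens=int(usage.get("completion_tokens", 0)),
        cost_usd=cost_usd,
        duration_ms=duration_ms,
    )


# Example response shaped like an OpenAI-style chat completion (values invented).
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "secret answer"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 34},
}
record = record_from_exchange(
    {"key_id": "vl_uk_example", "team": "platform", "org": "acme"},
    "gpt-4o", sample_response, cost_usd=0.0, duration_ms=840,
)
print(asdict(record))
```

Because the record type has nowhere to put content, anything written to the database from it is governance metadata by construction.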
OpenAI-Compatible Universal Routing
Drop-in replacement for OpenAI's API. Change your base_url, keep your SDK. Supports /v1/chat/completions, embeddings, images, audio, and streaming. Your existing Python, JavaScript, or Go code works unchanged.
Multi-Provider Load Balancing with Failover
Configure multiple deployments per model with round-robin, least-latency, weighted, or priority routing. Automatic retry on 5xx/timeout, circuit breakers, and health-aware routing mean provider outages become invisible to your applications.
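For intuition, two of these strategies can be sketched in a few lines of Python. This is an illustrative model only, not VoidLLM's Go implementation: the `Deployment` class, `pick_priority`, and `RoundRobin` are hypothetical names, and the deployment names are borrowed from the configuration example later in this article.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Deployment:
    name: str
    priority: int = 1
    healthy: bool = True


def pick_priority(deployments):
    """Return the healthy deployment with the lowest priority number."""
    healthy = [d for d in deployments if d.healthy]
    if not healthy:
        raise RuntimeError("no healthy deployment available")
    return min(healthy, key=lambda d: d.priority)


class RoundRobin:
    """Cycle through whichever deployments are currently healthy."""

    def __init__(self, deployments):
        self.deployments = deployments
        self._counter = count()

    def pick(self):
        healthy = [d for d in self.deployments if d.healthy]
        if not healthy:
            raise RuntimeError("no healthy deployment available")
        return healthy[next(self._counter) % len(healthy)]


pool = [Deployment("azure-east", priority=1), Deployment("openai-fallback", priority=2)]
rr = RoundRobin(pool)
print([rr.pick().name for _ in range(3)])

# Simulate a failed health check on the primary: priority routing falls back.
pool[0].healthy = False
print(pick_priority(pool).name)
```

Filtering on health before choosing is what makes an outage invisible to callers: an unhealthy deployment simply stops being a candidate.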
Granular RBAC and Virtual Keys
Org > Team > User > Key hierarchy with four distinct roles. Instead of sharing raw OpenAI keys, create scoped virtual keys (vl_uk_...) with specific permissions, rate limits, and budgets. Revoke instantly without rotating provider credentials.
MCP Gateway and Code Mode
VoidLLM functions as a Model Context Protocol (MCP) gateway, proxying external MCP servers with access control and automatic session management. The standout Code Mode lets LLMs write JavaScript that orchestrates multiple tool calls in a single execution, running in a WASM-sandboxed QuickJS runtime with no filesystem, network, or host access. The project cites a 30-80% token reduction versus one-tool-per-turn patterns.
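Where does a saving of that size come from? A back-of-the-envelope model in Python helps. Every number here is an assumption chosen for illustration (a 2,000-token base context, 500 tokens per tool result, 5 tool calls), and the model counts input tokens only; real savings depend on your workload.

```python
def per_turn_input_tokens(context: int, result: int, tool_calls: int) -> int:
    # One-tool-per-turn: n tool calls need n + 1 model turns, and turn i
    # resends the base context plus every tool result gathered so far.
    return sum(context + i * result for i in range(tool_calls + 1))


def code_mode_input_tokens(context: int, final_result: int) -> int:
    # Turn 1: the model reads the context and writes one script.
    # Turn 2: the model reads the context plus the script's condensed result.
    return context + (context + final_result)


baseline = per_turn_input_tokens(context=2000, result=500, tool_calls=5)
code_mode = code_mode_input_tokens(context=2000, final_result=500)
saving = 1 - code_mode / baseline

print(baseline, code_mode, f"{saving:.0%}")  # prints: 19500 4500 77%
```

Under these assumptions the per-turn pattern grows quadratically with the number of tool calls, while the script-based pattern stays roughly constant, which is why longer tool chains benefit the most.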
Enterprise-Grade Observability
Prometheus metrics for latency, tokens, active streams, routing decisions, and health. Optional OpenTelemetry OTLP/gRPC export with request ID correlation. SQLite default, PostgreSQL for production. Helm charts for Kubernetes. Graceful shutdown handling.
Real-World Use Cases Where VoidLLM Dominates
1. Healthcare AI: HIPAA-Compliant LLM Access
Medical startups using LLMs for clinical documentation face a nightmare: patient data in prompts can't touch third-party logs. VoidLLM's zero-knowledge architecture means PHI never enters persistent storage. Self-host on your VPC, route to Azure OpenAI with private endpoints, and maintain full audit trails of who accessed which model when—without ever recording what was asked.
2. Multi-Team SaaS Platform: Cost Control Chaos
Your engineering team burns through GPT-4 tokens for code review. Marketing experiments with Claude for copy. Support tests fine-tuned models. Without governance, budgets collide. VoidLLM assigns per-team token budgets and rate limits with automatic enforcement. Marketing hits their daily cap? They get a graceful error, not a $50K surprise invoice.
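The enforcement logic behind a daily cap is simple enough to sketch. The Python below is a hypothetical illustration: `TeamBudget`, `handle_request`, the error text, and the choice of HTTP 429 are assumptions, not VoidLLM's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class TeamBudget:
    daily_token_cap: int
    used_today: int = 0

    def try_consume(self, tokens: int) -> bool:
        """Record usage and return True, or return False if the cap would be exceeded."""
        if self.used_today + tokens > self.daily_token_cap:
            return False
        self.used_today += tokens
        return True


def handle_request(budget: TeamBudget, estimated_tokens: int) -> dict:
    if not budget.try_consume(estimated_tokens):
        # 429 is the conventional "Too Many Requests" status; the status and
        # message a real gateway returns may differ (assumption).
        return {"status": 429, "error": "team daily token budget exceeded"}
    return {"status": 200}


marketing = TeamBudget(daily_token_cap=1000)
print(handle_request(marketing, 600))
print(handle_request(marketing, 600))
```

The second call is rejected before any provider is contacted, which is the difference between a graceful error and a surprise invoice.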
3. Financial Services: Zero-Trust AI Infrastructure
Banks and fintechs can't risk prompt content leaking through a managed proxy's logging infrastructure. VoidLLM deploys entirely on-premise or in your cloud account, with SSO/OIDC integration (Enterprise tier). Every request authenticated, every action audited, no content ever retained. The proxy becomes a compliance asset, not a liability.
4. AI-Native Development: Seamless Provider Migration
You built on OpenAI. GPT-5 gets delayed, prices spike, or you need Claude's longer context. Normally: refactor every openai.ChatCompletion call. With VoidLLM: update your model alias configuration. Clients still call model: "default" or model: "smart"—you control where it routes. A/B test providers, implement fallbacks, optimize costs without touching application code.
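Conceptually, an alias is one level of indirection between the name clients send and the model that serves it. A minimal Python sketch, assuming a plain dictionary lookup (the `resolve_model` helper and the `claude-example` target are hypothetical; `dolphin-mistral` and `gpt-4o` come from the configuration example in this article):

```python
def resolve_model(requested: str, aliases: dict[str, str]) -> str:
    """Map an alias to its configured model; pass unknown names through unchanged."""
    return aliases.get(requested, requested)


aliases = {"default": "dolphin-mistral", "smart": "gpt-4o"}
print(resolve_model("smart", aliases))

# Migrating providers is a configuration change: repoint the alias.
# "claude-example" is a placeholder name, not a real model identifier.
aliases["smart"] = "claude-example"
print(resolve_model("smart", aliases))
```

Clients keep sending `model: "smart"` throughout; only the mapping changes.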
5. MCP-Powered IDE Workflows
Connect Claude Code, Cursor, or Windsurf to VoidLLM's MCP endpoint. Your IDE gains access to managed tool orchestration with usage tracking and governance. Code Mode's multi-tool execution slashes token costs while the WASM sandbox prevents rogue tool calls from accessing your system.
Step-by-Step Installation & Setup Guide
Prerequisites
- Docker (recommended) or Go 1.23+ with Node 20+ for source builds
- OpenSSL for key generation
- 512MB RAM minimum, 2GB recommended for production
Docker Deployment (Fastest Path)
```bash
# Generate cryptographically secure keys
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)

# Create configuration from example
cp voidllm.yaml.example voidllm.yaml

# Start the LLM proxy
docker run -p 8080:8080 \
  -e VOIDLLM_ADMIN_KEY -e VOIDLLM_ENCRYPTION_KEY \
  -v $(pwd)/voidllm.yaml:/etc/voidllm/voidllm.yaml:ro \
  -v voidllm_data:/data \
  ghcr.io/voidmind-io/voidllm:latest
```
Critical: On first boot, VoidLLM prints bootstrap credentials to stdout exactly once:
```
========================================
BOOTSTRAP COMPLETE - COPY THESE NOW
========================================
API Key: vl_uk_a3f2...
Email: admin@voidllm.local
Password: <random>
========================================
```
Save these immediately. Open http://localhost:8080, authenticate, and begin configuration.
Binary Installation (No Docker)
```bash
# Linux amd64: adjust for your platform
curl -sL https://github.com/voidmind-io/voidllm/releases/latest/download/voidllm-linux-amd64.tar.gz | tar xz
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
./voidllm
```
Platforms available: Linux (amd64, arm64), Windows (amd64, arm64), macOS (amd64, arm64).
Docker Compose (Development)
```bash
cp voidllm.yaml.example voidllm.yaml
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker-compose up
```
Kubernetes (Production)
```bash
# Values are quoted so shells such as zsh do not treat [0] as a glob pattern
helm install voidllm chart/voidllm/ \
  --set "secrets.adminKey=$(openssl rand -base64 32)" \
  --set "secrets.encryptionKey=$(openssl rand -base64 32)" \
  --set "config.models[0].name=my-model" \
  --set "config.models[0].provider=ollama" \
  --set "config.models[0].base_url=http://ollama:11434/v1"
```
Optional subcharts: PostgreSQL for production database, Redis for distributed caching (multi-pod Code Mode support coming).
From Source (Contributors/Custom Builds)
```bash
# Prerequisites verified: Go 1.23+, Node 20+
cd ui && npm ci && npm run build && cd ..
go run ./cmd/voidllm --config voidllm.yaml
```
First API Call Test
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer vl_uk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Verify VoidLLM is working"}]}'
```
Any OpenAI-compatible SDK works—just change base_url to your VoidLLM instance.
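If you'd rather see the same call without an SDK, here is a standard-library Python sketch that builds the identical request. The `build_chat_request` helper is a hypothetical name; the URL, key placeholder, and payload mirror the curl command above, and the send step is left commented out because it needs a running instance.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against any base URL."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8080/v1", "vl_uk_your_key_here",
    "default", "Verify VoidLLM is working",
)
print(req.get_method(), req.full_url)

# To send it against a running instance, uncomment:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The only VoidLLM-specific parts are the base URL and the virtual key; everything else is the standard OpenAI request shape.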
REAL Code Examples from the Repository
Example 1: Production Configuration with Multi-Provider Failover
This YAML configuration from VoidLLM's documentation demonstrates enterprise-grade routing with environment variable interpolation for secrets:
```yaml
server:
  proxy:
    port: 8080

models:
  # Single endpoint for local development
  - name: dolphin-mistral
    provider: ollama
    base_url: http://localhost:11434/v1
    timeout: 30s
    aliases: [default]          # Clients call "default", routes here
    pricing:
      input_per_1m: 0.15        # $0.15 per million input tokens
      output_per_1m: 0.60       # Enables cost tracking and budgets

  # Load-balanced with automatic failover across cloud providers
  - name: gpt-4o
    strategy: round-robin       # Also supports: least-latency, weighted, priority
    aliases: [smart]            # Application code uses "smart", not provider names
    deployments:
      - name: azure-east
        provider: azure
        base_url: https://eastus.openai.azure.com
        api_key: ${AZURE_EAST_KEY}    # Never hardcode secrets
        azure_deployment: gpt-4o
        priority: 1                   # Primary: tried first
      - name: openai-fallback
        provider: openai
        base_url: https://api.openai.com/v1
        api_key: ${OPENAI_KEY}
        priority: 2                   # Fallback: used if priority 1 fails health checks

# MCP server integration for tool-augmented workflows
mcp_servers:
  - name: AWS Knowledge
    alias: aws
    url: https://knowledge-mcp.global.api.aws
    auth_type: none

settings:
  admin_key: ${VOIDLLM_ADMIN_KEY}
  encryption_key: ${VOIDLLM_ENCRYPTION_KEY}
  mcp:
    code_mode:
      enabled: true             # Enable WASM-sandboxed multi-tool execution
```
Key insight: The aliases abstraction is VoidLLM's secret weapon for operational agility. Your application code references model: "default" or model: "smart". Behind the scenes, you can migrate from Ollama to Azure, add failover deployments, or A/B test providers—zero application changes required.
The ${VAR} syntax for environment variable interpolation ensures secrets never enter version control. The pricing block enables real-time cost estimation and budget enforcement at the proxy level, before requests reach expensive endpoints.
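The arithmetic behind a per-million-token pricing block is worth spelling out. This Python sketch uses the `input_per_1m` and `output_per_1m` rates from the configuration above; the `estimate_cost_usd` function and the token counts are hypothetical, and VoidLLM's own estimator may round or bill differently.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_1m: float, output_per_1m: float) -> float:
    """Cost = tokens / 1,000,000 * price per million, summed over both directions."""
    return (input_tokens / 1_000_000) * input_per_1m \
        + (output_tokens / 1_000_000) * output_per_1m


# 200,000 input tokens at $0.15/M plus 50,000 output tokens at $0.60/M
cost = estimate_cost_usd(200_000, 50_000, input_per_1m=0.15, output_per_1m=0.60)
print(f"${cost:.4f}")  # prints: $0.0600
```

Both halves of this example come to three cents, which is a useful reminder that output tokens, priced four times higher here, can dominate the bill even when they are the smaller share of traffic.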
Example 2: Code Mode WASM Sandbox Configuration
VoidLLM's Code Mode configuration controls the JavaScript execution environment for multi-tool MCP orchestration:
```yaml
mcp:
  code_mode:
    enabled: true
    pool_size: 8            # Concurrent WASM QuickJS runtimes for parallel execution
    memory_limit_mb: 16     # Per-execution memory cap prevents resource exhaustion
    timeout: 30s            # Hard timeout kills runaway LLM-generated code
    max_tool_calls: 50      # Upper bound on tool invocations per execution
```
Why this matters: Traditional MCP workflows require one LLM turn per tool call—massively inefficient for complex research or data analysis tasks. Code Mode lets the LLM generate a single JavaScript program that sequences multiple tool calls, processes intermediate results, and returns a synthesized answer.
The security model is critical: QuickJS in WASM provides a deterministic, sandboxed environment with no filesystem access, no network access, and no host system access. Even if an LLM generates malicious code, the blast radius is contained. The pool_size controls concurrency for throughput, while memory_limit_mb and timeout prevent denial-of-service from pathological generated code.
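To illustrate what limits like `max_tool_calls` and `timeout` guard against, here is a small Python model of the bookkeeping. It is only a sketch: `GuardedToolRunner` and `ToolBudgetExceeded` are hypothetical names, the checks are cooperative (they run between tool calls), and a real sandbox such as the WASM runtime described above has to be able to stop code that never yields.

```python
import time


class ToolBudgetExceeded(RuntimeError):
    """Raised when generated code asks for more tool calls than allowed."""


class GuardedToolRunner:
    def __init__(self, tools, max_tool_calls=50, timeout_s=30.0):
        self.tools = tools
        self.max_tool_calls = max_tool_calls
        self.deadline = time.monotonic() + timeout_s
        self.calls = 0

    def call(self, name, **kwargs):
        # Enforce the call budget and the deadline before dispatching.
        if self.calls >= self.max_tool_calls:
            raise ToolBudgetExceeded(f"more than {self.max_tool_calls} tool calls")
        if time.monotonic() > self.deadline:
            raise TimeoutError("execution deadline passed")
        self.calls += 1
        return self.tools[name](**kwargs)


runner = GuardedToolRunner({"add": lambda a, b: a + b}, max_tool_calls=2)
print(runner.call("add", a=1, b=2))
print(runner.calls)
```

A third call on this runner would raise `ToolBudgetExceeded`, so a generated loop that spins on a tool is cut off at a known bound.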
Example 3: MCP IDE Integration
Connect your development environment to VoidLLM's managed MCP infrastructure:
```json
{
  "mcpServers": {
    "voidllm": {
      "type": "http",
      "url": "http://your-voidllm-instance:8080/api/v1/mcp",
      "headers": {
        "Authorization": "Bearer vl_uk_your_key"
      }
    }
  }
}
```
Implementation pattern: This JSON configuration plugs into Claude Code, Cursor, Windsurf, or any MCP-compatible IDE. The vl_uk_ key carries RBAC permissions—so your IDE access is governed by the same team/org hierarchy as your API consumers.
Management tools (list_models, get_usage, create_key) auto-register at /api/v1/mcp/voidllm. External MCP servers you configure appear at /api/v1/mcp/:alias. This unified namespace means your IDE discovers and uses tools through governed, metered, access-controlled channels—not direct connections that bypass your infrastructure policies.
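If you manage several IDE configurations, generating these entries is easy to script. The Python helpers below are hypothetical (`mcp_endpoint` and `mcp_client_entry` are not part of VoidLLM); they only assemble the `/api/v1/mcp` and `/api/v1/mcp/:alias` paths and the JSON shape shown above.

```python
import json
from urllib.parse import quote


def mcp_endpoint(base_url: str, alias: str = "") -> str:
    """Return the gateway root, or the per-server path when an alias is given."""
    root = base_url.rstrip("/") + "/api/v1/mcp"
    return f"{root}/{quote(alias, safe='')}" if alias else root


def mcp_client_entry(base_url: str, api_key: str, alias: str = "") -> dict:
    """Build one server entry in the shape used by the IDE config above."""
    return {
        "type": "http",
        "url": mcp_endpoint(base_url, alias),
        "headers": {"Authorization": f"Bearer {api_key}"},
    }


config = {"mcpServers": {
    "voidllm": mcp_client_entry("http://your-voidllm-instance:8080", "vl_uk_your_key"),
    "aws": mcp_client_entry("http://your-voidllm-instance:8080", "vl_uk_your_key", "aws"),
}}
print(json.dumps(config, indent=2))
```

The `aws` alias here is the one defined under `mcp_servers` in the earlier configuration example.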
Example 4: Database Migration and License Management
VoidLLM includes CLI utilities for operational tasks:
```bash
# Bidirectional database migration: SQLite to PostgreSQL for production scaling
voidllm migrate --from sqlite:///data/voidllm.db --to postgres://user:pass@host/db

# Enterprise license verification (offline-capable JWT validation)
voidllm license verify < license.jwt
```
Operational note: The migration tool supports bidirectional transfers, enabling rollback strategies and environment synchronization. For Enterprise deployments, license verification runs entirely offline—no phoning home, consistent with VoidLLM's privacy-first philosophy.
Advanced Usage & Best Practices
Rate Limit Strategy: Most-Restrictive-Wins
VoidLLM enforces rate limits at org, team, user, and key levels simultaneously. When conflicts arise, the most restrictive limit applies. Configure generous org defaults, then tighten at team or key granularity for specific use cases.
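The rule reduces to taking the minimum over whichever levels have a limit configured. A minimal Python sketch, assuming `None` means "no limit set at this level" (the `effective_limit` function and the numbers are hypothetical):

```python
from typing import Optional


def effective_limit(*limits: Optional[int]) -> Optional[int]:
    """Most-restrictive-wins: the smallest configured limit applies."""
    configured = [limit for limit in limits if limit is not None]
    return min(configured) if configured else None


# Requests per minute at org, team, user, and key level (illustrative values).
print(effective_limit(1000, 300, None, 600))  # prints: 300
```

This is why generous org defaults are safe: a tighter team or key limit always takes precedence, and a looser one never widens the org ceiling.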
Health-Aware Routing Optimization
Enable the least-latency strategy for user-facing applications where TTFT (Time To First Token) matters. Use priority routing with cost-optimized secondary deployments for batch processing. Monitor the Prometheus metric voidllm_routing_decision_total to validate routing behavior.
Secret Rotation Without Downtime
Because applications use virtual keys (vl_uk_...), rotate underlying provider API keys by updating environment variables and restarting VoidLLM. Application credentials remain valid—no coordinated deployment required across services.
MCP Server Hardening
For external MCP servers, enforce team-scoped access rather than global availability. Block specific tools from Code Mode via the per-tool blocklist if they perform destructive operations. Monitor voidllm_mcp_tool_call_duration_seconds for anomalous execution patterns.
Backup Your SQLite Database
For single-node deployments, the SQLite database at /data/voidllm.db contains all governance metadata. Back up this file alongside your voidllm.yaml configuration. For zero-downtime upgrades, use PostgreSQL with managed backups.
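Copying a live SQLite file with `cp` can capture a half-written state, so a safer habit is SQLite's online backup API. Here is a short Python sketch using the standard library's `sqlite3.Connection.backup`; the `backup_sqlite` helper and the destination path are hypothetical, and whether you run this on the host or inside the container depends on your deployment.

```python
import sqlite3


def backup_sqlite(src_path: str, dst_path: str) -> None:
    """Copy a SQLite database with the online backup API, safe while it is in use."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        with dst:
            src.backup(dst)
    finally:
        dst.close()
        src.close()


# Example (paths are illustrative; /data/voidllm.db is the default location above):
# backup_sqlite("/data/voidllm.db", "/backups/voidllm-backup.db")
```

Pair the resulting file with a copy of voidllm.yaml so a restore brings back both the metadata and the routing configuration.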
Comparison with Alternatives
| Capability | VoidLLM | LiteLLM Proxy | Kong AI Gateway | Cloudflare AI Gateway |
|---|---|---|---|---|
| Self-hosted | ✅ Full control | ✅ Yes | ✅ Yes | ❌ SaaS only |
| Zero-knowledge prompts | ✅ By architecture | ❌ Logs configurable | ❌ Logs configurable | ❌ Cloudflare sees all |
| Prompt content logging toggle | ❌ Does not exist | ✅ Present (risk) | ✅ Present (risk) | N/A |
| Multi-provider routing | ✅ OpenAI, Anthropic, Azure, Ollama, vLLM, custom | ✅ 100+ providers | ✅ Via plugins | ✅ Limited set |
| Load balancing + failover | ✅ Built-in, multiple strategies | ✅ Basic | ✅ Via upstream config | ❌ Single endpoint |
| RBAC hierarchy | ✅ Org > Team > User > Key | ✅ Team/key level | ❌ Enterprise plugin | ❌ Per-account only |
| MCP Gateway | ✅ Built-in with Code Mode | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| WASM-sandboxed tool execution | ✅ QuickJS isolation | ❌ N/A | ❌ N/A | ❌ N/A |
| Pricing model | Flat ($49-149/mo) or free | Usage-based + seat fees | Enterprise contract | Per-request charges |
| OpenTelemetry export | ✅ Enterprise tier | ❌ Limited | ✅ Via plugins | ❌ Proprietary |
| Source available | ✅ BSL 1.1 → Apache 2.0 | ✅ MIT | ❌ Proprietary | ❌ Proprietary |
The verdict: If privacy is non-negotiable and you need MCP gateway capabilities, VoidLLM stands alone. LiteLLM offers broader provider coverage but lacks architectural privacy guarantees and MCP support. Enterprise gateways like Kong require heavy customization for AI-specific workflows. Cloudflare's offering forces trust in their infrastructure—precisely what VoidLLM eliminates.
FAQ
Is VoidLLM actually zero-knowledge, or is there a hidden logging setting?
Architecturally zero-knowledge. There is no configuration option to enable content logging—it does not exist in the codebase. Prompt and response bodies never enter logs, database, or disk. Only metadata (identity, model, tokens, duration) persists for governance.
Can I use VoidLLM with my existing OpenAI SDK code?
Yes—drop-in replacement. Change your base_url to your VoidLLM instance and your api_key to a VoidLLM virtual key. All endpoints (/v1/chat/completions, embeddings, streaming) remain identical.
What happens when my primary LLM provider goes down?
Automatic failover. Configure multiple deployments per model alias with priority levels. VoidLLM retries on 5xx/timeout, uses circuit breakers, and health-checks endpoints. Your application sees continuous service.
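For readers new to the pattern, a circuit breaker is a small state machine: after enough consecutive failures it stops sending traffic to a deployment, then lets a trial request through once a cool-down has passed. The Python below is a generic sketch of that idea, not VoidLLM's implementation; the class, thresholds, and the injectable `clock` parameter are all assumptions.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (traffic flows)

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_after_s

    def record(self, status: int) -> None:
        if status >= 500:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
        else:
            self.failures = 0
            self.opened_at = None


breaker = CircuitBreaker(failure_threshold=2)
breaker.record(503)
breaker.record(500)
print(breaker.allow())  # prints: False
```

While one deployment's breaker is open, the router can send requests to the next healthy deployment, which is what keeps the outage invisible to the application.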
How does Code Mode reduce token usage by 30-80%?
Traditional MCP requires one LLM turn per tool call, and each turn resends the full context window. Code Mode lets the LLM generate one JavaScript program that makes multiple tool calls, processes results, and returns a final answer. Fewer round-trips mean dramatically fewer tokens.
Is the WASM sandbox actually secure?
Yes, through defense in depth. QuickJS runs inside WebAssembly with no filesystem, network, or host access. The runtime pool isolates executions. Memory limits and timeouts prevent resource exhaustion. Even malicious generated code has no exposed path to the host.
What's the catch with the Business Source License?
Self-hosting is fully permitted. The BSL 1.1 prohibits competing hosted services (you can't sell VoidLLM-as-a-Service). Every release converts to Apache 2.0 after four years. You're free to use, modify, and deploy internally without restrictions.
Do I need Kubernetes, or can I run this on a single VM?
Single VM works perfectly. SQLite default, single binary, minimal resources. Kubernetes with Helm is recommended for high-availability production, but Docker Compose on one machine handles substantial team loads.
Conclusion
The AI infrastructure landscape is crowded with proxies that promise governance while quietly retaining the right to read everything you send. VoidLLM is the rare exception that proves privacy and control aren't trade-offs—they're architectural foundations.
For teams handling sensitive data, managing multi-provider complexity, or simply refusing to outsource their AI traffic to yet another black box, VoidLLM delivers something genuinely different: a self-hosted gateway that's faster than the alternatives, more private than the alternatives, and now more capable with its integrated MCP gateway and Code Mode execution.
The sub-2ms overhead means you're not sacrificing performance for principles. The flat pricing means predictable costs as you scale. And the zero-knowledge guarantee means you'll never have to explain to your board why customer prompts were found in a third-party log file.
Ready to take control? Deploy VoidLLM in 60 seconds with Docker, explore the full documentation, and join the teams who've already made the switch. The repository is waiting—your prompts aren't.