VoidLLM: The Privacy-First LLM Proxy Teams Are Secretly Switching To

Bright Coding

Author


What if every prompt your team sent to ChatGPT was being logged, analyzed, and potentially leaked by the proxy sitting in between? Here's the uncomfortable truth: most LLM proxies and AI gateways on the market today store your prompts by default. They call it "debugging." They call it "analytics." What they don't call it is what it actually is—a massive privacy liability waiting to explode.

If you're a CTO, platform engineer, or security-conscious developer, you've felt this tension. Your team needs OpenAI's smarts, Anthropic's safety, Azure's enterprise guarantees, and Ollama's cost efficiency. But stitching them together means either sharing raw API keys in Slack (disaster) or trusting some SaaS middleman with your most sensitive data (bigger disaster). Every healthcare startup, legal tech platform, and fintech handling customer data is one proxy log away from a headline it can't recover from.

Enter VoidLLM.

This isn't another "we take privacy seriously" footnote in a Terms of Service. VoidLLM is architected as a zero-knowledge LLM proxy from the ground up. No prompt storage. No response logging. No "enable_content_logging" toggle that shouldn't exist in the first place. Just pure, self-hosted infrastructure that routes, balances, governs, and protects—while remaining utterly blind to what your users are actually asking. One Go binary. Sub-2ms overhead. Total control. And it's why engineering teams are quietly abandoning centralized AI gateways for this open-source alternative.


What Is VoidLLM?

VoidLLM is a self-hosted LLM proxy and AI gateway built by VoidMind that sits between your applications and any LLM provider—OpenAI, Anthropic, Azure, Ollama, vLLM, or custom endpoints. Think of it as the intelligent traffic controller for your organization's AI consumption, but one that fundamentally cannot read your mail.

Born from the frustration of teams sharing API keys like office WiFi passwords, VoidLLM delivers organization-wide access control, virtual API key management, granular usage tracking, rate limiting, and multi-deployment load balancing—all without ever persisting prompt or response content. The architecture is deliberate: content passes through memory only, while metadata (who, what model, how many tokens, how long) gets tracked for governance.

The project is written in Go (1.23+), ships as a single static binary, and offers a modern React-based web UI for administration. It's gaining serious traction because it solves the trilemma that has plagued AI infrastructure: How do you get enterprise governance without enterprise bloat, and privacy without sacrificing observability?

VoidLLM answers with a clean separation: governance through metadata, privacy through architecture. The guarantee rests on the code, which has no path for writing prompt or response content to durable storage, rather than on a policy promise.


Key Features That Separate VoidLLM from the Herd

Zero-Knowledge Proxy Architecture

This isn't marketing fluff. VoidLLM's core design eliminates the very possibility of prompt logging:

  • No request body in logs, database, or persistent storage
  • No response body anywhere durable
  • No prompt caching to disk—content lives in memory only during transit
  • No enable_content_logging option exists in the codebase

The only data that persists: who made the request (key/team/org), which model, token counts, cost estimates, and duration. This simplifies GDPR compliance because prompt content never enters your database. The identity metadata that does persist can still count as personal data and needs the usual handling.
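To make the split between content and metadata concrete, here is a minimal Python sketch of what a metadata-only usage record could look like. The `UsageRecord` class, the `record_usage` helper, and every field name are hypothetical illustrations, not VoidLLM's actual schema. The point is only that nothing derived from the prompt or response text is part of the persisted shape.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical metadata-only record; field names are illustrative."""
    org_id: str
    team_id: str
    key_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    duration_ms: int


def record_usage(request, response, identity, duration_ms, cost_usd):
    """Build the record from identity and counters only; bodies are never copied."""
    usage = response.get("usage", {})
    return UsageRecord(
        org_id=identity["org_id"],
        team_id=identity["team_id"],
        key_id=identity["key_id"],
        model=request["model"],
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        cost_usd=cost_usd,
        duration_ms=duration_ms,
    )


# Made-up request/response pair in the OpenAI chat-completions shape.
request = {"model": "default",
           "messages": [{"role": "user", "content": "secret text"}]}
response = {"choices": [{"message": {"content": "secret reply"}}],
            "usage": {"prompt_tokens": 12, "completion_tokens": 30}}
identity = {"org_id": "org_1", "team_id": "team_a", "key_id": "key_9"}

rec = record_usage(request, response, identity, duration_ms=840, cost_usd=0.00002)
print(asdict(rec))
```

Because the record type has no field that could hold message text, a reviewer can check the privacy property by reading one class definition instead of auditing every log statement.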

OpenAI-Compatible Universal Routing

Drop-in replacement for OpenAI's API. Change your base_url and API key, keep your SDK. Supports /v1/chat/completions, embeddings, images, audio, and streaming. Your existing Python, JavaScript, or Go code needs no other changes.

Multi-Provider Load Balancing with Failover

Configure multiple deployments per model with round-robin, least-latency, weighted, or priority routing. Automatic retry on 5xx/timeout, circuit breakers, and health-aware routing mean provider outages become invisible to your applications.
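The routing behavior described above can be sketched in a few lines of Python. This is an illustration of round-robin selection with retry-on-failure and a very small circuit breaker, not VoidLLM's Go implementation. The `Deployment` and `RoundRobinRouter` classes, the failure threshold, and the cooldown are all assumptions made for the example.

```python
import itertools
import time


class Deployment:
    """Hypothetical deployment handle with a minimal circuit breaker."""

    def __init__(self, name, send, failure_threshold=3, cooldown_s=30.0):
        self.name = name
        self.send = send                  # callable(payload) -> response
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.open_until = 0.0             # breaker stays open until this time

    def available(self, now):
        return now >= self.open_until

    def record(self, ok, now):
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.open_until = now + self.cooldown_s
            self.failures = 0


class RoundRobinRouter:
    def __init__(self, deployments):
        self.deployments = list(deployments)
        self._cursor = itertools.cycle(range(len(self.deployments)))

    def route(self, payload, clock=time.monotonic):
        """Try each deployment at most once, starting at the next in rotation."""
        start = next(self._cursor)
        count = len(self.deployments)
        last_error = None
        for offset in range(count):
            dep = self.deployments[(start + offset) % count]
            now = clock()
            if not dep.available(now):
                continue                  # breaker open: skip this deployment
            try:
                result = dep.send(payload)
            except Exception as exc:      # stands in for a 5xx or timeout
                dep.record(False, now)
                last_error = exc
                continue
            dep.record(True, now)
            return dep.name, result
        raise RuntimeError("all deployments unavailable") from last_error


def failing(_payload):
    raise TimeoutError("upstream timed out")


def healthy(payload):
    return {"echo": payload}


router = RoundRobinRouter([
    Deployment("azure-east", failing, failure_threshold=1),
    Deployment("openai-fallback", healthy),
])
print(router.route({"prompt": "hi"})[0])
```

The caller sees one successful response even though the first deployment failed, which is the sense in which an outage becomes invisible to the application. A production router would also distinguish retryable from non-retryable errors, which this sketch does not.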

Granular RBAC and Virtual Keys

Org > Team > User > Key hierarchy with four distinct roles. Instead of sharing raw OpenAI keys, create scoped virtual keys (vl_uk_...) with specific permissions, rate limits, and budgets. Revoke instantly without rotating provider credentials.
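A rough Python sketch of the virtual-key idea follows. Only the `vl_uk_` prefix comes from the article. The token length, the choice to store a SHA-256 hash instead of the key itself, and the `KeyStore` class are assumptions for illustration and may differ from how VoidLLM actually stores keys.

```python
import hashlib
import secrets

KEY_PREFIX = "vl_uk_"  # prefix from the article; the rest of the format is assumed


def issue_virtual_key():
    """Return (plaintext_key, stored_hash). Only the hash would be persisted."""
    plaintext = KEY_PREFIX + secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, digest


class KeyStore:
    """In-memory stand-in for a key table: hash -> scope metadata."""

    def __init__(self):
        self._keys = {}

    def add(self, digest, team, allowed_models):
        self._keys[digest] = {
            "team": team,
            "models": set(allowed_models),
            "revoked": False,
        }

    def revoke(self, digest):
        # Revoking touches only the virtual key; provider credentials are untouched.
        self._keys[digest]["revoked"] = True

    def authorize(self, presented_key, model):
        digest = hashlib.sha256(presented_key.encode()).hexdigest()
        entry = self._keys.get(digest)
        if entry is None or entry["revoked"]:
            return False
        return model in entry["models"]


store = KeyStore()
key, digest = issue_virtual_key()
store.add(digest, team="marketing", allowed_models=["default"])
print(store.authorize(key, "default"))
```

Revocation here is a single flag flip on the virtual key, which is why the provider's real API key never needs rotating when one team member leaves.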

MCP Gateway and Code Mode

VoidLLM functions as a Model Context Protocol (MCP) gateway, proxying external MCP servers with access control and automatic session management. The standout Code Mode lets LLMs write JavaScript that orchestrates multiple tool calls in a single execution, running in a WASM-sandboxed QuickJS runtime with no filesystem, network, or host access. The project reports a 30-80% token reduction versus one-tool-per-turn patterns.

Enterprise-Grade Observability

Prometheus metrics for latency, tokens, active streams, routing decisions, and health. Optional OpenTelemetry OTLP/gRPC export with request ID correlation. SQLite default, PostgreSQL for production. Helm charts for Kubernetes. Graceful shutdown handling.


Real-World Use Cases Where VoidLLM Dominates

1. Healthcare AI: HIPAA-Compliant LLM Access

Medical startups using LLMs for clinical documentation face a nightmare: patient data in prompts can't touch third-party logs. VoidLLM's zero-knowledge architecture means PHI never enters the proxy's persistent storage. Self-host in your VPC, route to Azure OpenAI with private endpoints, and maintain full audit trails of who accessed which model and when, without ever recording what was asked. The upstream provider still processes the prompt, so its own data-handling terms remain part of your HIPAA assessment.

2. Multi-Team SaaS Platform: Cost Control Chaos

Your engineering team burns through GPT-4 tokens for code review. Marketing experiments with Claude for copy. Support tests fine-tuned models. Without governance, budgets collide. VoidLLM assigns per-team token budgets and rate limits with automatic enforcement. Marketing hits their daily cap? They get a graceful error, not a $50K surprise invoice.
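The enforcement step can be pictured as a check that runs before a request is forwarded. The Python sketch below assumes a simple daily token cap per team. The `TeamBudget` class and the `BudgetExceeded` exception are hypothetical names, and VoidLLM's real accounting (reset windows, cost-based budgets) is not shown.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised instead of forwarding a request that would exceed the cap."""


@dataclass
class TeamBudget:
    daily_token_limit: int
    used_today: int = 0

    def reserve(self, tokens):
        """Account for tokens up front; refuse if the cap would be crossed."""
        if self.used_today + tokens > self.daily_token_limit:
            remaining = self.daily_token_limit - self.used_today
            raise BudgetExceeded(
                f"requested {tokens} tokens, {remaining} remaining today"
            )
        self.used_today += tokens


marketing = TeamBudget(daily_token_limit=1000)
marketing.reserve(600)
try:
    marketing.reserve(500)
except BudgetExceeded as err:
    # This is the "graceful error" path: nothing was sent upstream.
    print(f"rejected: {err}")
```

Rejecting before the upstream call is what turns a budget from a report you read after the invoice into a limit that actually holds.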

3. Financial Services: Zero-Trust AI Infrastructure

Banks and fintechs can't risk prompt content leaking through a managed proxy's logging infrastructure. VoidLLM deploys entirely on-premise or in your cloud account, with SSO/OIDC integration (Enterprise tier). Every request authenticated, every action audited, no content ever retained. The proxy becomes a compliance asset, not a liability.

4. AI-Native Development: Seamless Provider Migration

You built on OpenAI. GPT-5 gets delayed, prices spike, or you need Claude's longer context. Normally: refactor every openai.ChatCompletion call. With VoidLLM: update your model alias configuration. Clients still call model: "default" or model: "smart"—you control where it routes. A/B test providers, implement fallbacks, optimize costs without touching application code.

5. MCP-Powered IDE Workflows

Connect Claude Code, Cursor, or Windsurf to VoidLLM's MCP endpoint. Your IDE gains access to managed tool orchestration with usage tracking and governance. Code Mode's multi-tool execution slashes token costs while the WASM sandbox prevents rogue tool calls from accessing your system.


Step-by-Step Installation & Setup Guide

Prerequisites

  • Docker (recommended) or Go 1.23+ with Node 20+ for source builds
  • OpenSSL for key generation
  • 512MB RAM minimum, 2GB recommended for production
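If OpenSSL is not available on a machine, Python's standard library can produce the same kind of value as `openssl rand -base64 32`. This is a convenience sketch rather than anything from the VoidLLM docs. The helper name `generate_key` is made up, and the environment variable names are the ones used in the commands in this guide.

```python
import base64
import secrets


def generate_key(num_bytes=32):
    """Return num_bytes of CSPRNG output, base64-encoded like `openssl rand -base64`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Print shell lines that can be pasted or eval'd in the current session.
    print(f"export VOIDLLM_ADMIN_KEY='{generate_key()}'")
    print(f"export VOIDLLM_ENCRYPTION_KEY='{generate_key()}'")
```

The `secrets` module draws from the operating system's cryptographic random source, so the result is suitable for keys in the same way the OpenSSL command is.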

Docker Deployment (Fastest Path)

# Generate cryptographically secure keys
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)

# Create configuration from example
cp voidllm.yaml.example voidllm.yaml

# Start the LLM proxy
docker run -p 8080:8080 \
  -e VOIDLLM_ADMIN_KEY -e VOIDLLM_ENCRYPTION_KEY \
  -v "$(pwd)/voidllm.yaml:/etc/voidllm/voidllm.yaml:ro" \
  -v voidllm_data:/data \
  ghcr.io/voidmind-io/voidllm:latest

Critical: On first boot, VoidLLM prints bootstrap credentials to stdout exactly once:

========================================
 BOOTSTRAP COMPLETE - COPY THESE NOW
========================================
  API Key:    vl_uk_a3f2...
  Email:      admin@voidllm.local
  Password:   <random>
========================================

Save these immediately. Open http://localhost:8080, authenticate, and begin configuration.

Binary Installation (No Docker)

# Linux amd64 — adjust for your platform
curl -fsSL https://github.com/voidmind-io/voidllm/releases/latest/download/voidllm-linux-amd64.tar.gz | tar xz

export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)

./voidllm

Platforms available: Linux (amd64, arm64), Windows (amd64, arm64), macOS (amd64, arm64).

Docker Compose (Development)

cp voidllm.yaml.example voidllm.yaml
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker-compose up

Kubernetes (Production)

# Quote the bracketed keys so shells such as zsh do not treat [0] as a glob
helm install voidllm chart/voidllm/ \
  --set secrets.adminKey="$(openssl rand -base64 32)" \
  --set secrets.encryptionKey="$(openssl rand -base64 32)" \
  --set 'config.models[0].name=my-model' \
  --set 'config.models[0].provider=ollama' \
  --set 'config.models[0].base_url=http://ollama:11434/v1'

Optional subcharts: PostgreSQL for production database, Redis for distributed caching (multi-pod Code Mode support coming).

From Source (Contributors/Custom Builds)

# Prerequisites verified: Go 1.23+, Node 20+
cd ui && npm ci && npm run build && cd ..
go run ./cmd/voidllm --config voidllm.yaml

First API Call Test

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer vl_uk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Verify VoidLLM is working"}]}'

Any OpenAI-compatible SDK works—just change base_url to your VoidLLM instance.
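For readers who want to see the same call from code without installing an SDK, here is a minimal Python sketch using only the standard library. It mirrors the curl command above. The helper names `build_chat_request` and `send` are made up for the example, and the key is the same placeholder as in the curl call.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build a POST to the OpenAI-compatible chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_chat_request(
    base_url="http://localhost:8080/v1",   # the proxy, not api.openai.com
    api_key="vl_uk_your_key_here",         # placeholder virtual key
    model="default",
    prompt="Verify VoidLLM is working",
)
# With a running instance: print(send(req)["choices"][0]["message"]["content"])
```

With an OpenAI SDK the equivalent change is the base URL and key passed to the client constructor; the request shape stays the same.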


REAL Code Examples from the Repository

Example 1: Production Configuration with Multi-Provider Failover

This YAML configuration from VoidLLM's documentation demonstrates enterprise-grade routing with environment variable interpolation for secrets:

server:
  proxy:
    port: 8080

models:
  # Single endpoint for local development
  - name: dolphin-mistral
    provider: ollama
    base_url: http://localhost:11434/v1
    timeout: 30s
    aliases: [default]  # Clients call "default", routes here
    pricing:
      input_per_1m: 0.15    # $0.15 per million input tokens
      output_per_1m: 0.60   # Enables cost tracking and budgets

  # Load-balanced with automatic failover across cloud providers
  - name: gpt-4o
    strategy: priority      # Matches the priority fields below; also supports: round-robin, least-latency, weighted
    aliases: [smart]        # Application code uses "smart", not provider names
    deployments:
      - name: azure-east
        provider: azure
        base_url: https://eastus.openai.azure.com
        api_key: ${AZURE_EAST_KEY}  # Never hardcode secrets
        azure_deployment: gpt-4o
        priority: 1         # Primary: tried first
      - name: openai-fallback
        provider: openai
        base_url: https://api.openai.com/v1
        api_key: ${OPENAI_KEY}
        priority: 2         # Fallback: used if priority 1 fails health checks

# MCP server integration for tool-augmented workflows
mcp_servers:
  - name: AWS Knowledge
    alias: aws
    url: https://knowledge-mcp.global.api.aws
    auth_type: none

settings:
  admin_key: ${VOIDLLM_ADMIN_KEY}
  encryption_key: ${VOIDLLM_ENCRYPTION_KEY}
  mcp:
    code_mode:
      enabled: true         # Enable WASM-sandboxed multi-tool execution

Key insight: The aliases abstraction is VoidLLM's secret weapon for operational agility. Your application code references model: "default" or model: "smart". Behind the scenes, you can migrate from Ollama to Azure, add failover deployments, or A/B test providers—zero application changes required.
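Conceptually, alias resolution is a lookup table built from the configuration. The Python sketch below assumes the `name` and `aliases` fields shown in the YAML above. The functions `build_alias_index` and `resolve` are illustrative names, not part of VoidLLM.

```python
def build_alias_index(models):
    """Map every model name and alias to the canonical model name."""
    index = {}
    for model in models:
        for label in [model["name"], *model.get("aliases", [])]:
            if label in index:
                raise ValueError(f"duplicate model name or alias: {label}")
            index[label] = model["name"]
    return index


def resolve(index, requested):
    """Return the canonical model for whatever the client asked for."""
    try:
        return index[requested]
    except KeyError:
        raise LookupError(f"unknown model: {requested}") from None


# Same shape as the models section of the configuration above.
models = [
    {"name": "dolphin-mistral", "aliases": ["default"]},
    {"name": "gpt-4o", "aliases": ["smart"]},
]
index = build_alias_index(models)
print(resolve(index, "smart"))

# Repointing "default" is a configuration change only; clients keep
# sending model="default".
migrated = [
    {"name": "dolphin-mistral", "aliases": []},
    {"name": "gpt-4o", "aliases": ["smart", "default"]},
]
print(resolve(build_alias_index(migrated), "default"))
```

Rejecting duplicate labels at load time is a design choice in this sketch: an ambiguous alias is safer to fail on at startup than to resolve arbitrarily at request time.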

The ${VAR} syntax for environment variable interpolation keeps secrets out of the configuration file, so the file can be committed to version control without exposing credentials. The pricing block enables cost estimation and budget enforcement at the proxy level, before requests reach expensive endpoints.
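The pricing values translate into a per-request estimate with simple arithmetic: tokens divided by one million, times the configured rate. The Python sketch below uses the two rates from the dolphin-mistral entry above. The function name `estimate_cost` and the token counts are made up for illustration, and `Decimal` is used so small amounts do not pick up floating-point noise.

```python
from decimal import Decimal

# Rates copied from the pricing block in the example configuration (USD).
PRICING = {
    "input_per_1m": Decimal("0.15"),
    "output_per_1m": Decimal("0.60"),
}
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens, completion_tokens, pricing=PRICING):
    """Estimated cost in USD for one request at per-million-token rates."""
    return (
        pricing["input_per_1m"] * prompt_tokens
        + pricing["output_per_1m"] * completion_tokens
    ) / MILLION


# Illustrative token counts, not measurements.
print(estimate_cost(prompt_tokens=200_000, completion_tokens=50_000))
```

Because the calculation needs only token counts and configured rates, it fits the metadata-only model: cost tracking never requires looking at the prompt text.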

Example 2: Code Mode WASM Sandbox Configuration

VoidLLM's Code Mode configuration controls the JavaScript execution environment for multi-tool MCP orchestration:

mcp:
  code_mode:
    enabled: true
    pool_size: 8          # Concurrent WASM QuickJS runtimes for parallel execution
    memory_limit_mb: 16   # Per-execution memory cap prevents resource exhaustion
    timeout: 30s          # Hard timeout kills runaway LLM-generated code
    max_tool_calls: 50    # Upper bound on tool invocations per execution

Why this matters: Traditional MCP workflows require one LLM turn per tool call—massively inefficient for complex research or data analysis tasks. Code Mode lets the LLM generate a single JavaScript program that sequences multiple tool calls, processes intermediate results, and returns a synthesized answer.

The security model is critical: QuickJS in WASM provides a deterministic, sandboxed environment with no filesystem access, no network access, and no host system access. Even if an LLM generates malicious code, the blast radius is contained. The pool_size controls concurrency for throughput, while memory_limit_mb and timeout prevent denial-of-service from pathological generated code.
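The max_tool_calls and timeout settings can be read as a budget wrapped around every tool invocation. The Python sketch below illustrates that idea only. It is a cooperative check, not a sandbox and not a hard kill, and the `ToolBudget` class, its exceptions, and the `search` tool are hypothetical. The real enforcement happens inside VoidLLM's WASM runtime.

```python
import time


class ToolCallLimitExceeded(RuntimeError):
    pass


class ExecutionTimeout(RuntimeError):
    pass


class ToolBudget:
    """Gate tool calls behind a call-count cap and a wall-clock deadline."""

    def __init__(self, tools, max_tool_calls=50, timeout_s=30.0,
                 clock=time.monotonic):
        self._tools = tools                # name -> callable
        self._max = max_tool_calls
        self._clock = clock
        self._deadline = clock() + timeout_s
        self.calls = 0

    def call(self, name, **kwargs):
        if self._clock() > self._deadline:
            raise ExecutionTimeout("execution deadline passed")
        if self.calls >= self._max:
            raise ToolCallLimitExceeded(f"more than {self._max} tool calls")
        self.calls += 1
        return self._tools[name](**kwargs)


# Hypothetical tool used only for the demonstration.
budget = ToolBudget({"search": lambda query: f"results for {query}"},
                    max_tool_calls=2)
print(budget.call("search", query="s3 limits"))
print(budget.call("search", query="lambda limits"))
try:
    budget.call("search", query="one too many")
except ToolCallLimitExceeded as err:
    print(f"stopped: {err}")
```

Putting the cap at the call boundary means a generated program that loops forever over tool calls is cut off by count, while one that spins without calling tools is left to the timeout.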

Example 3: MCP IDE Integration

Connect your development environment to VoidLLM's managed MCP infrastructure:

{
  "mcpServers": {
    "voidllm": {
      "type": "http",
      "url": "http://your-voidllm-instance:8080/api/v1/mcp",
      "headers": { 
        "Authorization": "Bearer vl_uk_your_key" 
      }
    }
  }
}

Implementation pattern: This JSON configuration plugs into Claude Code, Cursor, Windsurf, or any MCP-compatible IDE. The vl_uk_ key carries RBAC permissions—so your IDE access is governed by the same team/org hierarchy as your API consumers.

Management tools (list_models, get_usage, create_key) auto-register at /api/v1/mcp/voidllm. External MCP servers you configure appear at /api/v1/mcp/:alias. This unified namespace means your IDE discovers and uses tools through governed, metered, access-controlled channels—not direct connections that bypass your infrastructure policies.
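To show what travels over that endpoint, here is a hedged Python sketch that builds an MCP `tools/list` request. The URL pattern comes from the paragraph above, and the JSON-RPC 2.0 envelope and `Accept` header follow the MCP HTTP transport as generally specified. The helper names are invented, and a real client would first perform the MCP `initialize` handshake, which is omitted here.

```python
import json
import urllib.request


def mcp_endpoint(base_url, alias="voidllm"):
    """URL for the built-in tools (alias 'voidllm') or a configured MCP server."""
    return f"{base_url.rstrip('/')}/api/v1/mcp/{alias}"


def build_tools_list_request(base_url, api_key, alias="voidllm", request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 'tools/list' call."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    return urllib.request.Request(
        url=mcp_endpoint(base_url, alias),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# "aws" is the alias given to the AWS Knowledge server in the example config.
req = build_tools_list_request(
    "http://your-voidllm-instance:8080", "vl_uk_your_key", alias="aws"
)
print(req.full_url)
```

The same virtual key is used for chat completions and for MCP, which is what lets one RBAC policy cover both paths.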

Example 4: Database Migration and License Management

VoidLLM includes CLI utilities for operational tasks:

# Bidirectional database migration — SQLite to PostgreSQL for production scaling
voidllm migrate --from sqlite:///data/voidllm.db --to postgres://user:pass@host/db

# Enterprise license verification (offline-capable JWT validation)
voidllm license verify < license.jwt

Operational note: The migration tool supports bidirectional transfers, enabling rollback strategies and environment synchronization. For Enterprise deployments, license verification runs entirely offline—no phoning home, consistent with VoidLLM's privacy-first philosophy.


Advanced Usage & Best Practices

Rate Limit Strategy: Most-Restrictive-Wins

VoidLLM enforces rate limits at org, team, user, and key levels simultaneously. When conflicts arise, the most restrictive limit applies. Configure generous org defaults, then tighten at team or key granularity for specific use cases.
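Most-restrictive-wins reduces to taking the minimum over the levels that have a limit configured. The Python sketch below assumes `None` means "no limit set at this level". The function name and the requests-per-minute numbers are illustrative.

```python
def effective_limit(org=None, team=None, user=None, key=None):
    """Return the tightest configured limit, or None if no level sets one."""
    configured = [limit for limit in (org, team, user, key) if limit is not None]
    return min(configured) if configured else None


# Generous org default, tighter team cap, no user limit, moderate key limit.
print(effective_limit(org=1000, team=200, user=None, key=500))
```

A consequence worth noting: raising a key's limit above its team's limit has no effect, so loosening access always has to start at the tightest level.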

Health-Aware Routing Optimization

Enable the least-latency strategy for user-facing applications where TTFT (Time To First Token) matters. Use priority routing with cost-optimized secondary deployments for batch processing. Monitor the Prometheus metric voidllm_routing_decision_total to validate routing behavior.
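One way to validate routing is to compare how decisions are split across deployments. The Python sketch below parses Prometheus text-format output for the metric named above. The `deployment` label and the sample numbers are assumptions made up for this example, so check the labels your instance actually exposes before relying on it.

```python
import re
from collections import Counter

# Made-up sample in Prometheus text exposition format; the label name is assumed.
SAMPLE = """\
# TYPE voidllm_routing_decision_total counter
voidllm_routing_decision_total{deployment="azure-east"} 120
voidllm_routing_decision_total{deployment="openai-fallback"} 80
"""

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def decisions_by(text, label, metric="voidllm_routing_decision_total"):
    """Sum the counter per value of the given label."""
    totals = Counter()
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get(label, "")] += float(match.group("value"))
    return totals


def shares(totals):
    """Convert absolute counts to fractions of the total."""
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {name: count / grand_total for name, count in totals.items()}


print(shares(decisions_by(SAMPLE, "deployment")))
```

With round-robin you would expect roughly equal shares, and with priority routing nearly all traffic on the primary. A split that differs from the configured strategy is a cue to look at health checks and breaker state.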

Secret Rotation Without Downtime

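Because applications use virtual keys (vl_uk_...), you can rotate underlying provider API keys by updating environment variables and restarting VoidLLM. Application credentials remain valid, so no coordinated redeployment across services is required. On a single instance the restart itself is a brief interruption; if that matters, run several replicas and restart them one at a time.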

MCP Server Hardening

For external MCP servers, enforce team-scoped access rather than global availability. Block specific tools from Code Mode via the per-tool blocklist if they perform destructive operations. Monitor voidllm_mcp_tool_call_duration_seconds for anomalous execution patterns.

Backup Your SQLite Database

For single-node deployments, the SQLite database at /data/voidllm.db contains all governance metadata. Back up this file alongside your voidllm.yaml configuration. For zero-downtime upgrades, use PostgreSQL with managed backups.
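Copying a SQLite file while a process is writing to it can produce an inconsistent backup. A safer approach, sketched below in Python, is SQLite's online backup API, which the standard `sqlite3` module exposes as `Connection.backup`. The function name and the destination path are illustrative, and this is a general SQLite technique rather than a VoidLLM-provided tool.

```python
import sqlite3
from contextlib import closing


def backup_sqlite(src_path, dest_path):
    """Copy a live SQLite database to dest_path using the online backup API."""
    with closing(sqlite3.connect(src_path)) as src, \
         closing(sqlite3.connect(dest_path)) as dest:
        # Copies all pages and yields a consistent snapshot even if the
        # source is being written to by another process.
        src.backup(dest)


if __name__ == "__main__":
    # Source path from the article; the destination is an example location.
    backup_sqlite("/data/voidllm.db", "/backups/voidllm-backup.db")
```

Run it from a scheduled job and store the result next to a copy of voidllm.yaml, since the database alone does not capture model and routing configuration.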


Comparison with Alternatives

| Capability | VoidLLM | LiteLLM Proxy | Kong AI Gateway | Cloudflare AI Gateway |
| --- | --- | --- | --- | --- |
| Self-hosted | ✅ Full control | ✅ Yes | ✅ Yes | ❌ SaaS only |
| Zero-knowledge prompts | By architecture | ❌ Logs configurable | ❌ Logs configurable | ❌ Cloudflare sees all |
| Prompt content logging toggle | Does not exist | ✅ Present (risk) | ✅ Present (risk) | N/A |
| Multi-provider routing | ✅ OpenAI, Anthropic, Azure, Ollama, vLLM, custom | ✅ 100+ providers | ✅ Via plugins | ✅ Limited set |
| Load balancing + failover | ✅ Built-in, multiple strategies | ✅ Basic | ✅ Via upstream config | ❌ Single endpoint |
| RBAC hierarchy | ✅ Org > Team > User > Key | ✅ Team/key level | ❌ Enterprise plugin | ❌ Per-account only |
| MCP Gateway | ✅ Built-in with Code Mode | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| WASM-sandboxed tool execution | ✅ QuickJS isolation | ❌ N/A | ❌ N/A | ❌ N/A |
| Pricing model | Flat ($49-149/mo) or free | Usage-based + seat fees | Enterprise contract | Per-request charges |
| OpenTelemetry export | ✅ Enterprise tier | ❌ Limited | ✅ Via plugins | ❌ Proprietary |
| Source available | ✅ BSL 1.1 → Apache 2.0 | ✅ MIT | ❌ Proprietary | ❌ Proprietary |

The verdict: If privacy is non-negotiable and you need MCP gateway capabilities, VoidLLM stands alone. LiteLLM offers broader provider coverage but lacks architectural privacy guarantees and MCP support. Enterprise gateways like Kong require heavy customization for AI-specific workflows. Cloudflare's offering forces trust in their infrastructure—precisely what VoidLLM eliminates.


FAQ

Is VoidLLM actually zero-knowledge, or is there a hidden logging setting?

Architecturally zero-knowledge. There is no configuration option to enable content logging—it does not exist in the codebase. Prompt and response bodies never enter logs, database, or disk. Only metadata (identity, model, tokens, duration) persists for governance.

Can I use VoidLLM with my existing OpenAI SDK code?

Yes—drop-in replacement. Change your base_url to your VoidLLM instance and your api_key to a VoidLLM virtual key. All endpoints (/v1/chat/completions, embeddings, streaming) remain identical.

What happens when my primary LLM provider goes down?

Automatic failover. Configure multiple deployments per model alias with priority levels. VoidLLM retries on 5xx/timeout, uses circuit breakers, and health-checks endpoints. Your application sees continuous service.

How does Code Mode reduce token usage by 30-80%?

Traditional MCP requires one LLM turn per tool call, and each turn resends the full conversation context. Code Mode lets the LLM generate one JavaScript program that makes multiple tool calls, processes results, and returns a final answer. Fewer round-trips mean fewer tokens. The 30-80% figure is the project's own estimate and depends on the workload.
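A back-of-the-envelope model shows where the savings come from. The Python sketch below counts input tokens only and uses a deliberately simple assumption: in the per-turn pattern every turn resends the base context plus all earlier tool results, while Code Mode needs one turn to write the program and one to read its final result. All numbers are invented for illustration and are not measurements of VoidLLM.

```python
def per_turn_input_tokens(context, n_tools, result_tokens):
    """One LLM turn per tool call, plus a final answer turn.

    Turn i resends the base context and the i tool results seen so far.
    """
    return sum(context + i * result_tokens for i in range(n_tools + 1))


def code_mode_input_tokens(context, program_tokens, final_result_tokens):
    """Turn 1 writes the program; turn 2 sees the program and its result."""
    return context + (context + program_tokens + final_result_tokens)


# Assumed workload: 2,000-token context, 5 tool calls of ~300 tokens each,
# a 400-token generated program, and a 300-token combined result.
baseline = per_turn_input_tokens(context=2000, n_tools=5, result_tokens=300)
code_mode = code_mode_input_tokens(context=2000, program_tokens=400,
                                   final_result_tokens=300)
print(baseline, code_mode, f"{1 - code_mode / baseline:.0%} fewer input tokens")
```

Under these assumptions the per-turn pattern consumes 16,500 input tokens against 4,700 for Code Mode. The gap widens with more tool calls and narrows when the task needs only one or two, which is why a range rather than a single figure is the honest way to state the saving.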

Is the WASM sandbox actually secure?

Yes, by design, with defense in depth. QuickJS runs inside WebAssembly with no filesystem, network, or host access. The runtime pool isolates executions, and memory limits and timeouts prevent resource exhaustion. Generated code can reach only the tools the gateway exposes to it, so tool-level access control still matters.

What's the catch with the Business Source License?

Self-hosting is fully permitted. The BSL 1.1 prohibits competing hosted services (you can't sell VoidLLM-as-a-Service). Every release converts to Apache 2.0 after four years. Within that one limit, you're free to use, modify, and deploy it internally.

Do I need Kubernetes, or can I run this on a single VM?

Single VM works perfectly. SQLite default, single binary, minimal resources. Kubernetes with Helm is recommended for high-availability production, but Docker Compose on one machine handles substantial team loads.


Conclusion

The AI infrastructure landscape is crowded with proxies that promise governance while quietly retaining the right to read everything you send. VoidLLM is the rare exception that proves privacy and control aren't trade-offs—they're architectural foundations.

For teams handling sensitive data, managing multi-provider complexity, or simply refusing to outsource their AI traffic to yet another black box, VoidLLM delivers something genuinely different: a self-hosted gateway that adds little latency, retains no prompt content, and now bundles an integrated MCP gateway with Code Mode execution.

The sub-2ms overhead means you're not sacrificing performance for principles. The flat pricing means predictable costs as you scale. And the zero-knowledge guarantee means you'll never have to explain to your board why customer prompts were found in a third-party log file.

Ready to take control? Deploy VoidLLM in 60 seconds with Docker, explore the full documentation, and join the teams who've already made the switch. The repository is waiting—your prompts aren't.

→ Star VoidLLM on GitHub | → Deploy on Railway | → Read the Docs
