PromptHub
Developer Tools Database Technologies

Stop Running Two Databases! ParadeDB Brings Elastic Search to Postgres

B

Bright Coding

Author

15 min read
36 views
Stop Running Two Databases! ParadeDB Brings Elastic Search to Postgres

Stop Running Two Databases! ParadeDB Brings Elastic Search to Postgres

What if I told you that your search infrastructure is a ticking time bomb?

Every day, thousands of engineering teams wake up to the same nightmare: their application database (Postgres) and their search engine (Elasticsearch, OpenSearch, or Algolia) have drifted out of sync. Users can't find products that exist. Analytics return stale data. And the "simple" fix? A fragile pipeline of CDC streams, message queues, and prayer.

Here's the dirty secret nobody talks about: running two databases for one application is a tax on your sanity. You're not just paying double in infrastructure costs. You're paying in operational complexity, data consistency bugs, and the creeping dread of 3 AM pages when your sync pipeline chokes on a malformed document.

But what if you could delete Elasticsearch entirely?

Enter ParadeDB — a PostgreSQL extension that brings Elastic-quality full-text search and analytics directly inside your existing Postgres database. No sync pipelines. No dual writes. No operational Jenga tower. Just pure SQL, blazing-fast search, and the sweet relief of architectural simplicity.

This isn't theoretical. The paradedb/paradedb repository is already powering production workloads, and it's about to change how you think about search infrastructure forever. Ready to burn your Elasticsearch cluster to the ground? Let's dive in.


What is ParadeDB?

ParadeDB is an open-source PostgreSQL extension that transforms your familiar relational database into a world-class search and analytics engine. Built by a team of database engineers who clearly suffered through the same dual-database pain as the rest of us, it embeds battle-tested Rust libraries directly into Postgres — giving you Elasticsearch-quality capabilities without ever leaving your SQL comfort zone.

The project was founded with a radical premise: search isn't a separate concern, it's a query pattern. And Postgres — with its extensible architecture, robust transaction model, and massive ecosystem — deserves to handle it natively.

ParadeDB achieves this through three core Rust-powered dependencies:

  • pgrx — The bridge that lets Rust code run safely inside Postgres, handling memory management, type conversions, and the extension lifecycle
  • Tantivy — The same Lucene-inspired search library that powers Quickwit, adapted for embedded use within Postgres
  • Apache DataFusion — A query execution framework that brings columnar analytics and OLAP performance to your PostgreSQL queries

The result? A single database that handles your transactional workloads, your full-text search, and your analytical aggregations — with ACID guarantees across all of them.

ParadeDB is licensed under the GNU Affero General Public License v3.0, with commercial licensing available for enterprise deployments. The project maintains active development with a public roadmap tracking upcoming features like vector search and hybrid search capabilities.


Key Features That Make ParadeDB Insane

ParadeDB isn't a toy project or a thin wrapper around external services. It's a deep integration that redefines what's possible inside PostgreSQL. Here's what you're actually getting:

Full-Text Search Engine (BM25, Top-K, Highlighting)

ParadeDB implements BM25 scoring — the same relevance algorithm that powers Elasticsearch and Google Search's early iterations. This isn't string matching with ILIKE; it's proper inverted-index search with term frequency, inverse document frequency, and field-length normalization.

You get Top-K retrieval for lightning-fast result ranking without sorting entire result sets, plus highlighting that returns snippet contexts with matched terms emphasized — critical for search result UIs.

Advanced Tokenization Pipeline

Search quality lives or dies on tokenization. ParadeDB exposes custom tokenizers and token filters that let you configure:

  • Language-specific stemming (English, French, etc.)
  • N-gram generation for partial matching
  • Edge n-grams for autocomplete
  • Stop word filtering
  • Lowercasing and ASCII folding

This is the same flexibility Elasticsearch offers through its analyzer chains, now configurable via SQL.

Columnar Analytics with Aggregations

Here's where ParadeDB transcends typical search extensions. Using Apache DataFusion and columnar storage, it delivers true OLAP performance for:

  • Bucket aggregations — date histograms, term aggregations, range bucketing
  • Metrics aggregations — averages, percentiles, cardinality, extended stats
  • Faceted navigation — the "filter by category, brand, price range" patterns that power e-commerce

The columnar storage engine means these aggregations scan compressed, memory-efficient formats rather than row-oriented heap pages.

JOIN-Aware Search

This is the killer feature that separate search engines can't touch. Because ParadeDB runs inside Postgres, your search queries can JOIN against any other table with full transactional consistency. No denormalization required. No eventual consistency windows. Your search index and your relational data are literally the same storage engine.

SQL-Native Interface

Every ParadeDB operation is expressed in standard SQL. No proprietary query DSL to learn. No JSON query builders. Just CREATE INDEX, SELECT ... WHERE, and the same PostgreSQL you've used for years. This means immediate compatibility with every ORM, every BI tool, every migration framework in the Postgres ecosystem.


Real-World Use Cases Where ParadeDB Destroys the Competition

E-Commerce Product Search

You're building the next Shopify competitor. Products have variants, inventory levels, pricing rules, and vendor relationships — all in normalized tables. With Elasticsearch, you'd need to flatten this into a search index, sync on every inventory change, and handle the inevitable "search shows in-stock, cart says out-of-stock" bugs.

ParadeDB solution: Search directly across joined tables. Filter by real-time inventory from your inventory table. Apply customer-specific pricing from your pricing_rules table. One query, one transaction, zero sync lag.

SaaS Application Search

Your B2B SaaS has multi-tenant data with complex permission models. Users can only search documents they own or have been shared. Traditional search requires either per-tenant indexes (expensive) or filtering post-query (slow and leaky).

ParadeDB solution: Row-level security policies + search in one database. The same USING clauses that protect your data also constrain search results. No data ever leaks because the search engine is your authorization layer.

Log Analytics Without the ELK Stack

You're ingesting application logs, want full-text search for debugging, plus aggregations for error rate dashboards. The ELK stack means three more services to operate, plus the infamous JVM tuning sessions.

ParadeDB solution: Logs go straight to Postgres. Search with BM25 relevance when investigating incidents. Aggregate with DataFusion-powered columnar queries for dashboards. One backup strategy, one monitoring target, one less 3 AM page.

Document Management with Faceted Navigation

Legal tech, research platforms, content management — anywhere users need to drill down through document collections using multiple filter dimensions.

ParadeDB solution: Native faceted aggregations that return counts per category alongside search results, with sub-second performance on millions of documents. Update documents transactionally without cache invalidation gymnastics.


Step-by-Step Installation & Setup Guide

Getting ParadeDB running takes literal seconds. Here's the complete path from zero to searching.

Quick Install (Recommended)

The fastest way to get ParadeDB is through the official install script:

# Download and execute the ParadeDB installer
curl -fsSL https://paradedb.com/install.sh | sh

This script detects your platform and installs the appropriate ParadeDB package, which bundles a Postgres distribution with the extension pre-compiled.

Docker Deployment

For containerized environments or local development:

# Pull the official ParadeDB image
docker pull paradedb/paradedb

# Run with persistent storage
docker run -d \
  --name paradedb \
  -e POSTGRES_PASSWORD=your_secure_password \
  -v paradedb_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  paradedb/paradedb

The Docker image is production-hardened with thousands of pulls tracked on Docker Hub.

Enable the Extension

Once your ParadeDB instance is running, connect with any PostgreSQL client and activate the extension:

-- Create the extension in your database
CREATE EXTENSION IF NOT EXISTS paradedb;

-- Verify installation
SELECT paradedb.version();

Cloud Platform Deployment

ParadeDB publishes guides for major platforms:

  • Railway — One-click deploy with automatic HTTPS
  • Render — Managed PostgreSQL with ParadeDB pre-installed
  • DigitalOcean — Droplet-based deployment with full control

Heroku support is on the roadmap. For production deployments requiring high availability, connection pooling, or dedicated support, explore ParadeDB Enterprise options.

Environment Configuration

For optimal performance, configure your postgresql.conf:

# Memory for ParadeDB's columnar operations
shared_preload_libraries = 'paradedb'

# Work memory for sort operations and aggregations
work_mem = '256MB'

# Parallel query execution for large aggregations
max_parallel_workers_per_gather = 4

Restart Postgres after modifying these values.


REAL Code Examples from ParadeDB

Let's see ParadeDB in action with actual patterns from the repository and documentation. These aren't toy examples — they're production-ready implementations.

Example 1: Creating a BM25 Search Index

The foundation of ParadeDB search is the BM25 index, built on Tantivy's inverted index structure:

-- Create a table for our searchable content
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category VARCHAR(100),
    price DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Insert sample data
INSERT INTO products (name, description, category, price) VALUES
    ('Wireless Noise-Canceling Headphones', 
     'Premium over-ear headphones with 30-hour battery life and adaptive noise cancellation', 
     'Electronics', 299.99),
    ('Mechanical Keyboard', 
     'Hot-swappable switches with RGB backlighting and programmable macros', 
     'Electronics', 149.50),
    ('Organic Green Tea', 
     'Single-origin sencha from Shizuoka prefecture, 50g tin', 
     'Food & Beverage', 24.99);

-- Create the BM25 search index on name and description fields
-- This builds Tantivy's inverted index inside Postgres
CREATE INDEX idx_products_search ON products
USING bm25 (id, name, description)
WITH (key_field='id');  -- 'id' is the unique document identifier for the index

The key_field parameter tells ParadeDB which column uniquely identifies each document — critical for index maintenance and result correlation.

Example 2: Full-Text Search with Relevance Scoring

Now query with BM25 relevance ranking:

-- Search for products matching 'keyboard' with relevance scoring
-- The paradedb.score() function returns the BM25 relevance score
SELECT 
    id,
    name,
    description,
    paradedb.score(id) AS relevance  -- BM25 score: higher = more relevant
FROM products
WHERE id @@@ 'keyboard'  -- @@@ is ParadeDB's search operator
ORDER BY relevance DESC;

-- Results:
-- id | name                | description                                      | relevance
-- ----+---------------------+--------------------------------------------------+-----------
--  2  | Mechanical Keyboard | Hot-swappable switches with RGB backlighting...  | 2.8473

-- Search with highlighting to show matched terms in context
SELECT 
    id,
    name,
    paradedb.highlight(id, 'description') AS highlighted_description,
    -- highlight() wraps matched terms in <b> tags for UI rendering
    paradedb.score(id) AS relevance
FROM products
WHERE id @@@ 'noise battery'
ORDER BY relevance DESC;

The @@@ operator is ParadeDB's search syntax — cleaner than Elasticsearch's JSON DSL and immediately composable with standard SQL AND, OR, and NOT logic.

Example 3: Filtering and Aggregations

Combine search with relational filtering and columnar aggregations:

-- Search within a category, with price filtering
SELECT 
    id,
    name,
    price,
    paradedb.score(id) AS relevance
FROM products
WHERE 
    id @@@ 'premium wireless'  -- full-text search
    AND category = 'Electronics'  -- standard SQL filter (uses B-tree index)
    AND price < 500  -- range filter
ORDER BY relevance DESC
LIMIT 10;

-- Faceted aggregation: category counts for search results
-- This uses DataFusion's columnar engine for fast aggregation
SELECT 
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price,
    MIN(price) AS min_price,
    MAX(price) AS max_price
FROM products
WHERE id @@@ 'electronics OR food'
GROUP BY category
ORDER BY product_count DESC;

-- Date histogram aggregation for time-series analytics
SELECT 
    DATE_TRUNC('month', created_at) AS month,
    COUNT(*) AS new_products,
    SUM(price) AS total_value
FROM products
WHERE id @@@ 'organic OR premium'
GROUP BY month
ORDER BY month;

Notice how standard SQL GROUP BY, COUNT, AVG, and DATE_TRUNC work seamlessly with ParadeDB search. No proprietary aggregation syntax to learn.

Example 4: Top-K Efficient Retrieval

For large result sets where you only need the best matches:

-- Top-K retrieval: get only the 5 most relevant results
-- More efficient than ORDER BY + LIMIT for large datasets
-- Tantivy optimizes this to avoid scoring all documents
SELECT 
    id,
    name,
    paradedb.score(id) AS relevance
FROM products
WHERE id @@@ 'keyboard'
ORDER BY relevance DESC
LIMIT 5;

-- For pagination with stable ordering, use keyset pagination
SELECT 
    id,
    name,
    paradedb.score(id) AS relevance
FROM products
WHERE 
    id @@@ 'electronics'
    AND paradedb.score(id) < 3.5  -- Continue from previous page's last score
ORDER BY relevance DESC
LIMIT 5;

Advanced Usage & Best Practices

Index Design for Performance

  • Composite indexes: Create single BM25 indexes covering multiple fields rather than separate indexes per field. ParadeDB's index is already column-aware.
  • Selective indexing: Use WHERE clauses in index creation for partial indexes on frequently filtered subsets.
  • Analyze after bulk loads: Run ANALYZE after large data imports to update statistics for query planning.

Query Optimization

  • Push filters down: Place restrictive WHERE clauses before search terms when possible — ParadeDB can leverage both B-tree and BM25 indexes.
  • Avoid SELECT *: The columnar engine benefits from projection pushdown; specify only needed columns.
  • Use LIMIT aggressively: Tantivy's Top-K optimization only triggers with explicit limits.

Operational Considerations

  • Index maintenance: BM25 indexes update synchronously on writes. For high-ingest workloads, consider batching writes or using REINDEX during low-traffic windows.
  • Monitoring: Track paradedb.query_duration and index bloat through standard Postgres monitoring tools.
  • Backup strategy: BM25 indexes are physical structures included in pg_basebackup — no separate snapshot needed.

ParadeDB vs. The World: Why Ditch Your Current Stack?

Capability ParadeDB Elasticsearch OpenSearch Algolia
Deployment complexity Single Postgres instance 3+ node cluster minimum 3+ node cluster minimum Managed only, vendor lock-in
Query language Standard SQL Proprietary JSON DSL Proprietary JSON DSL Proprietary JSON API
Transactional consistency ACID (same as Postgres) Eventual consistency Eventual consistency Eventual consistency
JOIN support Native, any complexity Denormalization required Denormalization required Limited, index-level
Operational overhead Postgres expertise only Dedicated ES team often needed Similar to ES Minimal (managed) but expensive
Cost at scale Postgres instance cost only 2-3x infrastructure + operational cost Similar to ES $$$ per search operation
Analytics integration Native via DataFusion Requires separate stack Requires separate stack Not designed for analytics
Ecosystem compatibility All Postgres tools ES-specific tooling ES-compatible tooling Algolia-specific SDKs
Self-hosting ✅ Fully open source ✅ Open core ✅ Apache 2.0 ❌ SaaS only

The verdict: If you're already running Postgres and need search, ParadeDB eliminates an entire infrastructure tier. If you're starting fresh, it lets you defer the "do we need a separate search engine?" decision indefinitely.


Frequently Asked Questions

Is ParadeDB production-ready?

Yes. ParadeDB is deployed in production environments with millions of documents. The core search engine builds on Tantivy and DataFusion — both mature, battle-tested projects. As with any database extension, test thoroughly with your specific workload patterns.

Can I migrate from Elasticsearch to ParadeDB?

Migration is straightforward for search use cases: recreate your indexes in ParadeDB, transform your query DSL to SQL (usually simpler), and point your application at Postgres. The main consideration is whether you use Elasticsearch-specific features like complex percolator queries or machine learning — these may need alternative approaches.

Does ParadeDB support real-time search?

Yes. BM25 indexes update synchronously on committed transactions, so search results reflect the latest data immediately. There's no "index refresh interval" or near-real-time window as with Elasticsearch.

What Postgres versions are supported?

ParadeDB targets modern Postgres versions (14+). Check the compatibility matrix for specifics, as pgrx bindings evolve with Postgres releases.

How does ParadeDB handle high write throughput?

Synchronous index updates add write overhead comparable to maintaining any Postgres index. For extreme ingest rates, consider batching writes, using COPY, or temporarily dropping and recreating indexes for bulk loads.

Is vector search available?

Vector search is on the public roadmap with active development. For hybrid vector + keyword search, you'll need to combine ParadeDB with pgvector today — they're complementary extensions.

Can I use ParadeDB with my existing ORM?

Absolutely. ParadeDB publishes official integrations for Django, SQLAlchemy, and Rails. Any ORM that generates SQL works out of the box — just use raw SQL or custom query methods for @@@ search syntax.


Conclusion: The Future of Search is Postgres-Shaped

Here's what I've learned after years of wrestling with search infrastructure: complexity is the enemy of reliability. Every additional service, every sync pipeline, every "eventually consistent" promise is a place where systems fail and engineers burn out.

ParadeDB isn't just a faster way to search or a cheaper alternative to Elasticsearch. It's a fundamental rethinking of where search belongs in your architecture. By embedding Elastic-quality capabilities directly into PostgreSQL, it collapses the dual-database anti-pattern into a single, coherent system that your team already knows how to operate.

The technical execution is impressive — Rust-powered performance, Tantivy's proven search algorithms, DataFusion's analytical muscle. But the real magic is simplicity. One database. One query language. One backup. One thing to monitor at 3 AM.

If you're running Postgres today and maintaining separate search infrastructure, you owe it to your future self to evaluate ParadeDB. The installation takes seconds. The migration path is clear. And the operational relief is immediate.

Ready to simplify your stack? Head to github.com/paradedb/paradedb, star the repository to track development, and join their Slack community if you need help along the way. Your search infrastructure will thank you — and so will whoever's on call next weekend.


Have you tried ParadeDB in production? Drop your experience in the comments — I'm particularly curious about migration stories from Elasticsearch and performance benchmarks on large datasets.

Comments (0)

Comments are moderated before appearing.

No comments yet. Be the first to share your thoughts!

Support us! ☕