Vibium: The Revolutionary Browser Automation Tool
Browser automation for AI agents and humans.
Tired of wrestling with complex browser automation setups that feel like they were built for a different era? You're not alone. Today's AI agents need seamless, native access to the web, but existing tools force you through clunky APIs and proprietary protocols. Enter Vibium – a game-changing solution that gives your AI agents instant browser superpowers through simple CLI commands, MCP servers, and elegant client libraries. In this deep dive, you'll discover how Vibium's lightweight architecture, WebDriver BiDi foundation, and AI-first design make it the essential tool for modern automation workflows.
What is Vibium?
Vibium is a next-generation browser automation framework created by VibiumDev that fundamentally reimagines how AI agents and developers interact with web browsers. Unlike traditional automation tools that treat browser control as an afterthought, Vibium was built from the ground up with AI agents as first-class citizens. At its core, Vibium is a single ~10MB binary that transforms your terminal into a powerful browser command center.
The tool builds on the WebDriver BiDi protocol – a bidirectional W3C standard that extends the HTTP request/response model of classic WebDriver with a persistent, event-driven connection. Because it is an open standard rather than a single vendor's debugging protocol, it reduces the risk of vendor lock-in. Vibium automatically downloads Chrome for Testing and chromedriver on first install, eliminating the notorious browser setup headaches that plague developers.
What makes Vibium truly revolutionary is its skill-based architecture. By installing Vibium as a skill, your AI agent instantly learns 81 browser automation tools without any additional training. Whether you're using Claude Code, GitHub Copilot, Gemini, or custom LLM agents, Vibium integrates seamlessly through multiple interfaces: CLI commands for bash scripting, MCP (Model Context Protocol) servers for structured tool use, and native JavaScript/TypeScript and Python client libraries for programmatic control.
The project has gained rapid traction in the AI development community because it solves a critical pain point: giving agents reliable, deterministic browser access. While other tools require complex configuration and deep browser internals knowledge, Vibium's zero-config philosophy means you can go from installation to your first automated workflow in under five minutes.
Key Features That Make Vibium Essential
AI-Native Skill Architecture
Vibium's most groundbreaking feature is its skill-based installation. When you run npx skills add https://github.com/VibiumDev/vibium --skill vibe-check, you're not just installing a tool – you're teaching your AI agent 81 distinct browser automation capabilities. This approach transforms LLMs from passive code generators into active browser operators. The skill system stores commands in {project}/.agents/skills/vibium, making them automatically discoverable by agent frameworks.
Zero-Configuration Deployment
Forget about manual browser downloads, driver version mismatches, and complex PATH configurations. Vibium's installer automatically detects your platform (Linux x64, macOS Intel/ARM64, or Windows x64) and downloads the appropriate Chrome for Testing binary and chromedriver to your platform's cache directory. The browser runs visible by default during development, making debugging intuitive, while supporting headless mode for production deployments.
WebDriver BiDi Foundation
Built on the WebDriver BiDi standard, Vibium offers bidirectional communication with the browser. Classic WebDriver is strictly request/response over HTTP, so a client has to poll to notice changes; BiDi lets the browser push events to the client, which enables real-time event listening, network interception, and more responsive automation. You're no longer locked into proprietary protocols controlled by large corporations – Vibium embraces open web standards for maximum interoperability.
Multi-Interface Flexibility
Vibium adapts to your workflow, not the other way around. Use it as:
- CLI Skill: Direct bash commands for scripting and agent integration
- MCP Server: Structured tool definitions for Claude Code, Gemini CLI, and other MCP-compatible agents
- JS/TS Library: Both synchronous and asynchronous APIs for Node.js applications
- Python Library: Native sync and async support for Python automation scripts
Ultra-Lightweight Footprint
At approximately 10MB, the Vibium binary is a fraction of the size of competing frameworks. No runtime dependencies mean faster installations, smaller Docker images, and reduced attack surfaces for production deployments. This minimalist design philosophy extends to memory usage and CPU overhead during automation sessions.
Cross-Platform Reliability
Vibium supports all major platforms and architectures: Linux x64, macOS on both Intel and Apple Silicon, and Windows x64. The unified codebase ensures consistent behavior across environments, eliminating the "works on my machine" syndrome that plagues browser automation projects.
Real-World Use Cases Where Vibium Dominates
1. AI-Powered Web Scraping and Data Extraction
Modern data collection requires more than simple HTTP requests – you need JavaScript execution, form interaction, and dynamic content handling. Vibium enables AI agents to intelligently navigate complex websites, fill search forms, handle pagination, and extract structured data. Imagine an agent that can research competitors by automatically browsing their sites, taking screenshots of pricing pages, and compiling reports – all through natural language commands.
2. Automated Testing for AI-Generated Code
When LLMs generate web applications, you need automated validation. Vibium integrates seamlessly into CI/CD pipelines to test AI-generated UIs. Your agent can spin up browsers, verify that generated forms work correctly, check responsive layouts at different viewport sizes, and capture screenshots for visual regression testing. The vibium viewport command allows instant resolution switching to test mobile, tablet, and desktop layouts.
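A responsive-layout check of this kind could be planned as a short list of CLI invocations. The sketch below only builds the commands and prints them; it relies on the `vibium go`, `vibium viewport <width> <height>`, and `vibium screenshot -o` forms shown in the command reference later in this article. The device sizes and output file names are illustrative assumptions, and the sketch assumes that consecutive CLI calls act on the same browser session.

```python
from typing import Dict, List, Tuple

# Illustrative device sizes (assumed values, not taken from the Vibium docs).
VIEWPORTS: Dict[str, Tuple[int, int]] = {
    "mobile": (390, 844),
    "tablet": (820, 1180),
    "desktop": (1920, 1080),
}


def viewport_plan(
    url: str, viewports: Dict[str, Tuple[int, int]] = VIEWPORTS
) -> List[List[str]]:
    """Return the vibium commands needed to screenshot `url` at each size."""
    plan: List[List[str]] = [["vibium", "go", url]]
    for name, (width, height) in viewports.items():
        # Resize first, then capture one image per layout.
        plan.append(["vibium", "viewport", str(width), str(height)])
        plan.append(["vibium", "screenshot", "-o", f"{name}.png"])
    return plan


for argv in viewport_plan("https://example.com"):
    print(" ".join(argv))
```

Keeping the plan as data makes it easy to review in a CI log before handing each entry to `subprocess.run`.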
3. Robotic Process Automation (RPA) for Legacy Systems
Many enterprises still rely on web-based legacy systems without APIs. Vibium becomes your digital workforce, automating repetitive browser tasks. An AI agent can log into portals, download reports, update records, and navigate multi-step workflows – all while maintaining session state through cookie and localStorage management. The vibium storage commands let you save and restore complete browser states between sessions.
4. AI Assistant Browser Integration
Build AI assistants that can actually do things on the web. A customer service AI can pull up order information by navigating admin panels. A research assistant can gather sources by browsing academic databases. A shopping assistant can find products and compare prices across vendors. Vibium's MCP server integration makes these capabilities available to agent frameworks with proper tool schemas and error handling.
5. Visual Monitoring and Screenshot Automation
Track visual changes on critical web pages automatically. Schedule Vibium to capture screenshots of dashboards, competitor sites, or your own applications. Use the vibium geolocation command to test location-specific content, and vibium media --color-scheme dark to verify dark mode implementations. Combine with AI image analysis for intelligent change detection.
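Before sending screenshots to an image-analysis model, a cheap first filter is to check whether the captured bytes changed at all. The following is a minimal sketch of that idea, not part of Vibium: `has_changed` and the baseline file are hypothetical names, and the PNG bytes are assumed to come from a capture such as `vibium screenshot -o page.png`.

```python
import hashlib
from pathlib import Path


def digest(png_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the screenshot bytes."""
    return hashlib.sha256(png_bytes).hexdigest()


def has_changed(new_png: bytes, baseline_path: Path) -> bool:
    """Compare a screenshot against the stored digest and update the baseline.

    Returns True when no baseline exists yet or when the bytes differ.
    """
    new_hash = digest(new_png)
    old_hash = baseline_path.read_text().strip() if baseline_path.exists() else None
    baseline_path.write_text(new_hash)
    return old_hash != new_hash
```

A byte-level hash flags any difference, including rotating ads or timestamps, so it works best as a trigger for a more tolerant comparison rather than as the final verdict.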
Step-by-Step Installation & Setup Guide
Global CLI Installation
Start by installing Vibium globally via npm. This downloads the binary and Chrome automatically:
# Install Vibium CLI and download Chrome
npm install -g vibium
This command performs several actions:
- Downloads the ~10MB Vibium binary for your platform
- Fetches Chrome for Testing and matching chromedriver
- Stores browser binaries in your platform cache:
  - Linux: ~/.cache/vibium/
  - macOS: ~/Library/Caches/vibium/
  - Windows: %LOCALAPPDATA%\vibium\
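If a script needs to locate or clear the downloaded browser, the cache locations listed above can be resolved per platform. This is a sketch under the assumption that the three documented paths are the only ones in use; `vibium_cache_dir` is a hypothetical helper, and whether Vibium honors overrides such as `XDG_CACHE_HOME` is not covered here.

```python
import os
import sys
from typing import Mapping


def vibium_cache_dir(
    platform: str = sys.platform, env: Mapping[str, str] = os.environ
) -> str:
    """Return the documented Vibium cache directory for `platform`."""
    home = env.get("HOME", "~")
    if platform.startswith("linux"):
        return f"{home}/.cache/vibium"
    if platform == "darwin":
        return f"{home}/Library/Caches/vibium"
    if platform == "win32":
        # Fall back to the literal variable name if it is not set.
        return env.get("LOCALAPPDATA", "%LOCALAPPDATA%") + "\\vibium"
    raise ValueError(f"no documented cache location for platform {platform!r}")
```

Calling `vibium_cache_dir()` with no arguments uses the current interpreter's platform and environment.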
Adding Vibium as an AI Skill
Transform your AI agent into a browser operator by installing Vibium as a skill:
# Install the vibium skill for AI agents
npx skills add https://github.com/VibiumDev/vibium --skill vibe-check
This creates a skill manifest in {project}/.agents/skills/vibium/ containing 81 tool definitions. Your agent can now discover and execute commands like vibium go, vibium click, and vibium screenshot through natural language.
MCP Server Configuration
For structured tool use with Claude Code or Gemini, set up the MCP server:
# For Claude Code
claude mcp add vibium -- npx -y vibium mcp
# For Gemini CLI
gemini mcp add vibium npx -y vibium mcp
The MCP server exposes Vibium's capabilities through the Model Context Protocol, providing agents with typed function definitions, parameter validation, and structured error responses.
Client Library Installation
For programmatic access, install the client library in your project:
# JavaScript/TypeScript
npm install vibium
# Python
pip install vibium
To skip automatic browser download (if you manage browsers separately):
VIBIUM_SKIP_BROWSER_DOWNLOAD=1 npm install vibium
Verification
Confirm installation by checking the version and available commands:
vibium --version
vibium --help
You should see all 81 commands listed, ready for immediate use.
Code Examples from the Repository
Example 1: Complete CLI Command Reference
Vibium's CLI provides 81 commands covering every browser interaction. Here's the comprehensive quick reference from the official documentation:
# Core navigation and interaction
vibium go https://example.com # Navigate to URL
vibium click "a" # Click element by CSS selector
vibium fill "input" "hello" # Clear and fill input field
vibium type "input" "hello" # Type into element (append)
vibium screenshot -o page.png # Capture full page screenshot
vibium eval "document.title" # Execute JavaScript in page context
# Data extraction
vibium text # Get all visible page text
vibium url # Get current page URL
vibium title # Get page title
# Viewport and window management
vibium viewport # Get current viewport dimensions
vibium viewport 1920 1080 # Set viewport size (width height)
vibium window # Get window dimensions
vibium window --state maximized # Maximize browser window
# Configuration and overrides
vibium geolocation 40.7 -74.0 # Override geolocation (lat long)
vibium content "<h1>Hi</h1>" # Replace entire page HTML
vibium media --color-scheme dark # Override CSS media queries
# State verification
vibium is visible "h1" # Check if element is visible
vibium is enabled "button" # Check if element is enabled
# Element location strategies
vibium find "a" # Find first element by CSS selector
vibium find "a" --all # Find all matching elements
vibium find text "Sign In" # Find element by exact text match
vibium find role button # Find element by ARIA role
# Waiting strategies
vibium wait ".loaded" # Wait for element to appear
vibium wait url "/dashboard" # Wait for URL to contain string
vibium wait text "Welcome" # Wait for text to appear
vibium wait load # Wait for page load event
# Advanced interactions
vibium page new https://example.com # Open new browser tab/page
vibium page switch 1 # Switch to page by index
vibium mouse click 100 200 # Click at specific coordinates
vibium scroll into-view "#footer" # Scroll element into viewport
# Session management
vibium cookies # Get all cookies as JSON
vibium cookies "session" "abc123" # Set a cookie (name value)
vibium storage # Export full storage state
vibium storage restore state.json # Restore state from file
Each command follows a consistent pattern: vibium <action> <target> <options>, making them easily discoverable by AI agents.
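Because every command is a plain process invocation, a script or agent harness can drive the CLI through one small wrapper. The sketch below is an illustration, not an official binding: the `vibium` function is a hypothetical helper, and it assumes the CLI prints its result to stdout and exits with a non-zero status on failure.

```python
import subprocess
from typing import Callable


def vibium(*args: str, runner: Callable = subprocess.run) -> str:
    """Run `vibium <args>` and return its trimmed stdout.

    With check=True, subprocess.run raises CalledProcessError when the
    command exits with a non-zero status.
    """
    result = runner(
        ["vibium", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Typical use would be `vibium("go", "https://example.com")` followed by `print(vibium("title"))`. The `runner` parameter exists so the wrapper can be tested with a stub instead of launching a real browser.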
Example 2: JavaScript Synchronous API
For scripts that don't require async/await, Vibium offers a synchronous API that blocks until each operation completes:
// Import the synchronous API (CommonJS style)
const fs = require('fs')
const { browser } = require('vibium/sync')
// Start browser instance (blocks until ready)
const bro = browser.start()
// Create a new page/tab
const vibe = bro.page()
// Navigate to URL (blocks until page loads)
vibe.go('https://example.com')
// Capture screenshot as PNG buffer
const png = vibe.screenshot()
// Save screenshot to file
fs.writeFileSync('screenshot.png', png)
// Find first anchor element
const link = vibe.find('a')
// Click the link (blocks until navigation completes)
link.click()
// Clean up: stop browser instance
bro.stop()
Key points:
- Synchronous API is perfect for simple scripts and linear automation flows
- Each method call blocks until the operation completes or times out
- No callbacks or promises needed – straightforward imperative code
- The script controls the browser lifecycle explicitly: browser.start() launches it and bro.stop() shuts it down
Example 3: JavaScript Asynchronous API
For modern applications and concurrent operations, use the async API with Promises:
// Import the asynchronous API (ES Module style)
import { browser } from 'vibium'
import { writeFile } from 'fs/promises'
async function automate() {
  // Start browser asynchronously
  const bro = await browser.start()
  // Create page instance
  const vibe = await bro.page()
  // Navigate with await
  await vibe.go('https://example.com')
  // Take screenshot asynchronously
  const png = await vibe.screenshot()
  // Save file using async fs
  await writeFile('screenshot.png', png)
  // Find and click link
  const link = await vibe.find('a')
  await link.click()
  // Graceful shutdown
  await bro.stop()
}
// Run the automation
automate().catch(console.error)
Key points:
- Async API enables non-blocking operations and parallel execution
- Essential for web servers, concurrent tasks, and responsive applications
- Uses modern ES Modules and async/await syntax
- Same functionality as sync API but with Promise-based flow
Example 4: Python Synchronous API
Python developers get an equally elegant synchronous interface that feels native:
# Import the synchronous browser API
from vibium import browser
# Start browser instance
bro = browser.start()
# Create page object
vibe = bro.page()
# Navigate to URL
vibe.go("https://example.com")
# Capture screenshot as bytes
png = vibe.screenshot()
# Save to file
with open("screenshot.png", "wb") as f:
    f.write(png)
# Find element by CSS selector
link = vibe.find("a")
# Click the link
link.click()
# Stop browser
bro.stop()
Key points:
- Clean, Pythonic API with no async/await complexity
- Perfect for Jupyter notebooks, data scripts, and simple automation
- Methods return Python objects and primitives, not complex wrappers
- Automatic resource cleanup with context managers (optional)
Example 5: Python Asynchronous API
For asyncio-based applications and high-performance scraping:
import asyncio
from vibium.async_api import browser
async def main():
    # Start browser asynchronously
    bro = await browser.start()
    # Get page instance
    vibe = await bro.page()
    # Navigate with await
    await vibe.go("https://example.com")
    # Take screenshot
    png = await vibe.screenshot()
    # Write file
    with open("screenshot.png", "wb") as f:
        f.write(png)
    # Find and click element
    link = await vibe.find("a")
    await link.click()
    # Shutdown browser
    await bro.stop()
# Run the async event loop
asyncio.run(main())
Key points:
- Native asyncio support for Python 3.7+
- Enables concurrent browser automation tasks
- Ideal for FastAPI, aiohttp, and other async frameworks
- Same method names as sync API for easy migration
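To make the concurrency point concrete, the sketch below captures several pages at once with `asyncio.gather`. It uses only the method names shown in the example above (`start`, `page`, `go`, `screenshot`, `stop`); `capture` and `capture_all` are hypothetical helpers, and the browser object is passed in as a parameter so the code does not assume anything about the import path.

```python
import asyncio
from typing import Any, Dict, Iterable


async def capture(browser: Any, url: str) -> bytes:
    """Open a browser, screenshot `url`, and always shut the browser down."""
    bro = await browser.start()
    try:
        vibe = await bro.page()
        await vibe.go(url)
        return await vibe.screenshot()
    finally:
        # Runs even if navigation or the screenshot raises.
        await bro.stop()


async def capture_all(browser: Any, urls: Iterable[str]) -> Dict[str, bytes]:
    """Screenshot every URL concurrently, one browser instance per URL."""
    url_list = list(urls)
    shots = await asyncio.gather(*(capture(browser, u) for u in url_list))
    return dict(zip(url_list, shots))
```

With the import from the example above, this would be run as `asyncio.run(capture_all(browser, ["https://example.com"]))`. The `try`/`finally` matters more in concurrent code, since one failed page should not leave other Chrome processes running.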
Advanced Usage & Best Practices
Headless Production Deployment
For server environments, run Chrome in headless mode:
# Set environment variable before starting
export VIBIUM_HEADLESS=1
vibium go https://example.com
Custom Browser Paths
If you manage browsers separately, specify custom paths:
export VIBIUM_BROWSER_PATH=/path/to/chrome
export VIBIUM_DRIVER_PATH=/path/to/chromedriver
npm install vibium
Parallel Execution
Launch multiple isolated browser instances for concurrent tasks:
// JavaScript async parallel execution (ES module, so top-level await works)
import { browser } from 'vibium'
const bro1 = await browser.start()
const bro2 = await browser.start()
const [page1, page2] = await Promise.all([
  bro1.page(),
  bro2.page()
])
await Promise.all([
  page1.go('https://site1.com'),
  page2.go('https://site2.com')
])
Session Persistence
Save and restore complete browser sessions including cookies, localStorage, and sessionStorage:
# Export current state
vibium storage > session.json
# Later, restore it
vibium storage restore session.json
vibium go https://dashboard.example.com # Already logged in!
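The same save-and-restore flow can be scripted so that a job restores a session only when a saved file exists. This is a sketch built on the two commands shown above (`vibium storage` and `vibium storage restore <file>`); `save_session` and `restore_session` are hypothetical helpers, and the sketch assumes `vibium storage` writes the state to stdout, as the shell redirect implies.

```python
import subprocess
from pathlib import Path
from typing import Callable


def save_session(path: Path, runner: Callable = subprocess.run) -> None:
    """Write the output of `vibium storage` to `path`."""
    result = runner(
        ["vibium", "storage"], capture_output=True, text=True, check=True
    )
    path.write_text(result.stdout)


def restore_session(path: Path, runner: Callable = subprocess.run) -> bool:
    """Restore state from `path` if it exists; return whether it was restored."""
    if not path.exists():
        return False
    runner(["vibium", "storage", "restore", str(path)], check=True)
    return True
```

A saved session contains live cookies, so the file should be treated like a credential: keep it out of version control and restrict its permissions.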
Robust Waiting Strategies
Always prefer explicit waits over sleep timers:
// Bad: Flaky and slow
await new Promise(r => setTimeout(r, 5000))
// Good: Reliable and fast
await vibe.wait('.results-loaded') // Waits exactly as long as needed
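The reason an explicit wait beats a fixed sleep is that it returns as soon as the condition holds and fails loudly when it never does. The helper below is a generic illustration of that behavior in plain Python; it is not how Vibium implements its waits (an event-driven protocol does not need to poll), and `wait_until` is a hypothetical name.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1
) -> float:
    """Poll `condition` until it is true and return the elapsed seconds.

    Raises TimeoutError if the condition is still false after `timeout`.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

A fixed five-second sleep always costs five seconds and still fails if the page needs six; the wait above costs only as long as the condition takes, up to the timeout.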
Element Location Best Practices
Use semantic locators over brittle CSS selectors:
# Prefer ARIA roles and text
vibium find role button "Submit"
vibium find text "Add to Cart"
# Avoid brittle XPath or complex selectors
# Bad: vibium find "div:nth-child(3) > .btn.primary"
Comparison: Vibium vs. Traditional Tools
| Feature | Vibium | Playwright | Selenium | Puppeteer |
|---|---|---|---|---|
| AI Agent Integration | ✅ Native skill system (81 tools) | ❌ Manual tool definition | ❌ Manual tool definition | ❌ Manual tool definition |
| Protocol | WebDriver BiDi (modern standard) | CDP for Chromium; custom protocols for Firefox and WebKit | W3C WebDriver (JSON Wire Protocol dropped in Selenium 4) | Chrome DevTools Protocol, with WebDriver BiDi support |
| Browser Setup | Zero-config auto-download | Auto-download available | Manual driver management | Bundled Chromium |
| Binary Size | ~10MB (ultra-lightweight) | ~50MB+ | Runtime dependencies | ~300MB (full Chromium) |
| Client Languages | JS/TS, Python (sync + async) | JS/TS, Python, Java, .NET | Multi-language (verbose APIs) | JS/TS only |
| MCP Server | ✅ Built-in | ❌ Third-party only | ❌ No | ❌ No |
| CLI Interface | ✅ 81 native commands | ❌ Requires custom scripts | ❌ Limited CLI | ❌ Limited CLI |
| Standard Compliance | ✅ Web standard (no lock-in) | ❌ Microsoft-maintained protocols | ✅ W3C WebDriver standard | ❌ Google-maintained CDP by default |
| Learning Curve | Minimal (intuitive commands) | Moderate (complex API) | Steep (verbose setup) | Moderate (CDP knowledge) |
| Use Case Focus | AI agents & humans | General automation | Legacy enterprise testing | Chrome-specific tasks |
Why Choose Vibium?
For AI Development: No other tool offers native skill installation that teaches your agent 81 browser commands instantly. The MCP server integration provides structured tool definitions that LLMs understand natively.
For Modern Standards: Building on WebDriver BiDi reduces the risk that your automation breaks when a browser vendor changes its internal debugging protocol. You're building on open web standards, not corporate APIs.
For Simplicity: The CLI interface means you can automate browsers with bash scripts, Makefiles, and cron jobs without writing any code. Commands are self-documenting and follow predictable patterns.
For Performance: The 10MB binary starts in milliseconds and consumes minimal resources. Perfect for serverless functions, containerized deployments, and edge computing.
Frequently Asked Questions
Q: Does Vibium support browsers other than Chrome?
A: Currently, Vibium focuses on Chrome for Testing via WebDriver BiDi. This ensures perfect protocol compliance and reliable automation. Firefox WebDriver BiDi support is planned for a future release.
Q: Can I use Vibium in production CI/CD pipelines?
A: Absolutely! Vibium's lightweight design and headless mode make it ideal for CI/CD. Set VIBIUM_HEADLESS=1 and use the storage commands to handle authentication states between runs.
Q: How does Vibium handle dynamic content and SPAs?
A: Vibium is designed for modern web apps. Use the vibium wait commands to wait for an element, a URL change, visible text, or the page load event. Because WebDriver BiDi is event-driven, the client can subscribe to browser events such as navigation and network activity as they happen instead of polling for them.
Q: Is the MCP server compatible with all LLM agents?
A: The MCP server follows the Model Context Protocol specification, making it compatible with Claude Code, Gemini CLI, and any MCP-compliant agent framework. It provides structured schemas for all 81 tools.
Q: What's the difference between fill and type commands?
A: vibium fill clears the input before typing, perfect for forms. vibium type appends text to existing content, useful for rich text editors and incremental input.
Q: Can I run multiple browser instances simultaneously?
A: Yes! Each browser.start() call creates an isolated Chrome instance with separate cookies, cache, and storage. How many instances you can run in parallel depends on the memory and CPU available, since each one is a full Chrome browser.
Q: How do I debug when something goes wrong?
A: Run with VIBIUM_DEBUG=1 for verbose logging. The browser runs visible by default, so you can watch automation live. Use vibium screenshot liberally to capture state at each step.
Conclusion: The Future of Browser Automation is Here
Vibium represents a paradigm shift in browser automation. By prioritizing AI agent integration, embracing modern web standards, and eliminating configuration complexity, it delivers a developer experience that feels like magic. The ability to teach your LLM 81 browser tools through a single skill installation is revolutionary – turning passive AI assistants into active digital workers.
The WebDriver BiDi foundation ensures longevity and standard compliance, while the multi-interface design (CLI, MCP, JS/TS, Python) provides unmatched flexibility. Whether you're building AI agents, automating tests, or orchestrating complex web workflows, Vibium's lightweight architecture and intuitive API slash development time from hours to minutes.
What truly sets Vibium apart is its zero-config philosophy. In a world where developers waste countless hours on browser driver setup, Vibium just works. The auto-downloading Chrome for Testing, the 10MB binary, the visible-by-default debugging – every design decision prioritizes developer productivity.
The repository is actively maintained by VibiumDev with a clear roadmap including Java client support, a Cortex memory layer for intelligent navigation, and Retina recording extensions. The Apache 2.0 license means you can use it freely in commercial projects.
Ready to supercharge your AI agents with browser superpowers?
🚀 Get started with Vibium today – zero to hello world in 5 minutes. Install the CLI, add the skill, and watch your agents conquer the web. The future of automation is standards-based, AI-native, and unbelievably simple. That's Vibium.