UI Act: The Secret Linux Tool That Lets AI Use Its Own Mouse

Bright Coding

Author

What if your AI assistant didn't hijack your cursor? Imagine typing a prompt, leaning back, and watching an intelligent agent navigate your desktop—while you keep working. No frozen mouse. No interrupted workflow. No staring at a screen you can't touch for twenty minutes.

Here's the brutal truth most developers have accepted: computer automation tools are selfish. They grab your input devices, lock your screen, and force you to wait. Whether it's legacy RPA software, clunky VNC setups, or basic scripting tools, the experience feels like lending your car to someone who refuses to let you ride shotgun.

But what if there was another way?

Enter UI Act—a free, open-source Computer Use agent for Linux that operates using its own mouse and keyboard. Built by Tobias Norlund and powered by Anthropic's computer use capabilities, UI Act leverages a decades-old but underutilized X11 feature called Multi-Pointer X (MPX) to create something genuinely revolutionary: true parallel human-AI collaboration on the same desktop.

This isn't science fiction. This isn't a concept demo. This is installable today on Ubuntu Desktop 24.04, and it's about to change how you think about AI automation forever.

Here's how it works, how to install it, and where it still falls short.


What is UI Act?

UI Act is a Computer Use/GUI agent software designed specifically for Linux desktop environments. Unlike conventional automation tools that monopolize your input devices, UI Act creates separate virtual mouse and keyboard devices through which the AI agent operates—leaving your physical hardware entirely free for your own use.

Created by Tobias Norlund, a developer focused on practical AI tooling, UI Act represents a philosophical shift in how we design human-AI interfaces. The project is completely free and open source under the Apache License 2.0, with no data harvesting: you bring your own Anthropic API key, keeping full control over your prompts, screenshots, and credentials.

The tool's architecture centers on Multi-Pointer X (MPX), a feature of the X Window System that enables multiple independent mouse pointers on a single display. Originally developed by Peter Hutterer in 2008, MPX has languished in relative obscurity, used primarily in niche multi-user scenarios and research projects. UI Act repurposes this capability for AI agents, creating a dedicated "input master" that the agent controls while your original pointer remains yours alone.

Why is this trending now? Three converging forces:

  • Anthropic's computer use capability (released in public beta in October 2024) finally made capable GUI agents accessible to developers
  • Linux desktop adoption is accelerating among technical professionals seeking control and privacy
  • The productivity paradox: existing tools are too disruptive for real workflows, creating demand for seamless integration

UI Act sits at this intersection, solving the user experience problem that threatens to bottleneck Computer Use adoption. It's not just an agent—it's a collaboration framework.

Note: An earlier project named "UI Act" (a computer use model) has been moved to a separate repository. The current ui-act is the active agent software.


Key Features That Set UI Act Apart

Let's dissect what makes UI Act technically distinctive and practically powerful:

True Parallel Operation via MPX

The headline feature: UI Act creates a separate xinput master device with its own virtual pointer and keyboard. You can literally watch two cursors on your screen, yours and the agent's, moving independently. This isn't simulated at the application level: the agent's devices are virtual input devices created through Linux's uinput subsystem and attached to their own master via X11's MPX support, so the X server treats them as separate from your physical mouse and keyboard.
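
To see the mechanism in isolation, MPX can be exercised by hand with the stock xinput tool. The sketch below is not UI Act's code: the helper names, the master name "Demo", and the DRY_RUN switch are assumptions made for illustration. It relies only on xinput's create-master and remove-master subcommands.

```shell
#!/bin/sh
# Sketch: create and remove a second MPX master with xinput.
# "Demo" is a hypothetical name; UI Act manages its own master internally.

# Run a command, or just print it when DRY_RUN=1 (handy for inspection).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Creates a master pair that xinput lists as "<name> pointer" and "<name> keyboard".
create_master() {
    run xinput create-master "$1"
}

# Removes a master again; pass the numeric id shown by `xinput list`.
remove_master() {
    run xinput remove-master "$1"
}

# Example (prints the commands instead of running them):
#   DRY_RUN=1 create_master Demo
#   DRY_RUN=1 remove_master 15
```

Running create_master without DRY_RUN on an X11 session should make a second cursor appear; removing the master makes it disappear again.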

Dual Operating Modes

  • Full Desktop Mode: The agent sees and can interact with your entire screen—every window, every application
  • Single Window Mode: Restricted scope for focused tasks. The target window is automatically set to "Always on top" to prevent obstruction, making this ideal for delegated workflows where you want the agent contained

GNOME Shell Integration

A native GNOME extension provides:

  • Hotkey activation: CTRL + Space launches the agent instantly
  • Top panel indicator: Visual status and quick access
  • Settings GUI: Configure your API key without touching config files

Transparent Telemetry with Opt-Out

Anonymous usage statistics are collected by default for product improvement, but zero user data (prompts, screenshots, API keys) is transmitted. Skeptical? The telemetry code is open and auditable. Disable entirely with --no-telemetry.

Reasoning Visibility

The CLI prints the agent's step-by-step reasoning in real-time. You're not watching a black box—you're observing a thought process, with natural breakpoints for clarification or course correction.

Interrupt Safety

CTRL+C immediately halts the agent. Combined with the physical separation of input devices, this creates multiple safety layers against runaway automation.

Privacy-First Architecture

No separate cloud service to sign up for: requests go straight from your machine to Anthropic's API under your own key. Your API key, your machine, your control.


Real-World Use Cases Where UI Act Dominates

1. Research and Data Gathering

You're preparing a competitive analysis. Instead of manually browsing competitor websites, copying prices, and organizing screenshots, you prompt UI Act: "In the open browser, go to Amazon and find me some Ray-Ban Meta Glasses"—then continue writing your executive summary while the agent navigates, extracts, and reports.

2. Cross-Application Workflows

Modern knowledge work spans dozens of tools. UI Act can bridge them: "Open the Q3 spreadsheet, copy the revenue figures, paste them into the presentation template, and export as PDF." You review Slack messages while it executes.

3. UI Testing and QA

Developers can delegate repetitive manual testing: "Click through the onboarding flow, fill random valid data, and report any error states or console warnings." Your hands stay on your own keyboard for debugging the issues found.

4. Accessibility Assistance

For users with repetitive strain injuries or motor limitations, UI Act can be paired with a speech-to-text tool for entering prompts. That makes complex multi-step computer operations possible by voice, without requiring specialized accessibility infrastructure.

5. Long-Running Administrative Tasks

System configuration, batch file organization, or media processing that requires GUI interaction: start the agent, verify its initial steps, then physically walk away knowing your own workspace remains usable if you return early.


Step-by-Step Installation & Setup Guide

UI Act currently supports Ubuntu Desktop 24.04 and later, distributed as a Debian package. Follow precisely:

Prerequisites Check

First, verify you're running X11, not Wayland:

echo "$XDG_SESSION_TYPE"

If it prints wayland, switch before proceeding:

  1. Log out completely
  2. Click your username to activate the password field
  3. Click the gear icon (bottom right)
  4. Select "Ubuntu on Xorg"
  5. Log in
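
The session check above can be turned into a small guard for setup scripts. This is a minimal sketch; the function name require_x11 is an illustrative assumption and not part of UI Act.

```shell
#!/bin/sh
# Sketch: refuse to continue unless the session type is X11.
# require_x11 is a hypothetical helper, not something UI Act ships.

# $1 is the session type, normally the value of XDG_SESSION_TYPE.
require_x11() {
    case "$1" in
        x11)
            return 0
            ;;
        wayland)
            echo "Wayland session detected: log out and choose 'Ubuntu on Xorg'." >&2
            return 1
            ;;
        *)
            echo "Unknown session type '${1:-unset}': cannot confirm X11." >&2
            return 1
            ;;
    esac
}

# Example:
#   require_x11 "$XDG_SESSION_TYPE" && echo "OK to install"
```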

Download and Install

# Fetch the latest .deb release using GitHub's API (requires curl, jq, and wget)
curl -fsSL https://api.github.com/repos/TobiasNorlund/ui-act/releases/latest \
  | jq -r '.assets[] | select(.name | endswith(".deb")) | .browser_download_url' \
  | xargs -r wget

# Install the downloaded package
sudo apt install ./ui-act_*.deb

Permissions Configuration

The agent requires UInput device creation privileges:

# Add your user to the "input" group
sudo usermod -aG input "$USER"

Critical: Log out and log back in for group membership to take effect.
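
After logging back in, it is worth confirming that the new group membership is active before debugging anything else. The helper below is a sketch with a hypothetical name; it only inspects a space-separated group list such as the output of `id -nG`.

```shell
#!/bin/sh
# Sketch: check whether a space-separated group list contains "input".
# has_input_group is an illustrative helper, not part of UI Act.

# $1 is a group list, for example the output of `id -nG`.
has_input_group() {
    for g in $1; do
        [ "$g" = "input" ] && return 0
    done
    return 1
}

# Example:
#   has_input_group "$(id -nG)" || echo "Not in 'input' yet: log out and back in."
```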

Enable GNOME Extension

# Activate the UI Act GNOME shell extension
gnome-extensions enable ui-act@tobiasnorlund.github.com

# Open settings to configure your API key
gnome-extensions prefs ui-act@tobiasnorlund.github.com

Enter your Anthropic API key in the settings window that appears.

Verification

Press CTRL + Space. A prompt dialog should appear. Type something simple like "Open the calculator" and observe the agent operating with its own visible cursor.


REAL Code Examples from the Repository

Let's walk through usage patterns drawn from UI Act's command-line interface and documentation.

Example 1: Basic CLI Invocation

The simplest way to run UI Act is directly from terminal:

# Full desktop mode - agent sees everything and can click anywhere
ui-act "Open Firefox, navigate to news.ycombinator.com, and summarize the top three stories"

What's happening under the hood:

  1. UI Act initializes the Anthropic client with your stored API key
  2. Creates a new xinput master named "UI Act pointer" via X11's XI2 extension
  3. Attaches virtual UInput mouse and keyboard devices to this master
  4. Begins screenshot-capture → LLM reasoning → action execution loop
  5. Your physical mouse remains on its original master, completely unaffected
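
The loop in step 4 can be pictured with a toy sketch. Everything here is a stand-in: capture_screen, decide_action, and perform_action are hypothetical stubs, the model is faked so that it reports "done" on the third step, and the step limit is an assumed safeguard rather than a documented UI Act feature.

```shell
#!/bin/sh
# Sketch of a capture -> decide -> act loop with a step limit.
# All three helpers are hypothetical stand-ins, NOT UI Act's internals.

# Would grab a screenshot; here it just returns a label.
capture_screen() { echo "screenshot-$1"; }

# Would send the screenshot to the LLM; here it finishes on step 3.
decide_action() {
    if [ "$2" -ge 3 ]; then echo "done"; else echo "click"; fi
}

# Would drive the agent's virtual mouse and keyboard; here it does nothing.
perform_action() { :; }

# Runs until the stub model says "done" or max_steps is reached.
agent_loop() {
    max_steps=$1
    step=1
    while [ "$step" -le "$max_steps" ]; do
        shot=$(capture_screen "$step")
        action=$(decide_action "$shot" "$step")
        if [ "$action" = "done" ]; then
            echo "finished after $step steps"
            return 0
        fi
        perform_action "$action"
        step=$((step + 1))
    done
    echo "stopped at step limit $max_steps"
    return 1
}
```

The point of the sketch is the shape of the loop: each iteration observes the screen before acting, and a hard limit keeps a confused model from running indefinitely.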

Example 2: Single Window Mode with Explicit Window ID

For contained, focused automation:

# First, obtain the window ID of your target application
xwininfo
# Click on the target window; note the "Window id:" output (e.g., 0x3a00003)

# Run agent restricted to that window only
ui-act --window 0x3a00003 "Fill out the contact form with test data and submit"

Technical significance: The --window parameter triggers single-window mode, where:

  • Screenshots are cropped to the window's geometry
  • Click coordinates are translated relative to window position
  • The window receives _NET_WM_STATE_ABOVE (Always on Top) via EWMH to prevent accidental occlusion
  • The agent cannot accidentally interact with other applications

This is the mode used by the GNOME extension's CTRL + Space shortcut.
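
The coordinate translation mentioned above is simple arithmetic once the window's position is known. The sketch below reads the "Absolute upper-left X/Y" lines that xwininfo prints and offsets a window-relative click by them. The helper names are illustrative assumptions; UI Act's own translation code may work differently.

```shell
#!/bin/sh
# Sketch: map window-relative click coordinates to screen coordinates.
# Helper names are illustrative; this is not UI Act's implementation.

# Reads `xwininfo` output on stdin; prints "X Y" of the window's top-left corner.
window_origin() {
    awk -F: '
        /Absolute upper-left X/ { x = $2 + 0 }
        /Absolute upper-left Y/ { y = $2 + 0 }
        END { print x, y }
    '
}

# Usage: to_screen ORIGIN_X ORIGIN_Y REL_X REL_Y
to_screen() {
    echo "$(( $1 + $3 )) $(( $2 + $4 ))"
}

# Example:
#   set -- $(xwininfo -id 0x3a00003 | window_origin)
#   to_screen "$1" "$2" 40 60
```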

Example 3: Advanced CLI with All Options

# Specify model, disable telemetry, target specific window
ui-act \
  --window 0x4c00002 \
  --model claude-opus-4-6 \
  --no-telemetry \
  "Sort the spreadsheet in this window by price and highlight the three cheapest rows"

Parameter breakdown:

  • --model claude-opus-4-6: Overrides default model selection; useful for cost/performance tradeoffs or testing newer Anthropic releases
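  • --no-telemetry: Disables the anonymous usage statistics, so the only remaining network traffic should be the Anthropic API calls. This is useful in compliance-sensitive environments; note that a truly air-gapped machine cannot reach the Anthropic API at all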
  • The positional prompt accepts natural language with complex multi-step instructions

Example 4: Manual Cleanup After Agent Exit

Due to application compatibility constraints, the xinput master persists after agent completion:

# Identify the UI Act master device ID (first match only, in case several lines match)
master_id=$(xinput list | grep "UI Act pointer" | grep -o 'id=[0-9]*' | cut -d= -f2 | head -n 1)

# Remove it to restore normal keyboard behavior; do nothing if no such device exists
[ -n "$master_id" ] && xinput remove-master "$master_id"

Why this matters: Some applications (notably Chrome) may stop receiving keyboard input when orphaned xinput masters exist. This cleanup script belongs in your shell aliases or a desktop automation trigger for seamless operation.
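
One way to make the cleanup automatic is a wrapper function that runs the agent and then removes whatever master it left behind. This is a sketch under assumptions: the function names are hypothetical, and it assumes the leftover master appears in `xinput list` as "UI Act pointer" with the usual "[master pointer (N)]" tag.

```shell
#!/bin/sh
# Sketch: run ui-act, then remove any leftover "UI Act" master pointer.
# Assumes the master is listed as "UI Act pointer"; names are illustrative.

# Reads `xinput list` output on stdin; prints ids of matching master pointers.
ui_act_master_ids() {
    awk '/UI Act pointer/ && /master pointer/ {
        if (match($0, /id=[0-9]+/)) print substr($0, RSTART + 3, RLENGTH - 3)
    }'
}

# Wrapper: forwards all arguments to ui-act, cleans up, keeps ui-act's exit code.
ui_act_clean() {
    ui-act "$@"
    status=$?
    for id in $(xinput list | ui_act_master_ids); do
        xinput remove-master "$id"
    done
    return "$status"
}

# Example:
#   ui_act_clean --window 0x3a00003 "Fill out the contact form with test data"
```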

Example 5: Monitoring Active Input Devices

Debug or observe MPX behavior in real-time:

# Re-run xinput list every 2 seconds
watch -n 2 xinput list

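When UI Act runs, you should see an additional master pair next to the default Virtual core pointer and keyboard. The listing below is illustrative; IDs and exact device names will differ on your system:

⎡ UI Act pointer                            id=15   [master pointer  (16)]
⎣ UI Act keyboard                           id=16   [master keyboard (15)]

Entries tagged as master pointer and master keyboard are the agent's own master pair, with its virtual uinput devices attached beneath them as slaves. That pair is separate from Virtual core pointer, the master your physical mouse stays attached to.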
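
A quick way to check for a second pointer from a script is to count master pointers in the `xinput list` output. A stock session has exactly one (Virtual core pointer), so a count of two or more suggests MPX is in use. The function name below is an illustrative assumption.

```shell
#!/bin/sh
# Sketch: count master pointers in `xinput list` output read from stdin.
# count_master_pointers is a hypothetical helper name.
count_master_pointers() {
    awk '/\[master pointer/ { n++ } END { print n + 0 }'
}

# Example:
#   [ "$(xinput list | count_master_pointers)" -ge 2 ] && echo "Second pointer present"
```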


Advanced Usage & Best Practices

Optimize Prompt Engineering

UI Act's effectiveness depends on prompt clarity. Structure instructions with:

  • Explicit starting states: "In the already-open Firefox window..."
  • Verification checkpoints: "After logging in, confirm you see the dashboard before proceeding"
  • Failure handling: "If a CAPTCHA appears, stop and ask for direction"
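
If you reuse the same structure often, the three parts can be assembled by a small helper so none of them is forgotten. This is only a sketch: build_prompt is a hypothetical function, not a UI Act feature, and it simply joins a starting state, a task, a checkpoint, and a fallback into one prompt string.

```shell
#!/bin/sh
# Sketch: assemble a prompt from starting state, task, checkpoint, and fallback.
# build_prompt is a hypothetical helper, not part of UI Act.
build_prompt() {
    printf '%s, %s. %s. %s.\n' "$1" "$2" "$3" "$4"
}

# Example:
#   ui-act "$(build_prompt \
#       "In the already-open Firefox window" \
#       "log in to the staging site" \
#       "After logging in, confirm you see the dashboard before proceeding" \
#       "If a CAPTCHA appears, stop and ask for direction")"
```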

Window Management Strategy

For complex multi-application workflows, pre-arrange windows before invoking the agent. Single-window mode reduces cognitive load for the LLM and prevents accidental misclicks.

API Key Rotation

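Since keys are stored in the GNOME extension's preferences, rotate them regularly. The settings live in dconf (the path below assumes the extension's default schema location). Note that the dump contains your API key in plain text, so keep the backup file private. Back up or migrate with: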

dconf dump /org/gnome/shell/extensions/ui-act/ > ui-act-backup.txt
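
Because the dump holds the API key, it helps to create the backup file with owner-only permissions and to pair it with a restore step. The sketch below uses the standard `dconf dump` and `dconf load` subcommands; the function names are hypothetical, and the dconf path in the example is the one given above, not independently verified.

```shell
#!/bin/sh
# Sketch: back up and restore a dconf subtree with owner-only file permissions.
# Function names are illustrative; the dconf path is passed in by the caller.

# Usage: backup_settings DCONF_DIR FILE
backup_settings() {
    # umask 077 inside a subshell so the new file is created with mode 600
    (umask 077 && dconf dump "$1" > "$2")
}

# Usage: restore_settings DCONF_DIR FILE
restore_settings() {
    dconf load "$1" < "$2"
}

# Example:
#   backup_settings /org/gnome/shell/extensions/ui-act/ ui-act-backup.txt
#   restore_settings /org/gnome/shell/extensions/ui-act/ ui-act-backup.txt
```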

Telemetry Audit

Review exactly what's transmitted:

# Clone and inspect the telemetry implementation
git clone https://github.com/TobiasNorlund/ui-act.git
cat ui-act/ui_act/src/telemetry.rs

Headless/Background Operation (Future-Proofing)

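The roadmap includes Xephyr support for true background agents. For now, you can experiment with running UI Act in a nested X session manually. Treat this as an untested sketch: the nested display starts out empty, so you also need to launch a window manager and your target applications inside it.

# Start a nested X server on display :1 and give it a moment to come up
Xephyr :1 -screen 1920x1080 &
sleep 2
DISPLAY=:1 ui-act "Your prompt here"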

This previews upcoming functionality and isolates agent operations entirely.


Comparison with Alternatives

| Feature | UI Act | Traditional RPA (Selenium/Playwright) | macOS Computer Use | Virtual Machine Agents |
| --- | --- | --- | --- | --- |
| Parallel human use | ✅ Native via MPX | ❌ Blocks or requires VNC | ❌ Cursor hijacking | ✅ But requires full VM |
| Linux native | ✅ Yes | ⚠️ Partial | ❌ macOS only | ✅ Any host |
| Open source | ✅ Apache 2.0 | ⚠️ Mixed | ❌ Proprietary | ⚠️ Mixed |
| Setup complexity | Low (single .deb) | Medium-High | Low (built-in) | High |
| Real desktop integration | ✅ Seamless | ❌ Browser-limited | ✅ Native | ❌ Isolated |
| Cost | Free (BYO API key) | Free | Paid (Claude subscription) | Infrastructure costs |
| Multi-window workflows | ✅ Full desktop mode | ❌ Browser only | | |
| Privacy | ✅ Local processing | ❌ Cloud-dependent | | |

Verdict: UI Act uniquely combines native Linux integration, genuine parallel operation, and open-source transparency, with no intermediary service between you and the model provider. The tradeoffs are the X11 dependency, a single supported backend (Anthropic), and the current single-user design.


FAQ: Common Developer Concerns

Does UI Act work on Wayland?

No—MPX is X11-specific. You must run Ubuntu on Xorg. Check with echo $XDG_SESSION_TYPE and switch at the login screen if needed.

Is my data sent to any servers besides Anthropic?

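Your prompts and screenshots go to Anthropic's API, since that is where the model runs. Beyond that, only anonymous usage statistics are transmitted (opt out with --no-telemetry), and they contain no prompts, screenshots, or keys. You can verify this in the open-source telemetry code.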

Can I use OpenAI or local models instead of Anthropic?

Not yet. Anthropic is the sole supported backend, but the roadmap explicitly lists OpenAI and self-hosted models as planned contributions.

What happens if the agent goes wrong?

Press CTRL+C in the terminal to interrupt immediately. The separate input devices mean the agent cannot physically prevent you from regaining control.

Why does Chrome stop accepting keyboard input after UI Act runs?

A known issue with orphaned xinput masters. Remove manually with the xinput remove-master command shown in the code examples above.

Can I run multiple agents simultaneously?

Theoretically possible with multiple xinput masters, but untested and not officially supported. The GNOME extension launches single-window mode only.

Is this suitable for production automation?

UI Act is currently best for interactive assistance and productivity augmentation. The roadmap includes guardrails and background operation for eventual production reliability.


Conclusion: The Future of Human-AI Collaboration Starts Here

UI Act isn't just another automation tool—it's a fundamental reimagining of how intelligent agents should integrate into human workflows. By exploiting the underutilized MPX capability of X11, Tobias Norlund has created something that feels obvious in retrospect yet revolutionary in practice: true shared desktop operation without sacrifice.

The current limitations—X11 dependency, manual cleanup, single-backend support—are real but temporary. The core architecture is sound, the code is open, and the roadmap is ambitious. For Linux developers and technical professionals who refuse to choose between AI assistance and uninterrupted productivity, UI Act represents the path forward.

My take? Install it this week. Experiment with single-window delegation. Contribute to the roadmap if you have vision. And watch closely as this approach inevitably influences how macOS and Windows evolve their own computer use implementations.

The agents are coming. UI Act ensures they knock before entering—and bring their own equipment.

👉 Star UI Act on GitHub, install the .deb, and press CTRL + Space to experience the future of Linux automation.


Found this breakdown valuable? Share it with your Linux automation circle, and subscribe for deep dives into emerging developer tools that actually ship.
