llmfit: The Smart Tool Every AI Developer Needs
Tired of downloading 30GB models only to discover they won't run on your machine? You're not alone. Every day, thousands of developers waste hours—and bandwidth—trying to figure out which LLM will actually work with their RAM, CPU, and GPU specs. The frustration is real. The trial-and-error is painful. But what if you could know instantly which models fit your hardware perfectly?
Enter llmfit, a revolutionary terminal tool that ends the guesswork forever. This powerful Rust-based CLI analyzes your system in seconds and matches you with hundreds of LLMs across multiple providers. No more manual calculations. No more disappointing crashes. Just pure, data-driven recommendations tailored to your exact setup.
In this deep dive, you'll discover how llmfit transforms local AI development, explore its game-changing features, and get hands-on with real code examples. We'll walk through installation, master the interactive TUI, unlock advanced CLI patterns, and reveal pro tips that make local LLM deployment effortless. Ready to stop guessing and start running? Let's go.
What Is llmfit and Why It's Changing the Game
llmfit is a blazing-fast command-line tool written in Rust that intelligently matches Large Language Models to your hardware specifications. Created by Alex Jones, this open-source utility tackles one of the most annoying problems in modern AI development: hardware compatibility uncertainty.
The tool ships with a comprehensive database of hundreds of models from providers like Ollama, llama.cpp, MLX, and Docker Model Runner. It detects your system's CPU cores, RAM capacity, GPU name, and VRAM, then scores each model across four critical dimensions: quality, speed, fit, and context window. The result? A ranked list of models that will actually run well on your machine—not just technically launch, but perform optimally.
What makes llmfit especially powerful is its dual-interface design. Launch it without arguments, and you get a sleek, interactive terminal UI (TUI) with Vim-inspired keybindings. Need automation? The classic CLI mode outputs JSON for easy scripting and integration into larger workflows. This flexibility makes it perfect for both interactive exploration and headless server deployments.
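In practice, that means one binary covers both workflows. A minimal sketch, using only commands that appear later in this article:
# Interactive TUI (the default when run with no arguments)
llmfit
# Machine-readable output for scripts and pipelines
llmfit recommend --json --limit 5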
The tool has exploded in popularity because it solves a genuine pain point. As local LLM adoption skyrockets, developers face an overwhelming number of models, quantization options, and hardware requirements. llmfit cuts through the noise with one simple command. It's become an essential part of the modern AI developer's toolkit, joining the ranks of tools like Ollama and Hugging Face CLI—but with a laser focus on hardware-aware model selection.
Key Features That Make llmfit Essential
Hardware Auto-Detection: llmfit automatically scans your system on startup. It identifies CPU architecture, core count, total system RAM, GPU models, and available VRAM. This happens in under a second, giving you an accurate baseline for all recommendations.
Intelligent Scoring Algorithm: Each model receives a composite score based on multiple factors. The algorithm considers not just whether a model fits in memory, but whether it belongs there. A 70B parameter model might technically load on your 32GB RAM system, but llmfit will flag it as "Marginal" and recommend smaller, faster alternatives that deliver better real-world performance.
Multi-Dimensional Analysis: Beyond simple memory checks, llmfit evaluates:
- Quality: Model capability and benchmark performance
- Speed: Estimated tokens per second for your specific hardware
- Fit: Perfect, Good, Marginal, or Unrunnable categorization
- Context: Maximum context window supported
Dynamic Quantization Selection: The tool doesn't just tell you which model to run—it tells you how to run it. For each model, llmfit recommends the optimal quantization level (Q4_0, Q8_0, MLX-4bit, etc.) that balances quality and performance for your exact specs.
Provider Ecosystem Support: llmfit integrates deeply with the local LLM ecosystem. It detects installed models from Ollama, suggests llama.cpp-compatible GGUF files, recommends MLX-optimized versions for Apple Silicon, and even supports Docker Model Runner. This provider-aware approach ensures you get the most native, performant version of each model.
Advanced TUI with Vim Keys: The interactive interface is a productivity powerhouse. Navigate with j/k, search with /, filter with f, and enter Visual mode (v) for bulk operations. The UI includes Plan mode for hardware planning, compare views for model analysis, and six built-in color themes.
REST API Server: Run llmfit serve to expose a node-level HTTP API. This enables cluster schedulers, Kubernetes operators, and automation pipelines to query model recommendations programmatically. Perfect for building AI infrastructure at scale.
Multi-GPU and MoE Awareness: For power users, llmfit understands complex setups. It properly calculates memory requirements for Mixture-of-Experts models and supports multi-GPU configurations, showing which models can be sharded across devices.
Real-World Use Cases Where llmfit Shines
1. The Local AI Enthusiast
You've got a MacBook Pro with M3 Max and 64GB unified memory. You want to run the best possible model for coding assistance. Instead of manually researching MLX-compatible models and their memory footprints, you simply run llmfit. The tool instantly shows that Qwen3-30B-A3B runs at "Perfect" fit with MLX-4bit quantization, delivering 45 tok/s. You press d to download directly through MLX, and you're coding with AI in minutes—not hours.
2. The Enterprise DevOps Engineer
Your company runs a Kubernetes cluster with heterogeneous GPU nodes. You need to schedule LLM workloads efficiently. By deploying llmfit as a DaemonSet with the REST API enabled, each node advertises its optimal models. Your scheduler queries http://node:8787/recommend?use-case=chat&limit=3 and gets JSON responses tailored to each machine's capabilities. This enables intelligent pod placement and prevents OOM kills.
3. The Researcher with Limited Resources
You're a graduate student with a single RTX 3060 (12GB VRAM) and 32GB system RAM. You need to run experiments across multiple model sizes but can't afford trial-and-error. llmfit recommend --use-case research --json gives you a ranked list of feasible models, showing that Phi-3-mini-4k runs at 85 tok/s with perfect fit, while Llama-3-8B requires CPU offloading and runs slower. You make informed decisions before downloading anything.
4. The AI Application Developer
You're building a RAG application that needs to run offline for privacy reasons. Your target deployment is a fleet of edge devices with varying specs. Using llmfit's CLI mode, you script a compatibility matrix: llmfit --cli --fit perfect | grep -E "(Model|Mem|Quant)". This generates a report showing which models fit within your 16GB RAM constraint, allowing you to standardize on Mistral-7B-Q5_K_M across all devices.
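A rough sketch of that compatibility report, using only the command shown above (the exact output columns will depend on your llmfit version):
# Capture the per-device compatibility report to a file named after the host
llmfit --cli --fit perfect | grep -E "(Model|Mem|Quant)" > "fit-report-$(hostname).txt"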
Step-by-Step Installation & Setup Guide
Getting started with llmfit takes less than two minutes. Choose your installation method based on your operating system and preferences.
Windows Installation
The fastest way on Windows is through Scoop:
scoop install llmfit
If you don't have Scoop installed, follow the official Scoop installation guide first. Scoop handles updates automatically, keeping you on the latest version.
macOS and Linux Installation
Option 1: Homebrew (Recommended)
brew install llmfit
Homebrew provides seamless updates and dependency management. This works on both macOS and Linux distributions that support Homebrew.
Option 2: Quick Install Script
For a direct binary installation without package managers:
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
This downloads the latest release from GitHub and installs to /usr/local/bin (or ~/.local/bin if sudo isn't available). For a user-local installation:
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
Option 3: Docker / Podman
Perfect for containerized workflows or testing without local installation:
# Basic run (outputs JSON recommendations)
docker run ghcr.io/alexsjones/llmfit
# With custom command and jq parsing
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
Building From Source
For developers who want the latest features or need custom modifications:
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# Binary is at target/release/llmfit
You'll need Rust 1.70+ installed. Building from source ensures you have the absolute latest commits, including features that haven't been released yet.
Post-Installation Verification
After installation, verify everything works:
llmfit --version
llmfit system
The system command displays your detected hardware specs. If this matches your actual configuration, you're ready to start exploring models.
Real Code Examples from the Repository
Let's dive into practical examples straight from the llmfit README. These commands work exactly as shown—copy, paste, and start optimizing your local LLM setup.
Example 1: Basic TUI Launch and System Check
# Launch the interactive terminal UI (default behavior)
llmfit
# Display detected system specifications
llmfit system
Explanation: The first command launches llmfit's signature TUI, showing a scrollable table of models ranked by fit score. The second command prints your hardware specs in a clean format. This is your starting point—always run llmfit system first to confirm accurate hardware detection.
Example 2: CLI Mode for Scripting and Automation
# Get the top 5 recommended models in JSON format
llmfit recommend --json --limit 5
# Filter recommendations for coding use case
llmfit recommend --json --use-case coding --limit 3
# Show only perfectly fitting models, limited to top 5
llmfit fit --perfect -n 5
Explanation: These commands unlock llmfit's automation potential. The --json flag outputs machine-readable data perfect for CI/CD pipelines. The --use-case filter narrows results to models optimized for specific tasks like coding, chat, research, or multimodal. The fit --perfect combination shows only models that will run flawlessly on your hardware.
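As a hedged illustration of that automation potential, here is a small CI-style gate built from the flags above plus the .models[].fit and .name fields used in the jq examples later in this article (the exact JSON schema may differ in your version):
# Fail the job if this runner has no perfectly fitting model
count=$(llmfit recommend --json --limit 10 | jq '[.models[] | select(.fit == "perfect")] | length')
if [ "$count" -eq 0 ]; then
  echo "No perfect-fit models detected on this runner" >&2
  exit 1
fi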
Example 3: Hardware Planning with the Plan Command
# Plan hardware requirements for a specific model configuration
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
# Plan with custom quantization
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
# Plan with performance target and JSON output
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
Explanation: The plan command inverts the typical workflow. Instead of asking "what fits my hardware?", it answers "what hardware do I need for this model?" This is invaluable for capacity planning and upgrade decisions. The --target-tps flag lets you specify desired tokens-per-second performance, and --json enables integration with infrastructure-as-code tools.
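For instance, a quick sketch that sweeps a few context sizes for one model and keeps the JSON for later comparison (same flags as above; the file names are just an example):
# Plan the same model at several context lengths and save each result
for ctx in 4096 8192 16384; do
  llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context "$ctx" --json > "plan-${ctx}.json"
done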
Example 4: REST API for Cluster Integration
# Start the REST API server
llmfit serve --host 0.0.0.0 --port 8787
Explanation: This single command transforms llmfit into a network service. When run on Kubernetes nodes or cluster workers, it exposes endpoints like /recommend, /system, and /plan. Your orchestrator can query these endpoints to make intelligent scheduling decisions. The API returns the same rich data as the CLI, enabling dynamic workload placement based on real-time hardware capabilities.
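Once the server is running, any HTTP client can hit those endpoints. A minimal sketch with curl, assuming the server is reachable as node on port 8787 (the response schema isn't shown here, so treat the fields as version-dependent):
# Ask a node for its top chat models and for its detected hardware
curl -s "http://node:8787/recommend?use-case=chat&limit=3"
curl -s "http://node:8787/system"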
Example 5: Docker Integration for Containerized Workflows
# Run llmfit in Docker and parse output with jq
docker run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
# Podman equivalent of the same pipeline
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
Explanation: These commands demonstrate llmfit's container-native design. The Docker image is lightweight and perfect for ephemeral environments. Piping to jq extracts just the model names, which you can feed into download scripts or configuration management tools. This pattern is ideal for GitHub Actions, GitLab CI, or any containerized deployment pipeline.
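A hedged sketch of how that might look as a CI step, recording which models a given runner can serve (the --rm flag and output file name are assumptions; everything else mirrors the commands above):
# Record this runner's recommended coding models for later pipeline stages
docker run --rm ghcr.io/alexsjones/llmfit recommend --use-case coding \
  | jq -r '.models[].name' > runnable-models.txt
Keep in mind that hardware detection inside a container reflects the resources the container can see, so results may differ from the host.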
Advanced Usage & Best Practices
Leverage Visual Mode for Bulk Analysis: Press v in the TUI to enter Visual mode, then select a range of models with j/k. Press c to compare all selected models side-by-side. This is perfect for narrowing down candidates before committing to a download.
Theme Persistence for Team Consistency: Use t to cycle themes in the TUI. Your selection saves automatically to ~/.config/llmfit/theme. For teams, consider standardizing on a theme like Nord or Solarized and sharing the config file across machines for a consistent experience.
Combine Filters for Precision: In the TUI, use Select mode (V) to apply column-specific filters. For example, filter the Params column to show only 7-14B models, then filter Fit to "Perfect". This two-stage filtering quickly surfaces the sweet spot for your hardware.
Script with JSON and jq: For automation, always use --json output. Combine with jq for powerful queries: llmfit recommend --json --limit 10 | jq '.models[] | select(.fit == "perfect") | .name'. This extracts only perfectly fitting model names for scripting.
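Extending that pattern, you can snapshot a machine's recommendations and re-extract the perfect fits whenever the model database updates. A small sketch (the file name is arbitrary):
# Save this machine's current recommendations, then list the perfect fits
llmfit recommend --json --limit 10 > "recs-$(hostname).json"
jq -r '.models[] | select(.fit == "perfect") | .name' "recs-$(hostname).json"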
Plan Mode for Upgrade Decisions: Before buying new hardware, use llmfit plan on your target model. It shows minimum vs. recommended specs and upgrade deltas. This data-driven approach prevents overspending on unnecessary upgrades.
API Rate Limiting in Production: When running llmfit serve in production, implement rate limiting at your reverse proxy. The API is lightweight but can be hammered by aggressive schedulers. A simple nginx limit of 10 req/s per IP prevents abuse.
Cache Installed Models: Run llmfit --refresh (or press r in TUI) after installing new models through external tools. This ensures llmfit's "Installed" filter stays accurate, showing you which models are ready to run.
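For example, assuming Ollama is installed and llama3 is the model you just pulled (both are assumptions here), the refresh step looks like this:
# Pull a model with an external tool, then refresh llmfit's installed-model cache
ollama pull llama3
llmfit --refresh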
Comparison with Alternatives
| Feature | llmfit | Ollama CLI | Hugging Face CLI | LocalAI |
|---|---|---|---|---|
| Hardware Detection | ✅ Automatic | ❌ Manual | ❌ Manual | ❌ Manual |
| Model Scoring | ✅ Multi-dimensional | ❌ Basic list | ❌ None | ❌ None |
| Interactive TUI | ✅ Full-featured | ❌ Minimal | ❌ None | ❌ None |
| Quantization Advice | ✅ Per-model | ❌ Generic | ❌ User-driven | ❌ Generic |
| Multi-Provider | ✅ Ollama, llama.cpp, MLX, Docker | ✅ Ollama only | ✅ Hugging Face only | ✅ Multiple |
| Speed Estimation | ✅ Tok/s prediction | ❌ None | ❌ None | ❌ None |
| REST API | ✅ Built-in | ❌ Requires separate service | ❌ None | ✅ Yes |
| MoE Support | ✅ Advanced | ❌ Basic | ❌ Basic | ❌ Basic |
| Plan Mode | ✅ Hardware planning | ❌ None | ❌ None | ❌ None |
| Vim Keybindings | ✅ Full support | ❌ None | ❌ None | ❌ None |
Why Choose llmfit? While Ollama excels at running models and Hugging Face dominates model distribution, llmfit fills a critical gap: intelligent selection. It doesn't just list models—it curates them for your hardware. The TUI alone saves hours of manual research, and the scoring algorithm prevents the common mistake of running oversized models that thrash your system. For developers serious about local AI, llmfit is the essential planning layer that makes other tools more effective.
Frequently Asked Questions
Q: Does llmfit download models for me?
A: Yes! In the TUI, press d on any model to download it. llmfit will prompt you to choose a provider if multiple options exist. It integrates with Ollama, llama.cpp, and MLX download mechanisms.
Q: How accurate are the tok/s speed estimates?
A: Estimates are based on benchmarking data from the community and provider-reported metrics. They're typically within 15-20% of real-world performance. Actual speed depends on system load, quantization, and context length.
Q: Can I add custom models to llmfit's database?
A: Currently, llmfit ships with a curated database updated regularly. For custom models, use the plan command to calculate requirements. Future versions may support custom model definitions via JSON configuration.
Q: Does it work on Apple Silicon?
A: Absolutely! llmfit has first-class support for Apple Silicon. It detects MLX-compatible models and recommends optimal quantization levels specifically for M1/M2/M3 chips, leveraging unified memory architecture.
Q: How often should I update llmfit?
A: Update whenever you see new models released that interest you. The database updates frequently. If using Homebrew or Scoop, run brew upgrade llmfit or scoop update llmfit weekly to stay current.
Q: Can I use llmfit in CI/CD pipelines?
A: Yes! The --json flag and Docker image make it perfect for CI/CD. Use it to validate that your deployment environment can run required models, or to dynamically select models based on available hardware in ephemeral runners.
Q: What's the difference between "Marginal" and "Perfect" fit?
A: "Marginal" means the model will load but may be slow or use swap. "Perfect" means it fits comfortably with headroom for context windows and system processes. Always aim for "Perfect" or "Good" for production use.
Conclusion: Your Local AI Journey Starts Here
llmfit isn't just another CLI tool—it's the missing bridge between AI potential and hardware reality. By eliminating guesswork, it empowers developers to focus on what matters: building amazing applications with local LLMs. The combination of intelligent scoring, a gorgeous TUI, and robust automation capabilities makes it indispensable for anyone serious about running AI offline.
Whether you're a researcher maximizing limited resources, a DevOps engineer orchestrating clusters, or a developer exploring local AI for the first time, llmfit delivers immediate value. The hardware-aware recommendations save time, prevent frustration, and optimize performance automatically.
The project's rapid adoption proves it solves a real, painful problem. As local LLMs become more capable and hardware more diverse, tools like llmfit will only grow in importance. It's the smart foundation every local AI stack needs.
Ready to find your perfect model? Install llmfit today with brew install llmfit or curl -fsSL https://llmfit.axjns.dev/install.sh | sh. Visit the GitHub repository to star the project, report issues, and join the growing community of developers who've stopped guessing and started building. Your hardware has a perfect match—let llmfit find it for you.