A Rust-based cross-platform CLI tool that right-sizes LLMs to your system's RAM, CPU, and GPU by detecting hardware specs and recommending optimal models and quantization strategies. Covers 206 models from 57 providers.
## Overview
llmfit is a cross-platform CLI tool written in Rust designed to solve hardware compatibility issues when running LLMs locally. It automatically detects system resources (CPU, GPU including NVIDIA/AMD/Intel/Apple Silicon, and RAM), scores a database of 206 models across quality, speed, fit, and context dimensions, and recommends optimal quantization levels.
## Core Capabilities
### Hardware Detection
- CPU: Core count via sysinfo
- RAM: Total and available memory
- GPU Support:
  - NVIDIA: Via nvidia-smi, multi-GPU configurations supported
  - AMD: Via rocm-smi
  - Intel Arc: Discrete via sysfs, integrated via lspci
  - Apple Silicon: Unified memory via system_profiler
- Backend Detection: Auto-identifies CUDA/Metal/ROCm/SYCL acceleration
### Model Recommendation
- Model Database: 206 models, 57 providers (Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, xAI Grok, etc.)
- Dynamic Quantization: From Q8_0 (best quality) to Q2_K (highest compression)
- MoE Architecture Support: Auto-detects Mixtral, DeepSeek-V2/V3
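The quantization range above maps directly to memory footprint. As a rough illustration (not llmfit's actual sizing code), model size can be estimated as parameter count times bits per weight; the bits-per-weight figures below are approximate community values for GGUF quants, not numbers from llmfit's database:

```rust
/// Rough GGUF model-size estimate in GB: params (billions) x bits-per-weight / 8.
/// Bits-per-weight values are approximate (Q8_0 ~8.5, Q4_K_M ~4.85, Q2_K ~2.6);
/// real runtime usage adds KV cache and buffer overhead on top.
fn estimated_size_gb(params_b: f64, bits_per_weight: f64) -> f64 {
    params_b * bits_per_weight / 8.0
}

fn main() {
    // An ~8B model at the two ends of the quantization range:
    let q8 = estimated_size_gb(8.0, 8.5); // Q8_0, best quality
    let q2 = estimated_size_gb(8.0, 2.6); // Q2_K, highest compression
    println!("Q8_0 ~ {:.1} GB, Q2_K ~ {:.1} GB", q8, q2);
    assert!(q8 > q2);
}
```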
### Multi-dimensional Scoring (0-100)
- Quality: Parameter count, model family reputation, quantization penalty
- Speed: Estimated as (K / params_b) × quant_speed_multiplier, where K is a backend-specific constant and params_b is the parameter count in billions
- Fit: Memory utilization efficiency (optimal: 50-80%)
- Context: Context window capability
### Fit Levels
- Perfect: Recommended memory fully meets GPU requirements
- Good: Fits with headroom, optimal for MoE offload
- Marginal: Tight fit, or CPU-only execution
- Too Tight: Insufficient hardware resources
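A minimal sketch of how these levels could follow from memory utilization (model footprint divided by available memory). The 50-80% optimal band comes from the Fit score above; the exact cutoffs in this function are illustrative assumptions, not llmfit's actual thresholds:

```rust
/// Hypothetical fit classification from memory utilization.
/// llmfit treats 50-80% utilization as optimal; the 0.8 and 1.0
/// boundaries here are illustrative assumptions.
fn fit_level(model_gb: f64, available_gb: f64) -> &'static str {
    let utilization = model_gb / available_gb;
    match utilization {
        u if u > 1.0 => "Too Tight",  // model does not fit at all
        u if u > 0.8 => "Marginal",   // fits, but with little headroom
        u if u >= 0.5 => "Perfect",   // inside the optimal 50-80% band
        _ => "Good",                  // ample headroom, e.g. for MoE offload
    }
}

fn main() {
    // A 13 GB model on a 16 GB GPU: ~81% utilization.
    println!("{}", fit_level(13.0, 16.0)); // prints "Marginal"
}
```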
### Interface Modes
- TUI Mode: Interactive terminal interface with search, sort, theme switching (6 built-in themes)
- CLI Mode: Pure command-line output for scripting
- JSON Output: For agent or programmatic integration
## Installation
```sh
# One-liner (macOS / Linux)
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Homebrew (macOS)
brew tap AlexsJones/llmfit
brew install llmfit

# Cargo (universal)
cargo install llmfit
```
## Common Commands
```sh
llmfit                     # TUI mode (default)
llmfit --cli               # CLI table mode
llmfit fit --perfect -n 5  # Top 5 perfect-fit models
llmfit system              # Show detected hardware specs
llmfit recommend --json    # JSON-format recommendations
llmfit search "llama 8b"   # Search for a specific model
llmfit --memory=24G        # Override GPU memory detection
```
## Ollama Integration
- Auto-detects a local Ollama instance (localhost:11434)
- Supports remote instances: OLLAMA_HOST="http://192.168.1.100:11434" llmfit
- Press d in the TUI to pull models directly
## Speed Estimation Constants
| Backend | K Value |
|---|---|
| CUDA | 220 |
| Metal | 160 |
| ROCm | 180 |
| SYCL | 100 |
| CPU ARM | 90 |
| CPU x86 | 70 |
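Plugging these K values into the speed formula from the scoring section gives a rough throughput estimate. A minimal sketch, where the quantization speed multiplier is an illustrative assumption (llmfit's actual multipliers are not documented here):

```rust
/// Rough tokens/sec estimate: K / params_b * quant_speed_multiplier,
/// using the backend K values from the table above. The multiplier
/// value used below is an assumption (lower-bit quants run faster).
fn estimated_tps(backend_k: f64, params_b: f64, quant_multiplier: f64) -> f64 {
    backend_k / params_b * quant_multiplier
}

fn main() {
    // An 8B model on CUDA (K = 220) with an assumed Q4 multiplier of 1.3:
    let tps = estimated_tps(220.0, 8.0, 1.3);
    println!("~{:.0} tok/s", tps); // 220 / 8 * 1.3 = 35.75 → "~36 tok/s"
}
```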
## Supported Model Categories
- General-purpose models
- Code models (CodeLlama, StarCoder2, Qwen2.5-Coder)
- Reasoning models (DeepSeek-R1, Orca-2)
- Multimodal/vision models (Llama 3.2 Vision, Qwen2.5-VL)
- Chat models
- Embedding models