A Rust-based cross-platform CLI tool that right-sizes LLMs to your system's RAM, CPU, and GPU by detecting hardware specs and recommending optimal models and quantization strategies. Covers 206 models from 57 providers.
## Overview
llmfit is a cross-platform CLI tool written in Rust designed to solve hardware compatibility issues when running LLMs locally. It automatically detects system resources (CPU, GPU including NVIDIA/AMD/Intel/Apple Silicon, and RAM), scores a database of 206 models across quality, speed, fit, and context dimensions, and recommends optimal quantization levels.
## Core Capabilities
### Hardware Detection
- CPU: Core count via sysinfo
- RAM: Total and available memory
- GPU Support:
  - NVIDIA: Via nvidia-smi, multi-GPU configurations supported
  - AMD: Via rocm-smi
  - Intel Arc: Discrete via sysfs, integrated via lspci
  - Apple Silicon: Unified memory via system_profiler
- Backend Detection: Auto-identifies CUDA/Metal/ROCm/SYCL acceleration
### Model Recommendation
- Model Database: 206 models, 57 providers (Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, xAI Grok, etc.)
- Dynamic Quantization: From Q8_0 (best quality) to Q2_K (highest compression)
- MoE Architecture Support: Auto-detects Mixtral, DeepSeek-V2/V3
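The quantization range above maps directly to memory footprint. As a rough illustration (not llmfit's actual sizing code), model size can be estimated as parameter count times bits per weight; the bits-per-weight figures below are approximate community values for GGUF quants, not numbers from llmfit's database:

```rust
/// Rough GGUF model-size estimate in GB: params (billions) x bits-per-weight / 8.
/// Bits-per-weight values are approximate (Q8_0 ~8.5, Q4_K_M ~4.85, Q2_K ~2.6);
/// real runtime usage adds KV cache and buffer overhead on top.
fn estimated_size_gb(params_b: f64, bits_per_weight: f64) -> f64 {
    params_b * bits_per_weight / 8.0
}

fn main() {
    // An ~8B model at the two ends of the quantization range:
    let q8 = estimated_size_gb(8.0, 8.5); // Q8_0, best quality
    let q2 = estimated_size_gb(8.0, 2.6); // Q2_K, highest compression
    println!("Q8_0 ~ {:.1} GB, Q2_K ~ {:.1} GB", q8, q2);
    assert!(q8 > q2);
}
```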
### Multi-dimensional Scoring (0-100)
- Quality: Parameter count, model family reputation, quantization penalty
- Speed: Estimated as (K / params_b) × quant_speed_multiplier, where K is a backend-specific constant and params_b is the parameter count in billions
- Fit: Memory utilization efficiency (optimal: 50-80%)
- Context: Context window capability
### Fit Levels
- Perfect: Recommended memory fully meets GPU requirements
- Good: Fits with headroom, optimal for MoE offload
- Marginal: Tight fit, or CPU-only execution
- Too Tight: Insufficient hardware resources
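A minimal sketch of how these levels could follow from memory utilization (model footprint divided by available memory). The 50-80% optimal band comes from the Fit score above; the exact cutoffs in this function are illustrative assumptions, not llmfit's actual thresholds:

```rust
/// Hypothetical fit classification from memory utilization.
/// llmfit treats 50-80% utilization as optimal; the 0.8 and 1.0
/// boundaries here are illustrative assumptions.
fn fit_level(model_gb: f64, available_gb: f64) -> &'static str {
    let utilization = model_gb / available_gb;
    match utilization {
        u if u > 1.0 => "Too Tight",  // model does not fit at all
        u if u > 0.8 => "Marginal",   // fits, but with little headroom
        u if u >= 0.5 => "Perfect",   // inside the optimal 50-80% band
        _ => "Good",                  // ample headroom, e.g. for MoE offload
    }
}

fn main() {
    // A 13 GB model on a 16 GB GPU: ~81% utilization.
    println!("{}", fit_level(13.0, 16.0)); // prints "Marginal"
}
```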
### Interface Modes
- TUI Mode: Interactive terminal interface with search, sort, theme switching (6 built-in themes)
- CLI Mode: Pure command-line output for scripting
- JSON Output: For agent or programmatic integration
## Installation
```sh
# One-liner (macOS / Linux)
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Homebrew (macOS)
brew tap AlexsJones/llmfit
brew install llmfit

# Cargo (universal)
cargo install llmfit
```
## Common Commands
```sh
llmfit                     # TUI mode (default)
llmfit --cli               # CLI table mode
llmfit fit --perfect -n 5  # Top 5 perfect-fit models
llmfit system              # Show detected hardware specs
llmfit recommend --json    # JSON-format recommendations
llmfit search "llama 8b"   # Search for a specific model
llmfit --memory=24G        # Override GPU memory detection
```
## Ollama Integration
- Auto-detects a local Ollama instance (localhost:11434)
- Supports remote instances: OLLAMA_HOST="http://192.168.1.100:11434" llmfit
- Press d in the TUI to pull models directly
## Speed Estimation Constants
| Backend | K Value |
|---|---|
| CUDA | 220 |
| Metal | 160 |
| ROCm | 180 |
| SYCL | 100 |
| CPU ARM | 90 |
| CPU x86 | 70 |
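Plugging these K values into the speed formula from the scoring section gives a rough throughput estimate. A minimal sketch, where the quantization speed multiplier is an illustrative assumption (llmfit's actual multipliers are not documented here):

```rust
/// Rough tokens/sec estimate: K / params_b * quant_speed_multiplier,
/// using the backend K values from the table above. The multiplier
/// value used below is an assumption (lower-bit quants run faster).
fn estimated_tps(backend_k: f64, params_b: f64, quant_multiplier: f64) -> f64 {
    backend_k / params_b * quant_multiplier
}

fn main() {
    // An 8B model on CUDA (K = 220) with an assumed Q4 multiplier of 1.3:
    let tps = estimated_tps(220.0, 8.0, 1.3);
    println!("~{:.0} tok/s", tps); // 220 / 8 * 1.3 = 35.75 → "~36 tok/s"
}
```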
## Supported Model Categories
- General-purpose models
- Code models (CodeLlama, StarCoder2, Qwen2.5-Coder)
- Reasoning models (DeepSeek-R1, Orca-2)
- Multimodal/vision models (Llama 3.2 Vision, Qwen2.5-VL)
- Chat models
- Embedding models