
llmfit

Added Feb 25, 2026
Category: Agent & Tooling
Open Source
Tags: Rust, LLM CLI, Agent & Tooling, Model & Inference Framework, Developer Tools & Coding, Model Training & Inference

A cross-platform CLI tool written in Rust that right-sizes LLMs to your system's RAM, CPU, and GPU by detecting hardware specs and recommending optimal models and quantization strategies. Covers 206 models from 57 providers.

Overview

llmfit is a cross-platform CLI tool written in Rust designed to solve hardware compatibility issues when running LLMs locally. It automatically detects system resources (CPU, GPU including NVIDIA/AMD/Intel/Apple Silicon, and RAM), scores a database of 206 models across quality, speed, fit, and context dimensions, and recommends optimal quantization levels.

Core Capabilities

Hardware Detection

  • CPU: Core count via sysinfo
  • RAM: Total and available memory
  • GPU Support:
    • NVIDIA: Via nvidia-smi, multi-GPU configurations supported
    • AMD: Via rocm-smi
    • Intel Arc: Discrete via sysfs, integrated via lspci
    • Apple Silicon: Unified memory via system_profiler
  • Backend Detection: Auto-identifies CUDA/Metal/ROCm/SYCL acceleration
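The vendor-to-backend pairings above can be sketched as a small mapping function. This is an illustrative sketch of the idea, not llmfit's actual internals; the type and function names are assumptions.

```rust
// Illustrative sketch: map a detected GPU vendor string to an acceleration
// backend, following the pairings described above (NVIDIA -> CUDA,
// AMD -> ROCm, Apple Silicon -> Metal, Intel Arc -> SYCL).
#[derive(Debug, PartialEq)]
enum Backend { Cuda, Rocm, Metal, Sycl, Cpu }

fn backend_for(vendor: &str) -> Backend {
    match vendor.to_lowercase().as_str() {
        v if v.contains("nvidia") => Backend::Cuda,
        v if v.contains("amd") => Backend::Rocm,
        v if v.contains("apple") => Backend::Metal,
        v if v.contains("intel") => Backend::Sycl,
        _ => Backend::Cpu, // no supported GPU found: fall back to CPU
    }
}

fn main() {
    assert_eq!(backend_for("NVIDIA GeForce RTX 4090"), Backend::Cuda);
    assert_eq!(backend_for("Apple M3 Max"), Backend::Metal);
}
```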

Model Recommendation

  • Model Database: 206 models, 57 providers (Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, xAI Grok, etc.)
  • Dynamic Quantization: From Q8_0 (best quality) to Q2_K (highest compression)
  • MoE Architecture Support: Auto-detects Mixtral, DeepSeek-V2/V3
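To make the Q8_0-to-Q2_K range concrete, here is a hedged sketch of a common GGUF-style memory estimate: weights take roughly params_b × bits_per_weight / 8 GB, plus runtime overhead. The bit widths and the 20% overhead factor are illustrative assumptions, not llmfit's exact model.

```rust
// Illustrative assumption: effective bits per weight for common GGUF quant
// levels, from Q8_0 (best quality) down to Q2_K (highest compression).
fn quant_bits(quant: &str) -> f64 {
    match quant {
        "Q8_0" => 8.5,
        "Q6_K" => 6.6,
        "Q5_K_M" => 5.7,
        "Q4_K_M" => 4.8,
        "Q3_K_M" => 3.9,
        "Q2_K" => 2.6,
        _ => 16.0, // assume fp16 when unquantized
    }
}

// Rough memory footprint in GB: weight bytes plus a 20% overhead assumption
// for KV cache and runtime buffers.
fn estimated_mem_gb(params_b: f64, quant: &str) -> f64 {
    params_b * quant_bits(quant) / 8.0 * 1.2
}

fn main() {
    // Under these assumptions, an 8B model at Q4_K_M needs ~5.8 GB.
    println!("{:.1} GB", estimated_mem_gb(8.0, "Q4_K_M"));
}
```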

Multi-dimensional Scoring (0-100)

  • Quality: Parameter count, model family reputation, quantization penalty
  • Speed: Estimated via K/params_b × quant_speed_multiplier formula
  • Fit: Memory utilization efficiency (optimal: 50-80%)
  • Context: Context window capability

Fit Levels

  • Perfect: Recommended memory fully meets GPU requirements
  • Good: Fits with headroom, optimal for MoE offload
  • Marginal: Tight fit, or CPU-only execution
  • Too Tight: Insufficient hardware resources
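The ladder above can be sketched as a classification over memory utilization (model footprint divided by available GPU memory). The 50-80% "optimal" band comes from the Fit scoring dimension; the remaining thresholds are illustrative assumptions, not llmfit's published cutoffs.

```rust
#[derive(Debug, PartialEq)]
enum Fit { Perfect, Good, Marginal, TooTight }

// Sketch of fit classification from the utilization ratio. The 0.5-0.8
// optimal band is from the docs; other thresholds are assumptions.
fn fit_level(model_gb: f64, gpu_gb: f64) -> Fit {
    let util = model_gb / gpu_gb;
    if (0.5..=0.8).contains(&util) {
        Fit::Perfect            // inside the optimal utilization band
    } else if util < 0.5 {
        Fit::Good               // fits with extra headroom (e.g. MoE offload)
    } else if util <= 1.0 {
        Fit::Marginal           // loadable, but tight
    } else {
        Fit::TooTight           // exceeds available memory
    }
}

fn main() {
    assert_eq!(fit_level(14.0, 24.0), Fit::Perfect); // ~58% utilization
    assert_eq!(fit_level(30.0, 24.0), Fit::TooTight);
}
```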

Interface Modes

  • TUI Mode: Interactive terminal interface with search, sort, theme switching (6 built-in themes)
  • CLI Mode: Pure command-line output for scripting
  • JSON Output: For agent or programmatic integration

Installation

# One-liner (macOS / Linux)
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Homebrew (macOS)
brew tap AlexsJones/llmfit
brew install llmfit

# Cargo (Universal)
cargo install llmfit

Common Commands

llmfit                  # TUI mode (default)
llmfit --cli            # CLI table mode
llmfit fit --perfect -n 5  # Top 5 perfect-fit models
llmfit system           # Show detected hardware specs
llmfit recommend --json # JSON format recommendations
llmfit search "llama 8b" # Search specific model
llmfit --memory=24G     # Override GPU memory detection

Ollama Integration

  • Auto-detects local Ollama instance (localhost:11434)
  • Supports remote instances: OLLAMA_HOST="http://192.168.1.100:11434" llmfit
  • Press d in TUI to pull models directly

Speed Estimation Constants

  • CUDA: K = 220
  • Metal: K = 160
  • ROCm: K = 180
  • SYCL: K = 100
  • CPU (ARM): K = 90
  • CPU (x86): K = 70
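Plugging these constants into the speed formula from the scoring section (K / params_b × quant_speed_multiplier) can be sketched as follows. The quantization speed multiplier value used in the example is an illustrative assumption.

```rust
// K constants per backend, taken from the table above.
fn backend_k(backend: &str) -> f64 {
    match backend {
        "CUDA" => 220.0,
        "Metal" => 160.0,
        "ROCm" => 180.0,
        "SYCL" => 100.0,
        "CPU ARM" => 90.0,
        _ => 70.0, // CPU x86 and unknown backends
    }
}

// Speed estimate per the formula: K / params_b * quant_speed_multiplier.
fn est_tokens_per_sec(backend: &str, params_b: f64, quant_mult: f64) -> f64 {
    backend_k(backend) / params_b * quant_mult
}

fn main() {
    // An 8B model on CUDA with a hypothetical 1.4x quant speedup: 38.5 tok/s.
    println!("{:.1}", est_tokens_per_sec("CUDA", 8.0, 1.4));
}
```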

Supported Model Categories

General-purpose, Code models (CodeLlama, StarCoder2, Qwen2.5-Coder), Reasoning models (DeepSeek-R1, Orca-2), Multimodal/Vision models (Llama 3.2 Vision, Qwen2.5-VL), Chat models, Embedding models
