A unified, efficient fine-tuning framework for 100+ LLMs and VLMs, published at ACL 2024. It integrates various fine-tuning methods (LoRA, QLoRA, Full) and training algorithms (DPO, PPO), enabling efficient training on consumer-grade GPUs. Featuring a Web UI (LLaMA Board), it significantly lowers the barrier from data preparation to model deployment.
## One-Minute Overview
LLaMA Factory is an "all-in-one" toolkit for Large Language Model (LLM) fine-tuning, designed to make tuning models as easy as using pre-trained ones. Whether you are building a domain-specific chatbot or conducting academic research, it helps you complete the task via a visual interface or command-line tools.
Core Value: It enables the fine-tuning of the latest and most powerful open-source models (like Qwen3, Llama 3, DeepSeek) on limited hardware resources (e.g., a single consumer-grade GPU) and offers Day-0 support for cutting-edge models.
## Quick Start
Installation Difficulty: Low - Supports one-click pip installation or Docker deployment, with comprehensive example configs provided.
```bash
# Clone repo and install
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
```
Is this suitable for me?
- ✅ Fine-tune locally: Supports training 70B models with <24GB VRAM.
- ✅ Need latest models: Offers Day-0/Day-1 adaptation for models like Qwen3, Gemma 3.
- ✅ Avoid complex code: Provides a Gradio-based Web UI for mouse-click training.
- ❌ Pre-training from scratch: While supported, its main strength lies in fine-tuning.
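Training runs are driven by a single YAML config passed to the CLI. Below is a minimal LoRA SFT sketch; the field names follow the configs shipped in `examples/`, but the model path, dataset name, and output directory here are illustrative placeholders, not defaults:

```yaml
# Minimal LoRA SFT config sketch (field names follow the examples/ configs;
# model_name_or_path, dataset, and output_dir are placeholders).
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity
template: llama3
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

A config like this would be launched with `llamafactory-cli train <config>.yaml`, or assembled interactively in the Web UI via `llamafactory-cli webui`.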
## Core Capabilities
### 1. Extensive Model Support - Eliminates Selection Anxiety
Supports 100+ models, including LLaMA 3/4, Qwen2/3, Mistral, DeepSeek, GLM, Phi, covering both text and vision-language modalities. Actual Value: No need to adapt codebases for different models; one unified tool for all mainstream open-source models.
### 2. Resource Efficiency - Lowers Hardware Barriers
Significantly reduces VRAM usage via advanced algorithms like GaLore, LoRA+, and QLoRA. For example, fine-tuning a 7B model with QLoRA requires only 4GB VRAM. Actual Value: Empowers individual developers and SMEs to train LLMs without expensive enterprise servers.
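The headline numbers follow from simple weight-storage arithmetic. A back-of-the-envelope sketch (a deliberate simplification: it ignores activations, optimizer state, the KV cache, and the small LoRA adapter overhead):

```python
def weight_vram_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate VRAM needed just to hold the model weights."""
    return n_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

# A 7B-parameter model at fp16 vs. 4-bit and 2-bit quantization.
print(weight_vram_gb(7e9, 16))  # 14.0 GB: beyond many consumer GPUs
print(weight_vram_gb(7e9, 4))   # 3.5 GB: in line with the ~4GB QLoRA figure
print(weight_vram_gb(7e9, 2))   # 1.75 GB
```

Full 16-bit fine-tuning would additionally need gradients and Adam optimizer states (roughly 2-4x the weight size), which is why quantized, parameter-efficient methods are what make consumer-GPU training feasible.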
### 3. Cutting-Edge Integration - Stays Ahead
Typically supports the latest released models (e.g., DeepSeek R1, Qwen3) on Day 0 or Day 1 of release. Actual Value: Helps researchers access and adapt the newest model capabilities immediately.
### 4. Full-Stack Workflow - One-Stop Experience
Integrates data synthesis, training, evaluation, model exporting, and OpenAI-style API deployment. Actual Value: Eliminates the need to switch between different tools, significantly boosting R&D efficiency.
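Because the deployment step exposes an OpenAI-style API, any standard OpenAI client can talk to a fine-tuned model. A stdlib-only sketch of building such a request (the host, port, and model name are assumptions for illustration; the server would be started separately with `llamafactory-cli api <config>.yaml`):

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed local endpoint and model name - substitute your own deployment.
req = chat_request("http://localhost:8000", "llama3-lora-sft", "Hello!")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return the familiar chat-completions JSON, so existing OpenAI-based application code needs only a base-URL change.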
## Tech Stack & Integration
- Languages: Python
- Key Dependencies: PyTorch, Transformers, PEFT, TRL, Gradio
- Integration: CLI (Command Line Interface), Web UI, Python SDK
## Documentation & Learning Resources
- Documentation Quality: Comprehensive (Official docs, blog, and online course available)
- Official Docs: https://llamafactory.readthedocs.io/
- Example Code: Rich (Dozens of scenario configs in `examples/`)
- Online Trial: LLaMA Factory Online (No local setup required)