## One-Minute Overview
Trinity-RFT is a general-purpose framework for reinforcement fine-tuning of large language models, consisting of three coordinated components: Explorer, Trainer, and Buffer. It enables AI application developers, reinforcement learning researchers, and data engineers to efficiently train and optimize LLM-powered agents.
Core Value: Modular architecture supports flexible RFT modes, works without GPUs, and provides rich data pipelines and algorithm support.
## Quick Start
Installation Difficulty: Medium. Requires Python 3.10-3.12; the GPU backend needs CUDA ≥ 12.8 and at least 2 GPUs, while the Tinker backend supports environments without GPUs.
```bash
# Install with the Tinker CPU backend (suitable for users without GPUs)
pip install -e ".[tinker]"

# Install with GPU support
pip install -e ".[vllm,flash_attn]"
```
Is this suitable for me?
- ✅ AI Application Development: Train LLM agents for specific domains to enhance professional capabilities
- ✅ RL Research: Design, implement and validate new RL algorithms
- ✅ Data Engineering: Create RFT datasets and build data pipelines
- ❌ Simple Classification Tasks: This framework targets reinforcement fine-tuning, not ordinary supervised fine-tuning
- ❌ Single Machine Use: While it supports a CPU mode, optimal performance requires a distributed training environment
## Core Capabilities
### 1. Flexible RFT Modes - Meeting Diverse Training Needs
- Supports synchronous/asynchronous, online/offline, on-policy/off-policy RL
- Inference and training can run independently across devices for improved sample and time efficiency

User Value: Users can choose the optimal training mode based on computing resources and task requirements
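To make the decoupling concrete, here is a minimal, self-contained Python sketch (not Trinity-RFT's actual API; the `Buffer`, `explorer`, and `trainer` names are illustrative) of an explorer filling a shared buffer while a trainer consumes batches from it, so the two sides can run at independent rates:

```python
import queue
import threading

class Buffer:
    """FIFO experience store shared by the explorer and the trainer."""
    def __init__(self):
        self._q = queue.Queue()

    def put(self, experience):
        self._q.put(experience)

    def get_batch(self, size):
        # Blocks until enough experiences have been produced.
        return [self._q.get() for _ in range(size)]

def explorer(buffer, n_rollouts):
    # Stand-in for model inference: each rollout yields one experience.
    for i in range(n_rollouts):
        buffer.put({"prompt": f"task-{i}", "response": f"answer-{i}", "reward": 1.0})

def trainer(buffer, n_steps, batch_size):
    losses = []
    for _ in range(n_steps):
        batch = buffer.get_batch(batch_size)
        # Stand-in for a gradient step on the batch.
        losses.append(sum(e["reward"] for e in batch) / len(batch))
    return losses

buf = Buffer()
t = threading.Thread(target=explorer, args=(buf, 8))
t.start()  # asynchronous mode: exploration and training overlap in time
losses = trainer(buf, n_steps=2, batch_size=4)
t.join()
print(losses)  # [1.0, 1.0]
```

Running the explorer in a separate thread mimics the asynchronous mode; calling it to completion before training would correspond to a synchronous or offline setup.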
### 2. Agentic RL Support - Training Complex Multi-step Tasks
- Supports both concatenated and general multi-step agentic workflows
- Can directly train agent applications developed using frameworks like AgentScope

User Value: Simplifies the path from development to training, making complex agent training straightforward
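The general multi-step workflow idea can be sketched as follows. This is a hypothetical illustration (the `run_workflow` function and toy environment are not Trinity-RFT's API): an agent interacts with an environment for several turns, and the whole trajectory becomes one training sample, in contrast to single-turn concatenated rollouts:

```python
def run_workflow(policy, env, max_steps=3):
    """Roll out a multi-step episode and return it as one training sample."""
    trajectory = []
    observation = env["initial"]
    for step in range(max_steps):
        action = policy(observation)                # one model call per step
        observation = f"{observation} -> {action}"  # environment transition
        trajectory.append({"step": step, "action": action})
    reward = env["judge"](trajectory)               # reward over the full episode
    return {"turns": trajectory, "reward": reward}

# Toy policy and environment for illustration only.
def toy_policy(obs):
    return f"act({len(obs)})"

toy_env = {"initial": "start", "judge": lambda traj: float(len(traj) == 3)}

episode = run_workflow(toy_policy, toy_env)
print(episode["reward"])  # 1.0 — all three steps completed
```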
### 3. Full-lifecycle Data Pipelines - Improving Data Quality and Efficiency
- Enables pipeline processing of rollout tasks and experience samples
- Supports active data management (prioritization, cleaning, augmentation) throughout the RFT lifecycle

User Value: Enhances training effectiveness and model performance through data preprocessing and optimization
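As a rough sketch of what such active data management looks like (stage names here are illustrative, not Trinity-RFT's actual operators), an experience pipeline can chain cleaning, prioritization, and augmentation stages:

```python
def clean(experiences):
    """Drop malformed or empty samples."""
    return [e for e in experiences if e.get("response")]

def prioritize(experiences):
    """Order samples so high-reward experiences are trained on first."""
    return sorted(experiences, key=lambda e: e["reward"], reverse=True)

def augment(experiences):
    """Tag each sample; a real pipeline might rewrite or expand it."""
    return [{**e, "source": "pipeline"} for e in experiences]

def pipeline(experiences, stages=(clean, prioritize, augment)):
    for stage in stages:
        experiences = stage(experiences)
    return experiences

raw = [
    {"response": "a", "reward": 0.2},
    {"response": "",  "reward": 0.9},  # malformed: cleaned out
    {"response": "b", "reward": 0.7},
]
processed = pipeline(raw)
print([e["reward"] for e in processed])  # [0.7, 0.2]
```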
## Tech Stack & Integration
- Development Language: Python 3.10-3.12
- Main Dependencies: PyTorch, Ray, vLLM, verl, Data-Juicer
- Integration Method: Library/API framework
## Ecosystem & Extensions
- Algorithm Support: Multiple RL algorithms including PPO, GRPO, CHORD, REC series
- Framework Compatibility: Compatible with Huggingface and ModelScope model/dataset ecosystems
- Visualization Tools: Provides web interface for configuration and supports Wandb/TensorBoard/MLFlow monitoring
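The core idea behind GRPO, one of the supported algorithms listed above, is to replace a learned value critic with group-relative advantages: several responses are sampled for the same prompt, and each is scored against the group's own reward statistics. A minimal sketch (normalization details vary between implementations):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each reward by the mean and
    standard deviation of its own group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt, rewarded 1.0 if correct.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # roughly [1.0, -1.0, 1.0, -1.0]
```

Correct answers get positive advantages and incorrect ones negative, without any extra value network.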
## Maintenance Status
- Development Activity: Actively developed with frequent releases
- Recent Updates: v0.4.1 released in January 2026 with continuous feature improvements
- Community Response: Clear contribution guidelines; community engagement is welcome
## Commercial & Licensing
License: Apache-2.0
- ✅ Commercial Use: Allowed
- ✅ Modification: Allowed
- ⚠️ Restrictions: None beyond standard Apache-2.0 conditions (retain license and notices)
## Documentation & Learning Resources
- Documentation Quality: Comprehensive
- Official Documentation: Included in repository
- Example Code: Rich tutorials and examples, including a GRPO quick start on GSM8k