AgentGym-RL is a framework for training LLM agents through multi-turn reinforcement learning. It supports diverse real-world scenarios and mainstream RL algorithms, and with it, open-source 7B-scale models match or surpass commercial models across 27 tasks in diverse environments.
One Minute Overview
AgentGym-RL is designed to strengthen the long-horizon decision-making of LLM agents through multi-turn reinforcement learning. It gives researchers and developers a unified training setup that spans a range of real-world scenarios, supports mainstream RL algorithms, and specifically targets the exploration-exploitation trade-off that arises in multi-turn interactions. It suits research teams and companies that want to train agents for complex, multi-step tasks.
Core Value: Its progressive interaction-scaling strategy makes multi-turn reinforcement learning training stable and efficient, yielding substantial performance gains on complex tasks.
Quick Start
Installation Difficulty: Medium - Requires a CUDA environment, PyTorch 2.4.0, and pinned versions of Flash Attention and Transformers
```bash
echo "Preparing environment for agentgym-rl..."
conda create -n agentgym-rl python==3.10 -y
conda activate agentgym-rl
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
# install flash-attention (prebuilt wheel for CUDA 12, torch 2.4, Python 3.10)
FLASH_ATTENTION_URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
FLASH_ATTENTION_NAME="flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
wget -q $FLASH_ATTENTION_URL -O $FLASH_ATTENTION_NAME
pip3 install $FLASH_ATTENTION_NAME
rm -f $FLASH_ATTENTION_NAME
# for RL
cd AgentGym-RL
pip3 install -e .
# for agentgym
echo "Preparing environment for agentenv..."
cd AgentGym/agentenv
pip3 install -e .
pip3 install transformers==4.51.3
```
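Because the install script pins exact versions, a quick check that the active environment actually matches them can save debugging time later. The sketch below uses only the standard library's `importlib.metadata`; the helper names are hypothetical, and `flash_attn` is assumed to be the installed distribution name of the Flash Attention wheel. It strips local version labels, since the CUDA index installs torch as something like `2.4.0+cu124`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Pins copied from the install script; "flash_attn" is an assumed distribution name.
EXPECTED: Dict[str, str] = {"torch": "2.4.0", "flash_attn": "2.7.3", "transformers": "4.51.3"}


def base_version(v: str) -> str:
    """Strip a local version label, e.g. '2.4.0+cu124' -> '2.4.0'."""
    return v.split("+", 1)[0]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions; None means the package is not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def find_mismatches(
    installed: Dict[str, Optional[str]], expected: Dict[str, str] = EXPECTED
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {name: (found, expected)} for every missing or mismatched package."""
    problems: Dict[str, Tuple[Optional[str], str]] = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or base_version(have) != want:
            problems[name] = (have, want)
    return problems


if __name__ == "__main__":
    mismatches = find_mismatches(installed_versions(EXPECTED))
    if mismatches:
        for name, (have, want) in mismatches.items():
            print(f"{name}: found {have}, expected {want}")
    else:
        print("All pinned packages match.")
```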
Is this suitable for me?
- ✅ Complex Task Agent Development: When you need to train agents capable of handling multi-turn interactions like web navigation or game decision-making
- ✅ RL Research: If you want to explore multi-turn reinforcement learning applications in large model training
- ✅ Commercial Model Benchmarking: When you want open-source models to match or exceed commercial model performance
- ❌ Simple One-Time Tasks: If your task only requires single-turn decisions, this framework may be overly complex
Core Capabilities
1. Modular System Design - Simplifies Complex System Development
AgentGym-RL adopts a modular and decoupled design with three main components: environment module, agent module, and training module, each with clear responsibilities for easy extension and maintenance. Actual Value: Developers can flexibly replace or upgrade specific components without rebuilding the entire system, significantly improving development efficiency.
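To illustrate how decoupled environment and agent modules meet in a multi-turn rollout, here is a minimal sketch. The `Environment` and `Agent` protocols, the `Trajectory` record, the `rollout` function, and the toy number-guessing environment are all hypothetical and do not reflect AgentGym-RL's actual interfaces; in a full system, the training module would consume the returned trajectories.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Environment(Protocol):
    """Hypothetical environment interface: text observations, scalar rewards."""
    def reset(self) -> str: ...
    def step(self, action: str) -> Tuple[str, float, bool]: ...


class Agent(Protocol):
    """Hypothetical agent interface: maps history plus observation to an action."""
    def act(self, history: List[Tuple[str, str]], observation: str) -> str: ...


@dataclass
class Trajectory:
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (observation, action) pairs
    reward: float = 0.0
    done: bool = False


def rollout(env: Environment, agent: Agent, max_turns: int) -> Trajectory:
    """Run one episode, stopping at termination or after max_turns turns."""
    traj = Trajectory()
    obs = env.reset()
    for _ in range(max_turns):
        action = agent.act(traj.turns, obs)
        traj.turns.append((obs, action))
        obs, reward, done = env.step(action)
        traj.reward += reward
        if done:
            traj.done = True
            break
    return traj


class GuessEnv:
    """Toy environment: guess a hidden integer; reward 1.0 on success."""
    def __init__(self, target: int) -> None:
        self.target = target

    def reset(self) -> str:
        return "Guess an integer from 0 to 9."

    def step(self, action: str) -> Tuple[str, float, bool]:
        guess = int(action)
        if guess == self.target:
            return "correct", 1.0, True
        return ("higher" if guess < self.target else "lower"), 0.0, False


class CountingAgent:
    """Toy agent: guesses 0, 1, 2, ... regardless of feedback."""
    def act(self, history: List[Tuple[str, str]], observation: str) -> str:
        return str(len(history))


if __name__ == "__main__":
    print(rollout(GuessEnv(target=3), CountingAgent(), max_turns=10))
    print(rollout(GuessEnv(target=3), CountingAgent(), max_turns=2))
```

With `max_turns=10` the counting agent reaches the target on its fourth turn and collects reward 1.0; with `max_turns=2` the episode is cut off with no reward. Either module can be swapped (for example, an LLM-backed agent) without touching `rollout`.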
2. Rich Scenario Environments - Comprehensive Real-World Coverage
Provides multiple environments including Web Navigation, Deep Search, Digital Games, Embodied Tasks, and Scientific Tasks, covering various real-world scenarios like online shopping, discussion forums, and collaborative development. Actual Value: Trained agents can adapt to various practical application scenarios, improving deployment utility.
3. Diverse Training Strategies - Meets Different Training Needs
Supports mainstream online RL algorithms (PPO, GRPO, RLOO, REINFORCE++) as well as complementary training paradigms (SFT, DPO, AgentEvol). Actual Value: Researchers can select the most suitable training method based on specific task requirements, improving training efficiency and performance.
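GRPO and RLOO both avoid a learned value function by computing a baseline from several trajectories sampled for the same task. The sketch below shows the textbook form of each advantage estimate on scalar episode rewards; it is not AgentGym-RL's implementation, and details such as whether the standard deviation is the population or sample estimate vary across implementations (this sketch assumes the population estimate).

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: reward standardized within its group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (an assumption; some code uses sample std)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards: List[float]) -> List[float]:
    """RLOO advantage: reward minus the mean reward of the other k - 1 samples."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per group")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    group = [1.0, 0.0, 0.0, 1.0]  # e.g. success/failure of four sampled episodes
    print(grpo_advantages(group))
    print(rloo_advantages(group))
```

In an agent setting, each reward here would typically be the outcome reward of one complete multi-turn trajectory for the same task instance; that mapping is an assumption of this sketch.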
4. ScalingInter-RL Innovative Method - Solves Multi-Turn Training Challenges
Uses a progressive interaction-scaling strategy: training starts with a small number of interaction turns per episode to establish basic capabilities, then gradually extends the interaction horizon so that long-horizon capabilities improve stably and efficiently. Actual Value: Addresses the exploration-exploitation trade-off in multi-turn reinforcement learning, significantly improving training efficiency and final performance.
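A progressive horizon can be expressed as a schedule that maps the training step to the maximum number of interaction turns allowed per rollout. The stepwise schedule below is a hypothetical illustration: the class name, its parameters (`initial_turns`, `increment`, `interval`, `max_turns`), and their default values are assumptions, not ScalingInter-RL's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HorizonSchedule:
    """Hypothetical stepwise schedule for the per-episode turn budget."""
    initial_turns: int = 5   # short horizon early in training (assumed value)
    increment: int = 5       # turns added at each stage (assumed value)
    interval: int = 100      # training steps per stage (assumed value)
    max_turns: int = 20      # upper bound on the horizon (assumed value)

    def turns_at(self, step: int) -> int:
        """Return the maximum number of interaction turns at a training step."""
        if step < 0:
            raise ValueError("step must be non-negative")
        stage = step // self.interval
        return min(self.max_turns, self.initial_turns + self.increment * stage)


if __name__ == "__main__":
    schedule = HorizonSchedule()
    for step in (0, 99, 100, 250, 1000):
        print(step, schedule.turns_at(step))
```

Running it prints turn budgets of 5, 5, 10, 15, and 20 for steps 0, 99, 100, 250, and 1000. The budget would be passed as the turn limit of each rollout, so early episodes are short and cheap while later ones allow longer exploration.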
5. Visual Interactive Interface - Convenient Debugging and Analysis
Provides an interactive visualization interface for replaying and examining complete interaction trajectories, facilitating data analysis and model behavior research. Actual Value: Developers can intuitively understand the agent's decision process, quickly identify issues, and accelerate iterative optimization.
Tech Stack & Integration
- Development Language: Python
- Key Dependencies: PyTorch, Flash Attention, Transformers, verl
- Integration Method: Library/Framework
Maintenance Status
- Development Activity: Active (clear development roadmap and recent updates)
- Recent Updates: Recent (paper, dataset, and tutorials released in September 2025)
- Community Response: Strong (backed by academic institutions including Fudan University and Shanghai AI Laboratory)
Documentation & Learning Resources
- Documentation Quality: Comprehensive
- Official Documentation: Included in README, covering environment setup, training, evaluation, and UI usage
- Example Code: Available (examples/train and examples/eval directories)