AgentGym-RL

AgentGym-RL is a framework for training LLM agents through multi-turn reinforcement learning. It supports a range of real-world scenarios and mainstream RL algorithms. The project reports that open-source 7B-scale models trained with it match or surpass commercial models across 27 tasks in diverse environments.

One Minute Overview

AgentGym-RL is designed to improve the long-horizon decision-making of LLM agents through multi-turn reinforcement learning. It provides a training environment that covers several real-world scenarios, supports mainstream RL algorithms, and targets the exploration-exploitation balance in multi-turn interactions. It is aimed at research teams and companies training AI agents for complex, multi-step tasks.

Core Value: A progressive interaction scaling strategy keeps multi-turn RL training stable and efficient, which the project credits for its performance gains on complex tasks.

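Quick Start

Installation Difficulty: Medium. Requires Python 3.10 on Linux x86_64, a CUDA environment (the script installs the cu124 build of PyTorch 2.4.0), a prebuilt Flash Attention 2.7.3 wheel, and Transformers 4.51.3.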

echo "Preparing environment for agentgym-rl..."
conda create -n agentgym-rl python==3.10 -y
conda activate agentgym-rl
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
# install flash-attn
FLASH_ATTENTION_URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
FLASH_ATTENTION_NAME="flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
wget -q "$FLASH_ATTENTION_URL" -O "$FLASH_ATTENTION_NAME"
pip3 install "$FLASH_ATTENTION_NAME"
rm -f "$FLASH_ATTENTION_NAME"
# for RL
cd AgentGym-RL
pip3 install -e .
# for agentgym
echo "Preparing environment for agentenv..."
cd AgentGym/agentenv
pip3 install -e .
pip3 install transformers==4.51.3
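
Because the script pins exact versions, a quick check after installation can catch a mismatched environment before training starts. The following is a minimal sketch, not part of AgentGym-RL: the `check_pins` helper and the `PINNED` table are hypothetical names, the versions are copied from the script above, and the distribution names (`torch`, `flash_attn`, `transformers`) are assumed to match what pip records.

```python
from importlib import metadata

# Versions pinned by the install script above.
PINNED = {
    "torch": "2.4.0",
    "flash_attn": "2.7.3",
    "transformers": "4.51.3",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that misses its pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Compare only the public version, ignoring local build tags such as "+cu124".
        public = found.split("+")[0] if found else None
        if public != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated `agentgym-rl` environment; an empty result means the three pinned packages are installed at the expected versions.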

Is this suitable for me?

✅ Complex Task Agent Development: When you need to train agents capable of handling multi-turn interactions like web navigation or game decision-making

✅ RL Research: If you want to explore multi-turn reinforcement learning applications in large model training

✅ Commercial Model Benchmarking: When you want open-source models to match or exceed commercial model performance

❌ Simple One-Time Tasks: If your task only requires single-turn decisions, this framework may be overly complex

Core Capabilities

1. Modular System Design - Simplifies Complex System Development

AgentGym-RL adopts a modular, decoupled design with three main components: an environment module, an agent module, and a training module. Each has a clearly defined responsibility, which makes the system easier to extend and maintain.

Actual Value: Developers can replace or upgrade one component without rebuilding the rest of the system.
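
To illustrate what a decoupled environment/agent/training split can look like, here is a minimal sketch. It does not reproduce AgentGym-RL's actual classes: `Environment`, `Agent`, `Turn`, `rollout`, and the two toy implementations are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Protocol


class Environment(Protocol):
    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...


class Agent(Protocol):
    def act(self, observation: str) -> str: ...


@dataclass
class Turn:
    observation: str
    action: str
    reward: float


def rollout(env: Environment, agent: Agent, max_turns: int) -> list[Turn]:
    """Collect one multi-turn episode; a training module would consume the result."""
    turns = []
    observation = env.reset()
    for _ in range(max_turns):
        action = agent.act(observation)
        next_observation, reward, done = env.step(action)
        turns.append(Turn(observation, action, reward))
        observation = next_observation
        if done:
            break
    return turns


# Toy implementations, used only to show that either side can be swapped independently.
class CountdownEnv:
    def __init__(self, start: int = 3):
        self.start = start
        self.remaining = start

    def reset(self) -> str:
        self.remaining = self.start
        return f"remaining={self.remaining}"

    def step(self, action: str) -> tuple[str, float, bool]:
        if action == "decrement":
            self.remaining -= 1
        done = self.remaining == 0
        return f"remaining={self.remaining}", (1.0 if done else 0.0), done


class AlwaysDecrement:
    def act(self, observation: str) -> str:
        return "decrement"
```

Calling `rollout(CountdownEnv(3), AlwaysDecrement(), max_turns=10)` yields a list of `Turn` records. Because `rollout` depends only on the two protocols, a different environment or agent can be substituted without touching the collection loop.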

2. Rich Scenario Environments - Comprehensive Real-World Coverage

Provides environments for Web Navigation, Deep Search, Digital Games, Embodied Tasks, and Scientific Tasks, covering real-world scenarios such as online shopping, discussion forums, and collaborative development.

Actual Value: Agents are trained on scenarios that resemble practical deployments, which should make them more useful once deployed.

3. Diverse Training Strategies - Meets Different Training Needs

Supports mainstream online RL algorithms (PPO, GRPO, RLOO, REINFORCE++) as well as complementary training paradigms (SFT, DPO, AgentEvol).

Actual Value: Researchers can pick the training method that best fits a given task instead of being tied to a single algorithm.
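
Two of the listed algorithms, GRPO and RLOO, differ mainly in how they turn a group of sampled rewards for the same prompt into advantages. The sketch below shows the textbook forms in plain Python. It is an illustration under stated assumptions, not the framework's implementation: the function names are hypothetical, and real implementations differ in details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward within its group.

    Uses the population standard deviation; eps avoids division by zero
    when every sample in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: subtract the mean reward of the other samples."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]
```

For a group of rewards such as `[1.0, 0.0, 0.0, 1.0]`, both functions assign positive advantages to the successful samples and negative ones to the failures. GRPO rescales by the group's spread, while RLOO keeps the raw reward scale.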

4. ScalingInter-RL Innovative Method - Solves Multi-Turn Training Challenges

Uses a progressive interaction scaling strategy: training starts with short interaction horizons to establish basic capabilities, then gradually extends the horizon so that long-horizon skills improve stably and efficiently.

Actual Value: Addresses the exploration-exploitation balance in multi-turn reinforcement learning, improving training efficiency and final performance.
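
One simple way to realize a progressive horizon is a staged schedule that raises the per-episode turn limit as training advances. The sketch below is only an assumption about the general shape of such a schedule: the function name `max_turns_at` and every default value (starting limit, increment, stage length, cap) are hypothetical and not taken from the project.

```python
def max_turns_at(step, start_turns=5, increment=5, steps_per_stage=100, cap=30):
    """Return the interaction-turn limit for a given training step.

    The limit starts at start_turns, grows by increment after every
    steps_per_stage training steps, and never exceeds cap.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    stage = step // steps_per_stage
    return min(start_turns + stage * increment, cap)
```

A rollout loop would call `max_turns_at(current_step)` before each batch and truncate episodes at that limit, so early training sees short episodes and later training sees longer ones.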

5. Visual Interactive Interface - Convenient Debugging and Analysis

Provides an interactive visualization interface for replaying and examining complete interaction trajectories, which supports data analysis and the study of model behavior.

Actual Value: Developers can follow the agent's decisions step by step, spot issues quickly, and iterate faster.
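
Outside the UI, the same idea of replaying a trajectory turn by turn can be approximated in a few lines of text processing. This is a sketch with a made-up record layout: the `task`, `reward`, and `turns` fields and the `format_trajectory` helper are hypothetical, and the format the project's interface actually reads may differ.

```python
import json


def format_trajectory(trajectory):
    """Render one recorded episode as readable text, two lines per turn."""
    lines = []
    for index, turn in enumerate(trajectory["turns"], start=1):
        lines.append(f"[turn {index}] observation: {turn['observation']}")
        lines.append(f"[turn {index}] action: {turn['action']}")
    lines.append(f"final reward: {trajectory['reward']}")
    return "\n".join(lines)


# Hypothetical record; the real on-disk format used by the UI may differ.
example = json.loads("""
{
  "task": "toy-search",
  "reward": 1.0,
  "turns": [
    {"observation": "query box is empty", "action": "search[agent rl]"},
    {"observation": "3 results listed", "action": "click[1]"}
  ]
}
""")
```

Printing `format_trajectory(example)` lists each observation and the action taken in response, followed by the final reward.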

Tech Stack & Integration

Development Language: Python
Key Dependencies: PyTorch, Flash Attention, Transformers, verl
Integration Method: Library/Framework

Maintenance Status

Development Activity: Active (clear development roadmap and recent updates)
Recent Updates: Recent (paper, dataset, and tutorials released in September 2025)
Community Response: Strong (backed by academic institutions including Fudan University and Shanghai AI Laboratory)

Documentation & Learning Resources

Documentation Quality: Comprehensive
Official Documentation: Included in README, covering environment setup, training, evaluation, and UI usage
Example Code: Available (examples/train and examples/eval directories)

