
verl

Added Apr 25, 2026
Model & Inference Framework
Open Source
Python, PyTorch, Large Language Models, Multimodal, Transformers, Deep Learning, Reinforcement Learning, vLLM, CLI, Model & Inference Framework, Other, Education & Research Resources, Model Training & Inference

A flexible, efficient, and production-ready post-training reinforcement learning framework for LLMs

verl (Volcano Engine Reinforcement Learning for LLMs) is an open-source post-training reinforcement learning framework for large language models, originally initiated by ByteDance's Seed team and now maintained by the verl-project community. Its core innovations are the HybridFlow programming model and the 3D-HybridEngine. The former uses a two-layer design, with a single controller for inter-node communication and multi-controllers for intra-node computation, to flexibly express the complex dataflows of RL post-training (rollout, reward, update). The latter resolves the model-sharding mismatch between the training phase (FSDP/Megatron parallel sharding) and the generation phase (tensor-parallel sharding), achieving zero-memory-redundancy actor-model resharding with reported throughput improvements of 1.53× to 20.57× over state-of-the-art baselines.
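The rollout-reward-update dataflow that HybridFlow expresses can be sketched as below. This is an illustrative toy, not verl's actual API: the `Batch`, `rollout`, `score`, and `update` names are hypothetical stand-ins for the stages a single controller composes, each of which would internally run multi-controller SPMD workers.

```python
# Illustrative sketch of the RL post-training dataflow (rollout -> reward -> update).
# All names here are hypothetical; this is not verl's API.
from dataclasses import dataclass, field


@dataclass
class Batch:
    prompts: list
    responses: list = field(default_factory=list)
    rewards: list = field(default_factory=list)


def rollout(batch: Batch) -> Batch:
    # Generation phase: an inference engine (e.g. vLLM) would sample responses.
    batch.responses = [f"response to {p}" for p in batch.prompts]
    return batch


def score(batch: Batch) -> Batch:
    # Reward phase: a reward model or verifiable reward function scores responses.
    batch.rewards = [float(len(r) % 2) for r in batch.responses]
    return batch


def update(batch: Batch) -> float:
    # Training phase: the actor would be updated (FSDP/Megatron in verl);
    # here we just return the mean reward as a stand-in metric.
    return sum(batch.rewards) / len(batch.rewards)


# Single-controller view: one driver composes the three stages.
batch = Batch(prompts=["p1", "p2"])
metric = update(score(rollout(batch)))
```

The point of the single-controller layer is exactly this composability: the driver chains stages as ordinary function calls, while resharding between the training and generation layouts is hidden inside the stage boundaries.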

The framework features a highly modular architecture: training backends support FSDP, FSDP2, and Megatron-LM, while inference backends support vLLM, SGLang, and HF Transformers, all freely combinable. It covers 17+ RL algorithms, including PPO, GRPO, DAPO, REINFORCE++, RLOO, and ReMax, and supports both model-based rewards and function-based (verifiable) rewards for math and code scenarios.
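Of the algorithms listed, GRPO is a good illustration of how simple the reward-to-advantage step can be: it samples a group of responses per prompt and normalizes each reward against the group's mean and standard deviation, needing no learned critic. A standalone sketch of that normalization (population std is one common choice; implementations vary), not verl code:

```python
# Group-relative advantage in the style of GRPO: standalone sketch, not verl code.
import statistics


def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by its group's mean and std (population std here)."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]


# Four sampled responses to one prompt, scored by a verifiable reward
# (e.g. 1.0 if the final math answer matches the reference, else 0.0).
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct responses end up with positive advantage and incorrect ones with negative advantage, which is all the policy-gradient update needs.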

Advanced capabilities include multimodal RL (Qwen2.5-VL, Kimi-VL), multi-turn dialogue and tool-calling Agent RL training, multi-GPU LoRA RL for memory efficiency, Expert Parallelism scaling to 671B-parameter models, FP8 RL and NVFP4 QAT low-precision training, sequence parallelism and sequence packing, plus experimental asynchronous architectures (Off-policy, Fully Async Policy). Hardware support spans NVIDIA GPUs (CUDA ≥ 12.8), AMD GPUs (ROCm), and Huawei Ascend NPUs.
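Sequence packing, one of the throughput optimizations listed above, is worth a concrete look: variable-length sequences are concatenated into one flat batch with cumulative-length offsets (commonly called `cu_seqlens`), so no compute is wasted on padding tokens. A minimal plain-Python sketch of the bookkeeping (verl delegates the actual kernels to its attention backends):

```python
# Sequence packing: concatenate variable-length sequences and record
# cumulative offsets so attention kernels can recover sequence boundaries.
from itertools import accumulate


def pack_sequences(seqs: list[list[int]]) -> tuple[list[int], list[int]]:
    flat = [tok for seq in seqs for tok in seq]           # no padding tokens
    cu_seqlens = [0, *accumulate(len(s) for s in seqs)]   # boundary offsets
    return flat, cu_seqlens


flat, cu_seqlens = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
# flat == [1, 2, 3, 4, 5, 6, 7, 8, 9]; cu_seqlens == [0, 3, 5, 9]
```

Sequence `i` occupies `flat[cu_seqlens[i]:cu_seqlens[i + 1]]`, which is the layout variable-length attention kernels consume directly.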

Typical use cases include LLM alignment (RLHF), DeepSeek R1-style reasoning model training, code and math capability enhancement, VLM multimodal RL, and Agent RL training. The project provides complete Docker images and quickstart guides, with algorithm recipes managed through the separate verl-recipe repository for reproducibility. The research was published at EuroSys 2025.
