A library for building RL training environments for LLMs, providing end-to-end infrastructure from development and testing through large-scale rollout collection, with built-in RLVR scenarios and tool-calling support.
NeMo Gym is an RL training environment building library for LLMs introduced by NVIDIA. Open-sourced under the Apache 2.0 license (Copyright 2025 NVIDIA), the project is in an early Beta stage with APIs that are still evolving. Its core value is decoupling RL environment development from the training loop, so environments and rollout throughput can be tested end to end independently.

Architecturally, it builds asynchronous HTTP servers that host environments on FastAPI, Uvicorn, and uvloop; uses Ray for distributed rollout collection; and relies on Pydantic with orjson for data validation and fast serialization. An OpenAI-compatible API layer connects seamlessly to inference backends such as OpenAI, Azure, and vLLM, and the library is compatible with training frameworks such as NeMo RL, OpenRLHF, and Unsloth.

Built-in scenarios are rich, covering mathematical reasoning (e.g., GSM8k, Lean4), code generation (e.g., BIRD SQL), knowledge enhancement, complex agent tasks (multi-hop QA, financial report analysis), instruction following, and safety alignment.

A comprehensive suite of ng_* CLI tools covers the full lifecycle from server startup to rollout collection, reward profiling, and HF dataset synchronization, driven by a Hydra + OmegaConf configuration system. Development standards are strict: test coverage of no less than 96%, backed by modern Python quality tools such as Ruff and Mypy.
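Hydra + OmegaConf configuration typically means composable YAML with value interpolation. A hypothetical fragment in that style (the key names are invented for illustration and are not NeMo Gym's schema):

```yaml
# Hypothetical Hydra-style config; key names are illustrative only.
server:
  host: 0.0.0.0
  port: 8000
inference:
  # OmegaConf interpolation reuses server values defined above.
  base_url: http://${server.host}:${server.port}/v1
rollout:
  num_workers: 8
  samples_per_prompt: 4
```

Hydra's command-line overrides (e.g., `rollout.num_workers=16`) then let a single config drive both small local tests and scaled collection runs.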
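To illustrate the RLVR idea behind scenarios like GSM8k — rewards computed by programmatic verification rather than a learned reward model — here is a minimal standalone sketch. The function name and signature are illustrative assumptions, not part of NeMo Gym's API:

```python
import re

def gsm8k_style_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the last number in the completion matches the gold answer.

    Illustrative RLVR-style verifier; not NeMo Gym's actual reward interface.
    """
    # Strip thousands separators, then pull out all integers/decimals.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == gold_answer.strip() else 0.0
```

Because the check is deterministic and cheap, verifiers like this can run inline during rollout collection without a separate reward-model server.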
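The OpenAI-compatible layer means any backend that speaks the standard chat-completions protocol can serve rollouts, including tool calls. A sketch of such a request body with one tool definition (model name and tool are placeholders, not NeMo Gym defaults):

```python
import json

# Standard OpenAI-style chat-completions request with a tool definition.
# The model name and the calculator tool are illustrative placeholders.
request_body = {
    "model": "my-policy-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 13 * 7?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "calculator",
                "description": "Evaluate a basic arithmetic expression.",
                "parameters": {
                    "type": "object",
                    "properties": {"expression": {"type": "string"}},
                    "required": ["expression"],
                },
            },
        }
    ],
}

payload = json.dumps(request_body)  # ready to POST to any /v1/chat/completions endpoint
```

Because the payload follows the common protocol, swapping vLLM for a hosted backend is a matter of changing the endpoint URL, not the environment code.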