slime
An LLM post-training framework for RL scaling from Tsinghua's THUDM, deeply integrating Megatron-LM training with the SGLang inference engine for distributed reinforcement learning on large models such as GLM, Qwen, DeepSeek, and Llama.
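At a high level, this design pairs a serving-optimized rollout engine with a throughput-optimized trainer. Below is a conceptual sketch of that loop; all names are illustrative, not slime's actual API.

```python
# Conceptual sketch of the rollout/train split described above; every name
# here is illustrative, not slime's actual API.
def rl_post_training_step(trainer, inference_engine, prompts, reward_fn):
    # 1) Rollout: the inference engine (SGLang in slime's case) generates
    #    responses at serving speed, decoupled from the training stack.
    responses = inference_engine.generate(prompts)

    # 2) Score: a reward function or reward model rates each response.
    rewards = [reward_fn(p, r) for p, r in zip(prompts, responses)]

    # 3) Update: the training backend (Megatron-LM in slime's case) applies
    #    a policy-gradient update (e.g. PPO/GRPO) on the scored rollouts.
    trainer.update_policy(prompts, responses, rewards)

    # 4) Sync: refreshed policy weights are pushed back to the inference
    #    engine so the next rollout uses the updated model.
    inference_engine.load_weights(trainer.get_weights())
```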
The official inference framework for 1-bit Large Language Models by Microsoft. It features optimized kernels for lossless, high-speed inference on CPUs and GPUs, drastically reducing energy consumption and enabling 100B+ parameter models to run on local consumer hardware.
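The "1-bit" here refers to ternary weight quantization in the style of BitNet b1.58: weights are scaled by their mean absolute value and rounded to {-1, 0, +1}. A minimal NumPy sketch of that idea (the concept only, not bitnet.cpp's kernel code):

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray):
    """Ternary (1.58-bit) weight quantization in the style of BitNet b1.58:
    scale by the mean absolute value, then round-and-clip to {-1, 0, +1}.
    Conceptual sketch, not bitnet.cpp's optimized kernels."""
    gamma = np.abs(w).mean() + 1e-8             # per-tensor absmean scale
    w_q = np.clip(np.round(w / gamma), -1, 1)   # ternary codes
    return w_q.astype(np.int8), gamma           # dequantize as w_q * gamma

w = np.random.randn(4, 4).astype(np.float32)
codes, scale = absmean_ternary_quantize(w)
w_hat = codes.astype(np.float32) * scale        # reconstruction used at inference
```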
AirLLM optimizes inference memory usage, enabling 70B large language models to run on a single 4GB GPU without quantization, distillation, or pruning. It also supports running the 405B Llama 3.1 model on 8GB of VRAM.
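AirLLM achieves this by loading transformer layers from disk one at a time during inference. A minimal usage sketch following the shape of the project's quick-start; the model ID and keyword arguments are illustrative and may differ across airllm versions:

```python
# Minimal AirLLM usage sketch; model ID and arguments are illustrative.
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-70B-Instruct")

input_tokens = model.tokenizer(
    ["What is the capital of the United States?"],
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

# Layers are streamed from disk one at a time during this call, which is
# what keeps peak VRAM within a few GB even for a 70B model.
output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)
print(model.tokenizer.decode(output.sequences[0]))
```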
An open-source 314B-parameter large language model with a Mixture of Experts (MoE) architecture, giving researchers and developers an accessible implementation of an ultra-large-scale AI model.
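In an MoE layer, a router activates only a few experts per token (Grok-1 activates 2 of its 8 experts), so per-token compute is a fraction of the full parameter count. A conceptual PyTorch sketch of top-2 routing, not the released implementation:

```python
import torch
import torch.nn.functional as F

def top2_moe_layer(x, router, experts, k=2):
    """Conceptual top-k MoE routing (k=2 over 8 experts in Grok-1).
    x: (tokens, d_model); router: Linear(d_model, n_experts)."""
    gate_logits = router(x)                       # (tokens, n_experts)
    weights, idx = gate_logits.topk(k, dim=-1)    # pick k experts per token
    weights = F.softmax(weights, dim=-1)          # normalize their gates
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e              # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, slot, None] * expert(x[mask])
    return out  # only k of n experts run per token: ~k/n of the FLOPs

d, n = 64, 8
experts = [torch.nn.Linear(d, d) for _ in range(n)]
router = torch.nn.Linear(d, n)
y = top2_moe_layer(torch.randn(16, d), router, experts)
```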
A benchmark platform featuring 100 PhD-level research tasks across 22 distinct fields, systematically evaluating Deep Research Agents (DRAs) on report generation quality and information retrieval capabilities.
A repository demonstrating how to train a GPT-2 (124M) model with modern techniques on a single GPU, reaching strong performance in under an hour of training.
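For orientation, here is a generic single-GPU setup using the kind of modern techniques such runs rely on (bf16 autocast, torch.compile, fused AdamW); this is an illustrative sketch, not the repository's actual training script:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative single-GPU training setup, not the repo's script.
device = "cuda"
model = GPT2LMHeadModel(GPT2Config()).to(device)   # default config is the 124M model
model = torch.compile(model)                       # kernel fusion via Inductor
opt = torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1, fused=True)

def train_step(batch):  # batch: (B, T) token ids already on the GPU
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        # HF computes the shifted LM loss itself when labels are provided
        loss = model(batch, labels=batch).loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
    opt.zero_grad(set_to_none=True)
    return loss.item()
```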
FlashMLA is an LLM inference kernel providing efficient attention over variable-length KV caches, with precise memory management that significantly reduces memory waste and improves inference throughput.
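The memory win comes from paging: the KV cache is carved into fixed-size blocks allocated on demand, so a sequence wastes at most one partially filled block. A conceptual Python sketch of that bookkeeping, not FlashMLA's actual CUDA interface:

```python
# Conceptual paged KV-cache bookkeeping; illustrative, not FlashMLA's API.
BLOCK = 64  # tokens per cache block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))   # pool of physical blocks
        self.table = {}                       # seq_id -> list of block ids
        self.lens = {}                        # seq_id -> tokens cached

    def append(self, seq_id: int, n_tokens: int):
        """Grow a sequence by n_tokens, allocating blocks only on demand,
        so waste is bounded by one partial block per sequence."""
        blocks = self.table.setdefault(seq_id, [])
        used = self.lens.get(seq_id, 0)
        needed = -(-(used + n_tokens) // BLOCK) - len(blocks)  # ceil division
        for _ in range(needed):
            blocks.append(self.free.pop())    # grab a physical block
        self.lens[seq_id] = used + n_tokens

    def release(self, seq_id: int):
        self.free.extend(self.table.pop(seq_id, []))  # recycle on finish
        self.lens.pop(seq_id, None)
```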
MiniMax-M2.1 is a state-of-the-art AI model designed for real-world development and agent scenarios. It excels in multilingual software development, complex workflow execution, and full-stack application development, providing open, controllable, and transparent AI agent capabilities.
GLM-4.5 series models are foundation models designed for intelligent agents, unifying reasoning, coding, and agent capabilities in a single framework. They offer a thinking mode for complex reasoning and tool use and a non-thinking mode for immediate responses, making them suitable for complex agent applications.
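Served behind an OpenAI-compatible endpoint (e.g. via vLLM or SGLang), the mode can typically be selected per request. A minimal sketch; the endpoint URL and the `chat_template_kwargs` toggle are assumptions that depend on the serving stack:

```python
# Sketch of per-request mode selection over an OpenAI-compatible endpoint;
# the URL and chat_template_kwargs toggle are server-dependent assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="zai-org/GLM-4.5",
    messages=[{"role": "user", "content": "Plan a three-step data pipeline."}],
    # Omit this (or set True) for thinking mode; False requests an
    # immediate answer with no reasoning trace.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)
```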
A next-generation training engine built for ultra-large MoE (Mixture of Experts) models, offering efficient, scalable training for large language models.