DeepResearch is an open-source deep research agent from Alibaba, designed for long-horizon, deep information-seeking tasks. With 30.5 billion total parameters but only 3.3 billion activated per token, it achieves state-of-the-art results on agentic search benchmarks such as Humanity's Last Exam, BrowseComp, and WebWalkerQA.
One-Minute Overview#
Developed by Alibaba's Tongyi Lab, DeepResearch is built for long-horizon, deep information-seeking tasks. Its sparse mixture-of-experts design activates only 3.3 billion of its 30.5 billion parameters per token, keeping inference efficient while delivering state-of-the-art results across agentic search benchmarks. The agent offers two inference paradigms: ReAct, for evaluating the model's core intrinsic abilities, and an IterResearch-based "Heavy" mode that uses test-time scaling for maximum performance.
Core Value: Provides efficient and accurate deep information retrieval and analysis through automated data generation, reinforcement learning, and flexible inference paradigms
Quick Start#
Installation Difficulty: High - Requires Python 3.10.0, multiple API keys, and model weight files
```bash
# Create and activate the conda environment
conda create -n deepresearch_env python=3.10.0
conda activate deepresearch_env

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit the .env file to add your API keys
```
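Since several API keys are required, it can help to fail fast when one is missing. The sketch below is not part of the repository, and the key names are illustrative assumptions only; check `.env.example` for the actual variable names your setup requires.

```python
import os

# Hypothetical key names -- consult .env.example for the real ones.
REQUIRED_KEYS = ["OPENAI_API_KEY", "SERPER_KEY_ID"]

def check_env(required=REQUIRED_KEYS):
    """Return the required variables that are missing or empty."""
    return [k for k in required if not os.environ.get(k)]

missing = check_env()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Running this once before launching the agent surfaces configuration problems immediately instead of mid-run.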
Is this suitable for me?#
- ✅ Academic Research: Ideal for research projects requiring extensive literature review, data analysis, and knowledge discovery
- ✅ Business Intelligence: Suitable for market research, competitive analysis, and industry trend studies
- ❌ Simple Q&A Tasks: Not designed for quick, straightforward queries
- ❌ Resource-Constrained Environments: Not suitable for deployment with limited computational resources
Core Capabilities#
1. Fully Automated Synthetic Data Generation Pipeline#
- Provides a highly scalable data synthesis pipeline that fully supports agentic pre-training, supervised fine-tuning, and reinforcement learning
- Actual Value: Significantly reduces data preparation time while improving model training efficiency and performance
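The document does not detail the pipeline's internals, but the general shape of such a synthesis loop can be sketched as follows. Everything here is a toy stand-in under stated assumptions: the `make_question` logic and the document corpus are illustrative, not the actual pipeline, which composes much harder multi-hop questions.

```python
import random

def make_question(doc: dict) -> dict:
    """Toy stand-in: derive one QA training example from a source document."""
    return {
        "question": f"What does the source say about {doc['topic']}?",
        "answer": doc["fact"],
        "source": doc["id"],
    }

def synthesize(corpus: list, n: int, seed: int = 0) -> list:
    """Sample documents and emit n QA examples (deterministic given seed)."""
    rng = random.Random(seed)
    return [make_question(rng.choice(corpus)) for _ in range(n)]

corpus = [
    {"id": "d1", "topic": "MoE routing", "fact": "Only a few experts fire per token."},
    {"id": "d2", "topic": "test-time scaling", "fact": "More inference compute helps."},
]
examples = synthesize(corpus, n=4)
```

The same skeleton, with harder question composition and quality filtering, can feed all three training stages named above.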
2. Large-Scale Continual Pre-training on Agentic Data#
- Leverages diverse, high-quality agentic interaction data for continual pre-training, extending model capabilities and maintaining knowledge freshness
- Actual Value: Enables the model to process the latest information, improving capabilities for long-term tracking and dynamic information analysis
3. End-to-End Reinforcement Learning#
- Employs a strictly on-policy RL approach based on a customized Group Relative Policy Optimization framework, with token-level policy gradients, leave-one-out advantage estimation, and selective filtering of negative samples
- Actual Value: Stabilizes training in non-stationary environments, improving model performance and reliability in real-world applications
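The leave-one-out advantage estimate mentioned above is simple to state: for each rollout in a group, the baseline is the mean reward of the *other* rollouts in that group. A minimal sketch under that definition (not the repository's actual implementation; the `keep_sample` filter is likewise only an illustration of selective negative-sample filtering):

```python
def leave_one_out_advantages(rewards: list) -> list:
    """Advantage of rollout i = r_i minus the mean reward of the other rollouts."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

def keep_sample(advantage: float, drop_negatives: bool = True) -> bool:
    """Illustrative selective filter: optionally drop negative-advantage samples."""
    return advantage >= 0.0 or not drop_negatives

advs = leave_one_out_advantages([1.0, 0.0, 0.5])
```

Because each rollout's own reward is excluded from its baseline, the estimate is unbiased for the group, which is part of what keeps strictly on-policy training stable.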
4. Dual Inference Paradigm Compatibility#
- Compatible with both ReAct (for evaluating core intrinsic abilities) and an IterResearch-based "Heavy" mode (using test-time scaling to unlock maximum performance)
- Actual Value: Provides flexible usage options, allowing selection of the appropriate inference mode based on specific needs
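In ReAct mode the model alternates between reasoning, tool calls, and observations until it emits a final answer. A minimal, tool-agnostic loop sketch follows; the step dictionary format, the toy model, and the stopping convention are assumptions for illustration, not the repository's actual interface.

```python
def react_loop(model, tools: dict, question: str, max_steps: int = 10) -> str:
    """Alternate think -> act -> observe until the model returns an answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)  # {"thought", "action", "input"} or {"answer"}
        if "answer" in step:
            return step["answer"]
        observation = tools[step["action"]](step["input"])
        transcript += (
            f"\nThought: {step['thought']}"
            f"\nAction: {step['action']}[{step['input']}]"
            f"\nObservation: {observation}"
        )
    return "No answer within the step budget."

# Toy model: search once, then answer with whatever was observed.
def toy_model(transcript: str) -> dict:
    if "Observation:" in transcript:
        return {"answer": transcript.rsplit("Observation: ", 1)[1]}
    return {"thought": "I should search.", "action": "search",
            "input": "capital of France"}

answer = react_loop(toy_model, {"search": lambda q: "Paris"},
                    "What is the capital of France?")
```

The Heavy mode replaces this single loop with IterResearch-style iterative rounds and test-time scaling, trading extra compute for higher accuracy.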
Technology Stack & Integration#
- Development Language: Python
- Key Dependencies: Transformers, PyTorch, OpenAI API
- Integration Method: API / Library
Maintenance Status#
- Development Activity: High - Multiple commits per week with continuous updates
- Recent Updates: Recently released the Tongyi-DeepResearch-30B-A3B model
- Community Response: Active - Clear recruitment information and communication channels available
Commercial & Licensing#
License: Apache-2.0
- ✅ Commercial Use: Allowed
- ✅ Modification: Allowed
- ⚠️ Restrictions: Must include original copyright and license notices
Documentation & Learning Resources#
- Documentation Quality: Comprehensive
- Official Documentation: https://github.com/Alibaba-NLP/DeepResearch
- Example Code: Includes inference and evaluation scripts
- Learning Resources: Technical blog posts and research papers available