DeepResearch is an open-source deep research agent from Alibaba, designed for long-horizon, deep information-seeking tasks. With 30.5 billion total parameters but only 3.3 billion activated per token, it achieves state-of-the-art results on agentic search benchmarks such as Humanity's Last Exam, BrowseComp, and WebWalkerQA.
One-Minute Overview#
Developed by Alibaba's Tongyi Lab, DeepResearch is built for long-horizon, deep information-seeking tasks. Its sparse mixture-of-experts design activates only 3.3 billion of its 30.5 billion parameters per token, keeping inference efficient while delivering state-of-the-art results across agentic search benchmarks. The agent offers two inference paradigms: ReAct, for evaluating the model's core intrinsic abilities, and an IterResearch-based "Heavy" mode that uses test-time scaling for maximum performance.
Core Value: Provides efficient and accurate deep information retrieval and analysis through automated data generation, reinforcement learning, and flexible inference paradigms
Quick Start#
Installation Difficulty: High - Requires Python 3.10.0, multiple API keys, and model weight files
```bash
# Create and activate the conda environment
conda create -n deepresearch_env python=3.10.0
conda activate deepresearch_env

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit the .env file to add your API keys
```
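Since several API keys are required, it can help to fail fast when one is missing. The sketch below is not part of the repository, and the key names are illustrative assumptions only; check `.env.example` for the actual variable names your setup requires.

```python
import os

# Hypothetical key names -- consult .env.example for the real ones.
REQUIRED_KEYS = ["OPENAI_API_KEY", "SERPER_KEY_ID"]

def check_env(required=REQUIRED_KEYS):
    """Return the required variables that are missing or empty."""
    return [k for k in required if not os.environ.get(k)]

missing = check_env()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Running this once before launching the agent surfaces configuration problems immediately instead of mid-run.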
Is this suitable for me?#
- ✅ Academic Research: Ideal for research projects requiring extensive literature review, data analysis, and knowledge discovery
- ✅ Business Intelligence: Suitable for market research, competitive analysis, and industry trend studies
- ❌ Simple Q&A Tasks: Not designed for quick, straightforward queries
- ❌ Resource-Constrained Environments: Not suitable for deployment with limited computational resources
Core Capabilities#
1. Fully Automated Synthetic Data Generation Pipeline#
- Provides a highly scalable data synthesis pipeline that fully supports agentic pre-training, supervised fine-tuning, and reinforcement learning
- Actual Value: Significantly reduces data preparation time while improving model training efficiency and performance
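The document does not detail the pipeline's internals, but the general shape of such a synthesis loop can be sketched as follows. Everything here is a toy stand-in under stated assumptions: the `make_question` logic and the document corpus are illustrative, not the actual pipeline, which composes much harder multi-hop questions.

```python
import random

def make_question(doc: dict) -> dict:
    """Toy stand-in: derive one QA training example from a source document."""
    return {
        "question": f"What does the source say about {doc['topic']}?",
        "answer": doc["fact"],
        "source": doc["id"],
    }

def synthesize(corpus: list, n: int, seed: int = 0) -> list:
    """Sample documents and emit n QA examples (deterministic given seed)."""
    rng = random.Random(seed)
    return [make_question(rng.choice(corpus)) for _ in range(n)]

corpus = [
    {"id": "d1", "topic": "MoE routing", "fact": "Only a few experts fire per token."},
    {"id": "d2", "topic": "test-time scaling", "fact": "More inference compute helps."},
]
examples = synthesize(corpus, n=4)
```

The same skeleton, with harder question composition and quality filtering, can feed all three training stages named above.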
2. Large-Scale Continual Pre-training on Agentic Data#
- Leverages diverse, high-quality agentic interaction data for continual pre-training, extending model capabilities and maintaining knowledge freshness
- Actual Value: Enables the model to process the latest information, improving capabilities for long-term tracking and dynamic information analysis
3. End-to-End Reinforcement Learning#
- Employs a strictly on-policy RL approach based on a customized Group Relative Policy Optimization framework, with token-level policy gradients, leave-one-out advantage estimation, and selective filtering of negative samples
- Actual Value: Stabilizes training in non-stationary environments, improving model performance and reliability in real-world applications
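The leave-one-out advantage estimate mentioned above is simple to state: for each rollout in a group, the baseline is the mean reward of the *other* rollouts in that group. A minimal sketch under that definition (not the repository's actual implementation; the `keep_sample` filter is likewise only an illustration of selective negative-sample filtering):

```python
def leave_one_out_advantages(rewards: list) -> list:
    """Advantage of rollout i = r_i minus the mean reward of the other rollouts."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

def keep_sample(advantage: float, drop_negatives: bool = True) -> bool:
    """Illustrative selective filter: optionally drop negative-advantage samples."""
    return advantage >= 0.0 or not drop_negatives

advs = leave_one_out_advantages([1.0, 0.0, 0.5])
```

Because each rollout's own reward is excluded from its baseline, the estimate is unbiased for the group, which is part of what keeps strictly on-policy training stable.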
4. Dual Inference Paradigm Compatibility#
- Compatible with both ReAct (for evaluating core intrinsic abilities) and an IterResearch-based "Heavy" mode (using test-time scaling to unlock maximum performance)
- Actual Value: Provides flexible usage options, allowing selection of the appropriate inference mode based on specific needs
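In ReAct mode the model alternates between reasoning, tool calls, and observations until it emits a final answer. A minimal, tool-agnostic loop sketch follows; the step dictionary format, the toy model, and the stopping convention are assumptions for illustration, not the repository's actual interface.

```python
def react_loop(model, tools: dict, question: str, max_steps: int = 10) -> str:
    """Alternate think -> act -> observe until the model returns an answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)  # {"thought", "action", "input"} or {"answer"}
        if "answer" in step:
            return step["answer"]
        observation = tools[step["action"]](step["input"])
        transcript += (
            f"\nThought: {step['thought']}"
            f"\nAction: {step['action']}[{step['input']}]"
            f"\nObservation: {observation}"
        )
    return "No answer within the step budget."

# Toy model: search once, then answer with whatever was observed.
def toy_model(transcript: str) -> dict:
    if "Observation:" in transcript:
        return {"answer": transcript.rsplit("Observation: ", 1)[1]}
    return {"thought": "I should search.", "action": "search",
            "input": "capital of France"}

answer = react_loop(toy_model, {"search": lambda q: "Paris"},
                    "What is the capital of France?")
```

The Heavy mode replaces this single loop with IterResearch-style iterative rounds and test-time scaling, trading extra compute for higher accuracy.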
Technology Stack & Integration#
- Development Language: Python
- Key Dependencies: Transformers, PyTorch, OpenAI API
- Integration Method: API / Library
Maintenance Status#
- Development Activity: High - Multiple commits per week with continuous updates
- Recent Updates: Recently released the Tongyi-DeepResearch-30B-A3B model
- Community Response: Active - Clear recruitment information and communication channels available
Commercial & Licensing#
License: Apache-2.0
- ✅ Commercial Use: Allowed
- ✅ Modification: Allowed
- ⚠️ Restrictions: Must include original copyright and license notices
Documentation & Learning Resources#
- Documentation Quality: Comprehensive
- Official Documentation: https://github.com/Alibaba-NLP/DeepResearch
- Example Code: Includes inference and evaluation scripts
- Learning Resources: Technical blog posts and research papers available