DATAGEN

Added Feb 23, 2026
Category: Agent & Tooling
Open Source
Tags: Python; Workflow Automation; Large Language Models; Multi-Agent System; LangGraph; AI Agents; Machine Learning; Agent Framework; Agent & Tooling; Other; Automation, Workflow & RPA; Education & Research Resources; Data Analytics, BI & Visualization

An AI-driven multi-agent research assistant built on LangGraph that automates the research workflow end to end: hypothesis generation, data analysis, visualization, and comprehensive report writing.

Project Overview#

DATAGEN (formerly AI-Data-Analysis-MultiAgent) is an enterprise-grade platform for automated data analysis. Multiple specialized AI agents collaborate to carry out the core tasks of a human researcher, automating the path from raw data to insight report.

Core Architecture#

Progressive Disclosure Architecture#

The system uses a three-level loading strategy to keep long-horizon multi-agent tasks from overflowing the context window:

  • Level 1 (Metadata): Loads only the agent's name, description, and list of available skills (~100 tokens), which is enough for routing decisions
  • Level 2 (Instructions): Loads the complete system prompt (AGENT.md) and global rules when the agent is activated
  • Level 3 (Resources): Loads detailed skill documentation (SKILL.md), MCP resources, and external files only during actual execution
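
The three levels above can be sketched as lazy loading on a per-agent context object. This is a minimal illustration, not DATAGEN's actual implementation: the class AgentContext, the functions activate and load_skill, and the skill name used in the usage note are all hypothetical. Only the AGENT.md and SKILL.md file names come from the description above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class AgentContext:
    """Context for one agent, filled in level by level (hypothetical class)."""
    # Level 1: small metadata, always loaded and used for routing.
    name: str
    description: str
    skills: list[str]
    # Level 2: full system prompt, loaded only when the agent is activated.
    instructions: Optional[str] = None
    # Level 3: skill documentation, loaded only when a skill is executed.
    resources: dict[str, str] = field(default_factory=dict)


def activate(agent: AgentContext, config_root: Path) -> None:
    """Level 2: read AGENT.md once, on first activation."""
    if agent.instructions is None:
        path = config_root / "agents" / agent.name / "AGENT.md"
        agent.instructions = path.read_text(encoding="utf-8")


def load_skill(agent: AgentContext, skill: str, config_root: Path) -> str:
    """Level 3: read SKILL.md lazily and cache the text on the agent."""
    if skill not in agent.skills:
        raise KeyError(f"{agent.name} does not list skill {skill!r}")
    if skill not in agent.resources:
        path = config_root / "skills" / skill / "SKILL.md"
        agent.resources[skill] = path.read_text(encoding="utf-8")
    return agent.resources[skill]
```

A router would only ever see name, description, and skills. Calling activate(agent, Path("config")) pulls in the system prompt, and load_skill(agent, "some-skill", Path("config")) reads one skill document at the moment it is needed, so the larger files stay out of context until then.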

Agent Specialization System#

The system features 9 specialized agents:

| Agent | Responsibility |
|---|---|
| process_agent | Supervises and orchestrates the entire research workflow |
| hypothesis_agent | Automatically generates and refines research hypotheses |
| search_agent / searcher_agent | Executes web and literature searches |
| code_agent | Writes and executes data analysis code |
| visualization_agent | Generates interactive data visualizations |
| report_agent | Drafts research reports |
| quality_review_agent | Performs quality reviews on analysis processes and results |
| note_agent | Handles state tracking and context retention throughout |
| refiner_agent | Polishes and optimizes the final report |

Multi-Model Support#

Each agent can be assigned a different underlying LLM. Supported providers:

  • OpenAI: GPT series
  • Anthropic: Claude series
  • Google: Gemini series
  • Groq: High-performance inference
  • Ollama: Local model support
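
Per-agent model assignment amounts to a lookup with a fallback. The sketch below shows that idea only: the ModelChoice class, the dictionary layout, and the placeholder model strings are hypothetical and do not reflect the actual schema of config/agent_models.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # e.g. "openai", "anthropic", "google", "groq", "ollama"
    model: str     # placeholder identifier, not a real model name


# Hypothetical assignments, for illustration only.
DEFAULT = ModelChoice("openai", "<default-model>")
AGENT_MODELS = {
    "code_agent": ModelChoice("anthropic", "<coding-model>"),
    "note_agent": ModelChoice("ollama", "<local-model>"),
}


def model_for(agent_name: str) -> ModelChoice:
    """Return the agent's own model, or the default when none is assigned."""
    return AGENT_MODELS.get(agent_name, DEFAULT)
```

With this shape, a cheap local model can back a high-volume agent such as note_agent while a stronger hosted model handles code generation.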

Core Capabilities#

Research Automation#

  • AI-driven hypothesis generation and validation
  • Automated research direction optimization
  • Real-time hypothesis refinement

Data Processing#

  • Robust data cleaning and transformation
  • Scalable analysis pipelines
  • Automated quality assurance

Visualization & Reporting#

  • Interactive data visualization
  • Custom report generation
  • Automated insight extraction

Smart Memory Management#

  • Note Taker agent for state tracking
  • Efficient context retention system

Quick Start#

Requirements#

  • Python 3.10+
  • Conda (recommended)
  • ChromeDriver (for web automation search)

Installation#

# Clone the repository and enter it
git clone https://github.com/starpig1129/DATAGEN.git
cd DATAGEN

# Create Conda environment
conda create -n datagen python=3.10
conda activate datagen

# Install dependencies
pip install -r requirements.txt

Configuration#

  1. Rename the file ".env Example" to ".env"
  2. Set the required items: WORKING_DIRECTORY, CONDA_ENV, CHROMEDRIVER_PATH
  3. Set API keys for the providers you use: OPENAI_API_KEY, GOOGLE_API_KEY, ANTHROPIC_API_KEY, etc.
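
Before a long run it can help to confirm that these settings are visible to the process. The check below is a small sketch, not part of DATAGEN: check_env is a hypothetical helper, and it assumes the values from .env have already been loaded into the environment.

```python
import os

REQUIRED = ("WORKING_DIRECTORY", "CONDA_ENV", "CHROMEDRIVER_PATH")
API_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "ANTHROPIC_API_KEY")


def check_env(env=None):
    """Return (missing required names, API key names that are set)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    available = [name for name in API_KEYS if env.get(name)]
    return missing, available


if __name__ == "__main__":
    missing, available = check_env()
    if missing:
        print("Missing required settings:", ", ".join(missing))
    if not available:
        print("No API key found; hosted providers will not be usable.")
```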

Usage Example#

user_input = '''
datapath:YourDataName.csv
Use machine learning to perform data analysis and write complete graphical reports
'''

Configuration File Structure#

  • config/agent_models.yaml — Agent model configuration
  • config/agents/{agent_name}/AGENT.md — System prompts
  • config/agents/{agent_name}/config.yaml — Tools, skills, MCP settings
  • config/skills/{skill-name}/SKILL.md — Reusable skills
  • config/mcp.yaml — MCP server global configuration
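
The layout above can be checked mechanically. The following sketch builds the paths implied by the list for a given set of agents and skills and reports which ones are absent. The function names and the agent and skill names passed in are illustrative assumptions.

```python
from pathlib import Path


def expected_config_files(root: Path, agents: list[str], skills: list[str]) -> list[Path]:
    """List the config files implied by the documented layout."""
    config = root / "config"
    paths = [config / "agent_models.yaml", config / "mcp.yaml"]
    for agent in agents:
        paths.append(config / "agents" / agent / "AGENT.md")
        paths.append(config / "agents" / agent / "config.yaml")
    for skill in skills:
        paths.append(config / "skills" / skill / "SKILL.md")
    return paths


def missing_config_files(root: Path, agents: list[str], skills: list[str]) -> list[Path]:
    """Return the expected files that do not exist under root."""
    return [p for p in expected_config_files(root, agents, skills) if not p.is_file()]
```

For example, missing_config_files(Path("."), ["code_agent"], []) run from the repository root would return an empty list if the two global files and code_agent's AGENT.md and config.yaml are all in place.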

Use Cases#

  • Data Science & Exploratory Data Analysis (EDA)
  • Academic Research Assistance (hypothesis validation & literature review)
  • Automated Business Analysis Report Generation
  • Complex task orchestration with multi-model collaboration

Important Notes#

  • Make sure your API accounts have sufficient balance; a single run makes many API calls
  • A full research workflow can take a long time, depending on task complexity
  • Back up your data before use; the agents may modify the data they analyze
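
One simple way to follow the backup advice is to copy the data file to a timestamped name before starting a run. This is a generic sketch using the Python standard library. The helper backup_file and the "backups" directory name are assumptions, not part of DATAGEN.

```python
import shutil
import time
from pathlib import Path


def backup_file(data_path: str, backup_dir: str = "backups") -> Path:
    """Copy a data file into backup_dir with a timestamp in its name."""
    source = Path(data_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file metadata
    return target
```

Calling backup_file("YourDataName.csv") leaves the original in place and returns the path of the copy.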
