
AI Data Science Team

Added: Feb 23, 2026
Category: Agent & Tooling
Open Source
Tags: Python; Workflow Automation; Multi-Agent System; LangGraph; LangChain; AI Agents; Machine Learning; Agent Framework; Streamlit; Web Application; Agent & Tooling; Other; Automation, Workflow & RPA; Model Training & Inference; Data Analytics, BI & Visualization

An AI-powered team of data science agents that automates data loading, cleaning, feature engineering, EDA, visualization, and machine learning modeling (H2O + MLflow) through collaboration between specialized agents. It includes a Streamlit visual pipeline studio and aims to make common data science tasks 10X faster.

Overview#

AI Data Science Team is a Python library for building a virtual AI data science team. It uses Large Language Models (LLMs) to drive multiple specialized agents that automate the workflow from data loading, cleaning, wrangling, and EDA through to machine learning modeling.

Core Capabilities#

Data Processing#

  • Data Loading & Inspection: Support for common formats (CSV, Excel, etc.)
  • Data Cleaning: Automatically handle missing values, outliers, duplicates
  • Data Wrangling: Format conversion, pivot tables, merging, etc.
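To make the cleaning step concrete, here is a minimal sketch of two of the operations named above (duplicate removal and missing-value imputation) in plain Python. It is not the library's implementation: the function `clean_rows` and the median-fill strategy are assumptions chosen for illustration, and outlier handling is left out.

```python
from statistics import median


def clean_rows(rows, numeric_key):
    """Drop exact duplicate rows, then fill missing values with the median.

    Hypothetical helper for illustration; not part of ai_data_science_team.
    """
    seen = set()
    unique = []
    for row in rows:
        # A row's identity is the set of its (column, value) pairs.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # Compute the fill value from observed (non-missing) entries only.
    observed = [r[numeric_key] for r in unique if r[numeric_key] is not None]
    fill = median(observed)
    for r in unique:
        if r[numeric_key] is None:
            r[numeric_key] = fill
    return unique


raw = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # exact duplicate
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 50},
]
cleaned = clean_rows(raw, "age")
```

In the agent-driven workflow, code of this kind would be generated from a natural-language instruction rather than written by hand.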

Feature & Analysis#

  • Feature Engineering: Automatic feature generation/selection
  • EDA (Exploratory Data Analysis): Automatic statistical summaries and charts
  • Visualization: Code-generated charting capabilities
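As a rough illustration of what an automatic statistical summary contains, the sketch below computes per-column statistics with the standard library. The function name `summarize` and the choice of statistics are assumptions for this example, not the output format of the EDA agent.

```python
from statistics import mean, stdev


def summarize(values):
    """Return basic summary statistics for one numeric column.

    Missing entries (None) are counted separately and excluded
    from the numeric statistics.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "std": stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


summary = summarize([2.0, 4.0, None, 6.0])
```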

Data Source Interaction#

  • SQL Interaction: Natural language to SQL queries, database interaction
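The natural-language-to-SQL flow can be sketched as two steps: translate the question into a query, then run it against the database. In the sketch below the translation step is a hypothetical stand-in (`fake_text_to_sql`, a lookup table) for the LLM call, and the `sales` table is invented for the example. Only the `sqlite3` calls are real APIs.

```python
import sqlite3


def fake_text_to_sql(question):
    """Hypothetical stand-in for the LLM that would write the SQL."""
    canned = {
        "total sales by region": (
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        ),
    }
    return canned[question.lower()]


# Build a small in-memory database to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5)],
)

# Translate the question, then execute the generated query.
sql = fake_text_to_sql("Total sales by region")
rows = conn.execute(sql).fetchall()
conn.close()
```

With a real model in place of the lookup table, the generated SQL should be reviewed or run with read-only credentials before it touches production data.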

Modeling Capabilities#

  • H2O AutoML: Integrated H2O for automated modeling
  • MLflow Integration: Experiment tracking and model management
  • Model Evaluation: Automated evaluation metrics generation
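For reference, the kind of metrics an automated evaluation step typically reports for a binary classifier can be computed as follows. This is a generic sketch, assuming a binary label with `1` as the positive class. It does not reproduce the metrics that H2O or the model evaluation agent actually emit.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```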

Agent Module System#

Base Agents (agents/)#

  • data_loader_tools_agent: Data loading
  • data_cleaning_agent: Data cleaning
  • data_wrangling_agent: Data wrangling
  • data_visualization_agent: Visualization
  • feature_engineering_agent: Feature engineering
  • sql_database_agent: SQL database operations
  • workflow_planner_agent: Workflow planning

Data Science Agents (ds_agents/)#

  • eda_tools_agent: Focused on EDA toolchain

Machine Learning Agents (ml_agents/)#

  • h2o_ml_agent: Execute H2O machine learning tasks
  • mlflow_tools_agent: Manage MLflow tools
  • model_evaluation_agent: Focused on model evaluation

Multi-Agent System (multiagents/)#

  • pandas_data_analyst: Pandas data analysis expert
  • sql_data_analyst: SQL data analysis expert
  • supervisor_ds_team: Supervisor Agent coordinating other Agents
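The supervisor pattern can be pictured as a coordinator that passes shared state through a sequence of specialized workers. The sketch below is a deliberately simplified, LLM-free illustration: the agent functions, the `AGENTS` registry, and the fixed `plan` list are all hypothetical, whereas a real supervisor would let the model decide which agent runs next.

```python
def load_agent(state):
    """Pretend to load a dataset into the shared state."""
    state["data"] = [3, None, 1]
    return state


def clean_agent(state):
    """Drop missing values from the loaded data."""
    state["data"] = [v for v in state["data"] if v is not None]
    return state


def summarize_agent(state):
    """Attach a small summary of the cleaned data."""
    state["summary"] = {"rows": len(state["data"]), "total": sum(state["data"])}
    return state


# Registry mapping step names to worker functions (hypothetical names).
AGENTS = {
    "load": load_agent,
    "clean": clean_agent,
    "summarize": summarize_agent,
}


def supervisor(plan):
    """Run each planned step in order, recording which agents ran."""
    state = {"history": []}
    for step in plan:
        state = AGENTS[step](state)
        state["history"].append(step)
    return state


result = supervisor(["load", "clean", "summarize"])
```

Keeping all intermediate results in one state dictionary is what lets a later agent build on the output of an earlier one.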

Flagship Application: AI Pipeline Studio#

An interactive application built on Streamlit as the graphical frontend:

  • Pipeline-first Workspace: Integrated visual editor, table viewer, chart generator, and code viewer
  • Hybrid Mode: Support for manual and AI automated steps
  • Project Management: Save projects as metadata-only or full-data; metadata-only projects can be rehydrated (reloaded from the source data)
  • Context Memory: Short-term memory for multi-turn conversation context
  • Debugging: Verbose logging mode that writes to the logs/ directory
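The difference between metadata-only and full-data saves, and what rehydration means, can be sketched as follows. The file layout, the function names `save_project` and `load_project`, and the JSON keys are assumptions made for this example. The studio's actual project format is not described here.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_project(path, source_csv, steps, rows=None):
    """Save a project; metadata-only when rows is None, full-data otherwise."""
    project = {"source": str(source_csv), "steps": steps}
    if rows is not None:
        project["rows"] = rows
    Path(path).write_text(json.dumps(project))


def load_project(path):
    """Load a project, rehydrating rows from the source file if absent."""
    project = json.loads(Path(path).read_text())
    if "rows" not in project:
        # Rehydrate: re-read the data from the recorded source location.
        with open(project["source"], newline="") as f:
            project["rows"] = list(csv.DictReader(f))
    return project


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    source.write_text("name,score\nana,7\nbo,9\n")

    # Metadata-only save: the rows themselves are not stored.
    save_project(Path(tmp) / "project.json", source, ["load", "clean"])
    restored = load_project(Path(tmp) / "project.json")
```

A metadata-only project stays small and avoids copying sensitive data, but it can only be reopened while the source file is still reachable.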

Architecture#

ai_data_science_team/
├── orchestration.py     # Orchestration logic (core flow control)
├── agents/              # Base data science agents
├── ds_agents/           # Extended DS agents
├── ml_agents/           # Extended ML agents
├── multiagents/         # Multi-agent collaboration logic
├── parsers/             # Output parsers
├── templates/           # Prompt templates
├── tools/               # Low-level tool functions
└── utils/               # General utility functions
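To illustrate the role of an output parser in this layout, the sketch below pulls a Python code block out of a Markdown-formatted model reply. It is an assumption about what such a parser does in general; the function name `extract_python` and the regular expression are not taken from the parsers/ module.

```python
import re

# The fence marker is built programmatically so this example can itself
# live inside a fenced block without confusing Markdown renderers.
FENCE = "`" * 3

# Match an optional "python" language tag, then capture lazily up to
# the closing fence.
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(response):
    """Return the first fenced code block, or the whole reply if none."""
    match = CODE_BLOCK.search(response)
    return match.group(1).strip() if match else response.strip()


reply = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def double(x):\n"
    + "    return x * 2\n"
    + FENCE
)
code = extract_python(reply)
```

Extracted code comes from a model and should be treated as untrusted input before it is executed.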

Installation & Quick Start#

Requirements#

  • Python 3.10+
  • An OpenAI API key (recommended) or a locally running Ollama instance

Installation#

# PyPI installation
pip install ai-data-science-team

# Source development installation
git clone https://github.com/business-science/ai-data-science-team.git
cd ai-data-science-team
pip install -e .

Run AI Pipeline Studio#

streamlit run apps/ai-pipeline-studio-app/app.py

LLM Configuration#

OpenAI (Cloud)

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-4.1-mini")

Ollama (Local)

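# Shell: start the Ollama server (keeps running; use a separate terminal)
ollama serve
# Shell: download the model
ollama pull llama3.1:8b

# Python: point LangChain at the local model
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1:8b")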

Example Resources#

The repository provides Jupyter Notebook examples:

  • data_cleaning_agent.ipynb
  • data_loader_tools_agent.ipynb
  • data_visualization_agent.ipynb
  • data_wrangling_agent.ipynb
  • feature_engineering_agent.ipynb
  • sql_database_agent.ipynb

Further examples are organized in these directories: advanced_topics/, ds_agents/, ml_agents/, multiagents/, teams_of_agents/
