
Databricks AI Dev Kit

Added: Feb 23, 2026
Category: Agent & Tooling · Open Source
Tags: Python · Workflow Automation · Model Context Protocol · AI Agents · Agent Framework · SDK · CLI · Agent & Tooling · Model & Inference Framework · Developer Tools & Coding · Protocol, API & Integration · Data Analytics, BI & Visualization

A development toolkit that equips AI coding assistants (Claude Code, Cursor, Windsurf) with Databricks platform capabilities, featuring 50+ MCP tools, 19 skill modules, and a Python core library. Supports SQL execution, Jobs management, Unity Catalog, Spark Declarative Pipelines, AI Agent development (Knowledge Assistants, Genie Spaces), and Model Serving.

Core Components#

databricks-tools-core/#

Python core library providing high-level Databricks function abstractions.

Main Modules:

  • sql/ - SQL execution, warehouse management, table statistics
  • jobs/ - Job CRUD, run management
  • unity_catalog/ - Unity Catalog operations
  • compute/ - Cluster management, remote execution
  • spark_declarative_pipelines/ - SDP operations

databricks-mcp-server/#

FastMCP-based MCP server exposing 50+ tools to AI assistants via @mcp.tool decorators.
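The `@mcp.tool` pattern turns plain Python functions into tools the server can dispatch by name. As a rough illustration of the mechanism (a pure-Python sketch, not the FastMCP API; FastMCP additionally derives JSON schemas for tool arguments from type hints):

```python
from typing import Callable, Dict

# Hypothetical mini-registry illustrating the decorator pattern.
TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function under its name, in the spirit of @mcp.tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def execute_sql(sql_query: str) -> str:
    # A real tool would delegate to databricks-tools-core; stubbed here.
    return f"ran: {sql_query}"

# The server dispatches an incoming tool call by name:
result = TOOLS["execute_sql"]("SELECT 1")
```

The same registry idea is why the server can expose 50+ tools from a handful of modules: each module just decorates its public functions.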

databricks-skills/#

19 markdown skill documents covering:

  • AI & Agents: agent-bricks, genie, model-serving, vector-search
  • MLflow: agent-evaluation, analyze-mlflow-chat-session, analyze-mlflow-trace, instrumenting-with-mlflow-tracing
  • Analytics: aibi-dashboards, unity-catalog
  • Data Engineering: spark-declarative-pipelines, jobs, synthetic-data-generation
  • Development: asset-bundles, app-apx, app-python, python-sdk, config
  • Reference: docs, lakebase-provisioned

databricks-builder-app/#

Full-stack web application with Claude Code integration.

Feature Matrix#

SQL Operations#

  • execute_sql - Execute single SQL query with parameter binding
  • execute_sql_multi - Execute multiple SQL statements with dependency-aware parallelism
  • list_warehouses, get_best_warehouse - List/auto-select best warehouse
  • get_table_details - Get table statistics (NONE/SIMPLE/DETAILED)

Jobs Management#

  • Lifecycle: create_job, get_job, list_jobs, update_job, delete_job
  • Run control: run_job_now, get_run, list_runs, cancel_run, wait_for_run
  • Defaults to serverless compute

Unity Catalog Operations#

  • catalogs.list_catalogs(), schemas.create_schema(), tables.create_table()
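Unity Catalog addresses every object through a three-level namespace (`catalog.schema.table`). A small helper (hypothetical, not part of the kit) makes the convention explicit when composing SQL:

```python
def fq_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Unity Catalog table name.

    Backtick-quote each part so names containing special
    characters survive when spliced into SQL text.
    """
    return ".".join(f"`{p}`" for p in (catalog, schema, table))

fq = fq_name("my_catalog", "my_schema", "customers")
# Usable as: execute_sql(f"SELECT * FROM {fq} LIMIT 10")
```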

Compute Operations#

  • list_clusters, get_best_cluster, execute_databricks_command, run_python_file_on_databricks

File Operations#

  • upload_folder - Parallel folder upload
  • upload_file - Single file upload

Spark Declarative Pipelines (SDP)#

  • create_pipeline, get_pipeline, update_pipeline, delete_pipeline
  • start_update, get_update, stop_pipeline
  • create_or_update_pipeline, find_pipeline_by_name

AI & Agents#

  • Knowledge Assistants (KA): manage_ka - RAG-based document Q&A assistant
  • Genie Spaces: create_or_update_genie, get_genie, find_genie_by_name, delete_genie - Natural language data exploration
  • Supervisor Agent (MAS): manage_mas - Multi-agent system coordination
  • AI/BI Dashboards: create_or_update_dashboard, get_dashboard, list_dashboards
  • Model Serving: get_serving_endpoint_status, query_serving_endpoint, list_serving_endpoints

Architecture Design#

AI Coding Assistant (Claude Code/Cursor/Windsurf)
    │
    ├── Skills (.claude/skills/) ←→ MCP Tools (.claude/mcp.json)
    │
    └── MCP Protocol (stdio)
            │
            ▼
    databricks-mcp-server (FastMCP)
    tools/sql.py | compute.py | file.py | jobs.py | pipelines.py | agent_bricks.py | aibi_dashboards.py | serving.py
            │
            └── Python imports
                    │
                    ▼
            databricks-tools-core
            sql/ | compute/ | jobs/ | pipelines/ | unity_catalog/
                    │
                    └── Databricks SDK
                            │
                            ▼
                    Databricks Workspace

Code Examples#

SQL Execution#

from databricks_tools_core.sql import execute_sql

# Simple query
result = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Query against an explicit warehouse, with default catalog/schema
result = execute_sql(
    sql_query="SELECT COUNT(*) AS cnt FROM customers",
    warehouse_id="abc123def456",
    catalog="my_catalog",
    schema="my_schema",
)

Jobs Management#

from databricks_tools_core.jobs import create_job, run_job_now, wait_for_run

# Create job (defaults to serverless)
tasks = [
    {
        "task_key": "etl_task",
        "notebook_task": {
            "notebook_path": "/Workspace/ETL/process_data",
            "source": "WORKSPACE",
        },
    }
]
job = create_job(name="my_etl_job", tasks=tasks)

# Run and wait for completion
run_id = run_job_now(job_id=job["job_id"])
result = wait_for_run(run_id=run_id, timeout=3600)

Multi-user Authentication#

from databricks_tools_core.auth import (
    set_databricks_auth,
    clear_databricks_auth,
    get_workspace_client,
)
from databricks_tools_core.sql import execute_sql

async def handle_request(user_host: str, user_token: str):
    set_databricks_auth(user_host, user_token)
    try:
        result = execute_sql("SELECT current_user()")
        client = get_workspace_client()
        warehouses = client.warehouses.list()
    finally:
        clear_databricks_auth()

Authentication Mechanism#

Authentication Priority#

  1. Context variables (set_databricks_auth) - Multi-user applications
  2. Environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN)
  3. Config profile (DATABRICKS_CONFIG_PROFILE or ~/.databrickscfg)
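The priority chain above can be sketched with the standard library's `contextvars`, which lets per-request credentials override process-wide environment settings without leaking across concurrent users. This is a simplified sketch, not the kit's actual code; `_auth_ctx` and `resolve_auth` are illustrative names:

```python
import os
from contextvars import ContextVar
from typing import Optional, Tuple

# Per-context credentials; isolated between concurrent asyncio tasks.
_auth_ctx: ContextVar[Optional[Tuple[str, str]]] = ContextVar(
    "databricks_auth", default=None
)

def set_databricks_auth(host: str, token: str) -> None:
    _auth_ctx.set((host, token))

def clear_databricks_auth() -> None:
    _auth_ctx.set(None)

def resolve_auth() -> Optional[Tuple[str, str]]:
    """Priority 1: context variable; priority 2: environment.

    (A real implementation would also fall through to
    DATABRICKS_CONFIG_PROFILE / ~/.databrickscfg.)
    """
    ctx = _auth_ctx.get()
    if ctx is not None:
        return ctx
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token:
        return (host, token)
    return None
```

Because each asyncio task gets its own copy of the context, two users' requests handled concurrently never see each other's tokens.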

Environment Variable Configuration#

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"

Key Technical Implementations#

Parallel Execution Mechanism#

  • SQL Dependency Analysis: Uses sqlglot to parse statement dependencies
  • Dependency-aware Parallelism: Independent statements execute in parallel, dependent statements execute in order
  • Example: t1 and t2 created in parallel, t3 waits for both to complete
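The scheduling idea can be sketched without sqlglot: extract each statement's created table and referenced tables, then group statements into waves, where every statement in a wave only reads tables created in earlier waves. This is a simplified illustration of the approach (crude regex "parsing" in place of sqlglot), not the kit's implementation:

```python
import re
from typing import Dict, List, Set

def plan_waves(statements: List[str]) -> List[List[int]]:
    """Group statement indices into waves: statements within a wave
    can run in parallel; a wave starts after all earlier waves finish."""
    # Which statement creates which table.
    creates = {}
    for i, s in enumerate(statements):
        m = re.search(r"CREATE\s+TABLE\s+(\w+)", s, re.I)
        if m:
            creates[m.group(1).lower()] = i
    # Dependencies: tables referenced via FROM/JOIN.
    deps: Dict[int, Set[int]] = {}
    for i, s in enumerate(statements):
        refs = {r.lower() for r in re.findall(r"(?:FROM|JOIN)\s+(\w+)", s, re.I)}
        deps[i] = {creates[r] for r in refs if r in creates and creates[r] != i}
    # Topological layering.
    waves: List[List[int]] = []
    done: Set[int] = set()
    while len(done) < len(statements):
        wave = [i for i in range(len(statements))
                if i not in done and deps[i] <= done]
        if not wave:
            raise ValueError("circular dependency between statements")
        waves.append(wave)
        done.update(wave)
    return waves

stmts = [
    "CREATE TABLE t1 AS SELECT 1 AS x",
    "CREATE TABLE t2 AS SELECT 2 AS y",
    "CREATE TABLE t3 AS SELECT * FROM t1 JOIN t2",
]
# t1 and t2 land in the same wave; t3 runs only after both.
```

Running `plan_waves(stmts)` reproduces the example above: wave one holds the independent `t1` and `t2` statements, wave two holds `t3`.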

Core Dependencies#

  • FastMCP - MCP server framework
  • databricks-sdk>=0.20.0 - Official Python SDK
  • pydantic>=2.0.0 - Data validation
  • sqlglot>=20.0.0 - SQL parsing
  • contextvars (Python standard library) - Multi-user authentication context management

Use Cases#

  1. Data Engineering: Spark Declarative Pipelines, streaming tables, CDC, SCD Type 2
  2. Machine Learning: MLflow experiments, model evaluation, trace analysis
  3. AI Agent Development: Knowledge Assistants, Genie Spaces, Supervisor Agent
  4. Data Analytics: AI/BI Dashboards, natural language data exploration
  5. Platform Governance: Unity Catalog resource management
  6. Application Development: Databricks Apps full-stack web applications

Installation#

Mac/Linux:

bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)

Windows PowerShell:

irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex

MCP Server Configuration#

Create .mcp.json in project root:

{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ai-dev-kit", "python", "databricks-mcp-server/run_server.py"],
      "defer_loading": true
    }
  }
}

Skills Installation#

# Install all skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

# Install specific skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash -s -- databricks-asset-bundles agent-evaluation
