A development toolkit that gives AI coding assistants (Claude Code, Cursor, Windsurf) access to the Databricks platform, comprising 50+ MCP tools, 19 skill modules, and a core Python library. It supports SQL execution, Jobs management, Unity Catalog, Spark Declarative Pipelines, AI agent development (Knowledge Assistants, Genie Spaces), and Model Serving.
## Core Components
### databricks-tools-core/
The core Python library, providing high-level wrappers around Databricks operations.
Main modules:
- `sql/` - SQL execution, warehouse management, table statistics
- `jobs/` - job CRUD, run management
- `unity_catalog/` - Unity Catalog operations
- `compute/` - cluster management, remote execution
- `spark_declarative_pipelines/` - SDP operations
### databricks-mcp-server/
A FastMCP-based MCP server that exposes 50+ tools to AI assistants via the `@mcp.tool` decorator.
### databricks-skills/
19 markdown skill documents, covering:
- AI & Agents: agent-bricks, genie, model-serving, vector-search
- MLflow: agent-evaluation, analyze-mlflow-chat-session, analyze-mlflow-trace, instrumenting-with-mlflow-tracing
- Analytics: aibi-dashboards, unity-catalog
- Data Engineering: spark-declarative-pipelines, jobs, synthetic-data-generation
- Development: asset-bundles, app-apx, app-python, python-sdk, config
- Reference: docs, lakebase-provisioned
### databricks-builder-app/
A full-stack web application with Claude Code integration.
## Feature Matrix
### SQL Operations
- `execute_sql` - execute a single SQL query, with parameter binding
- `execute_sql_multi` - execute multiple SQL statements with dependency-aware parallel execution
- `list_warehouses`, `get_best_warehouse` - list warehouses / auto-select the best warehouse
- `get_table_details` - fetch table statistics (NONE/SIMPLE/DETAILED)
### Jobs Management
- Lifecycle: `create_job`, `get_job`, `list_jobs`, `update_job`, `delete_job`
- Run control: `run_job_now`, `get_run`, `list_runs`, `cancel_run`, `wait_for_run`
- Serverless compute is used by default
### Unity Catalog Operations
`catalogs.list_catalogs()`, `schemas.create_schema()`, `tables.create_table()`
### Compute Operations
`list_clusters`, `get_best_cluster`, `execute_databricks_command`, `run_python_file_on_databricks`
### File Operations
- `upload_folder` - upload a folder in parallel
- `upload_file` - upload a single file
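A parallel folder upload along these lines can be sketched with a thread pool. `upload_folder_parallel` and its `upload` callback are illustrative stand-ins, not the toolkit's actual signatures; in practice something like `upload_file` would be passed as the callback:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

def upload_folder_parallel(local_dir: str, remote_dir: str,
                           upload: Callable[[str, str], None],
                           max_workers: int = 8) -> int:
    """Upload every file under local_dir concurrently.

    `upload` is the single-file primitive; errors from any worker are
    re-raised when its future is resolved.
    """
    files = [p for p in Path(local_dir).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload, str(p), f"{remote_dir}/{p.relative_to(local_dir)}")
            for p in files
        ]
        for f in futures:
            f.result()  # propagate any upload error
    return len(files)
```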
### Spark Declarative Pipelines (SDP)
- `create_pipeline`, `get_pipeline`, `update_pipeline`, `delete_pipeline`
- `start_update`, `get_update`, `stop_pipeline`
- `create_or_update_pipeline`, `find_pipeline_by_name`
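As a sketch of how these tools might fit together, the dict below shows a plausible pipeline spec. Every key name and the commented-out calls are assumptions; the exact signatures of `create_or_update_pipeline` and `start_update` are not documented here:

```python
# Hypothetical pipeline spec; the key names mirror common Databricks
# pipeline settings but are assumptions, not the toolkit's documented API.
pipeline_settings = {
    "name": "sales_sdp",
    "catalog": "my_catalog",   # target catalog for materialized tables
    "schema": "my_schema",     # target schema
    "serverless": True,        # SDP on serverless compute
    "libraries": [
        {"notebook": {"path": "/Workspace/SDP/transformations"}},
    ],
}

# pipeline = create_or_update_pipeline(**pipeline_settings)
# update = start_update(pipeline_id=pipeline["pipeline_id"])
```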
### AI & Agents
- Knowledge Assistants (KA): `manage_ka` - RAG-based document Q&A assistants
- Genie Spaces: `create_or_update_genie`, `get_genie`, `find_genie_by_name`, `delete_genie` - natural-language data exploration
- Supervisor Agent (MAS): `manage_mas` - multi-agent system orchestration
- AI/BI Dashboards: `create_or_update_dashboard`, `get_dashboard`, `list_dashboards`
- Model Serving: `get_serving_endpoint_status`, `query_serving_endpoint`, `list_serving_endpoints`
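For Model Serving, chat endpoints commonly accept an OpenAI-style request body. The helper below only builds that payload; the commented-out call and its parameter names are assumptions, since `query_serving_endpoint`'s exact signature is not shown here:

```python
def chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload, the request shape that
    chat-model serving endpoints commonly accept."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical call; endpoint name and keyword arguments are assumptions:
# response = query_serving_endpoint(
#     endpoint_name="my-chat-endpoint",
#     payload=chat_payload("Summarize last week's sales"),
# )
```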
## Architecture
```
AI Coding Assistant (Claude Code/Cursor/Windsurf)
 │
 ├── Skills (.claude/skills/) ←→ MCP Tools (.claude/mcp.json)
 │
 └── MCP Protocol (stdio)
       │
       ▼
databricks-mcp-server (FastMCP)
  tools/sql.py | compute.py | file.py | jobs.py | pipelines.py | agent_bricks.py | aibi_dashboards.py | serving.py
       │
       └── Python imports
             │
             ▼
databricks-tools-core
  sql/ | compute/ | jobs/ | pipelines/ | unity_catalog/
       │
       └── Databricks SDK
             │
             ▼
Databricks Workspace
```
## Code Examples
### SQL Execution
```python
from databricks_tools_core.sql import execute_sql

# Simple query
result = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Query with an explicit warehouse and default namespace
result = execute_sql(
    sql_query="SELECT COUNT(*) as cnt FROM customers",
    warehouse_id="abc123def456",
    catalog="my_catalog",
    schema="my_schema",
)
```
### Jobs Management
```python
from databricks_tools_core.jobs import create_job, run_job_now, wait_for_run

# Create a job (serverless by default)
tasks = [
    {
        "task_key": "etl_task",
        "notebook_task": {
            "notebook_path": "/Workspace/ETL/process_data",
            "source": "WORKSPACE",
        },
    }
]
job = create_job(name="my_etl_job", tasks=tasks)

# Run the job and wait for completion
run_id = run_job_now(job_id=job["job_id"])
result = wait_for_run(run_id=run_id, timeout=3600)
```
### Multi-user Authentication
```python
from databricks_tools_core.auth import (
    set_databricks_auth,
    clear_databricks_auth,
    get_workspace_client,
)
from databricks_tools_core.sql import execute_sql

async def handle_request(user_host: str, user_token: str):
    # Bind this request's credentials to the current context
    set_databricks_auth(user_host, user_token)
    try:
        result = execute_sql("SELECT current_user()")
        client = get_workspace_client()
        warehouses = client.warehouses.list()
    finally:
        # Always clear the context so credentials never leak across requests
        clear_databricks_auth()
```
## Authentication
### Authentication Priority
1. Context variables (`set_databricks_auth`) - for multi-user applications
2. Environment variables (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`)
3. Config profile (`DATABRICKS_CONFIG_PROFILE` or `~/.databrickscfg`)
### Environment Variables
```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"
```
## Key Implementation Details
### Parallel Execution
- SQL dependency analysis: statement dependencies are parsed with sqlglot
- Dependency-aware parallelism: independent statements run concurrently; dependent statements run in order
- Example: `t1` and `t2` are created in parallel, while `t3` waits until both have completed
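The `t1`/`t2`/`t3` example can be reproduced with the standard library's `graphlib`. This sketch only shows the scheduling waves; the real implementation derives the dependency map from the SQL text with sqlglot:

```python
from graphlib import TopologicalSorter

def execution_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group statements into waves: all statements in one wave are
    mutually independent and can run in parallel; each wave waits for
    the previous one to finish."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = set(ts.get_ready())  # everything whose deps are satisfied
        waves.append(ready)
        ts.done(*ready)
    return waves

# t1 and t2 have no dependencies; t3 depends on both
waves = execution_waves({"t1": set(), "t2": set(), "t3": {"t1", "t2"}})
```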
### Core Dependencies
- `FastMCP` - MCP server framework
- `databricks-sdk>=0.20.0` - official Python SDK
- `pydantic>=2.0.0` - data validation
- `sqlglot>=20.0.0` - SQL parsing
- `contextvars` - context management for multi-user authentication
## Use Cases
- Data engineering: Spark Declarative Pipelines, streaming tables, CDC, SCD Type 2
- Machine learning: MLflow experiments, model evaluation, trace analysis
- AI agent development: Knowledge Assistants, Genie Spaces, Supervisor Agent
- Data analytics: AI/BI Dashboards, natural-language data exploration
- Platform governance: Unity Catalog resource management
- Application development: full-stack web apps on Databricks Apps
## Installation
### Project-level Install (Recommended)
Mac/Linux:
```bash
bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)
```
Windows PowerShell:
```powershell
irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex
```
### MCP Server Configuration
Create `.mcp.json` in the project root:
```json
{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ai-dev-kit", "python", "databricks-mcp-server/run_server.py"],
      "defer_loading": true
    }
  }
}
```
### Skills Installation
```bash
# Install all skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

# Install specific skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash -s -- databricks-asset-bundles agent-evaluation
```