A development toolkit that gives AI coding assistants (Claude Code, Cursor, Windsurf) access to the Databricks platform, comprising 50+ MCP tools, 19 skill modules, and a core Python library. It supports SQL execution, Jobs management, Unity Catalog, Spark Declarative Pipelines, AI agent development (Knowledge Assistants, Genie Spaces), and Model Serving.
## Core Components
### databricks-tools-core/
The core Python library, providing high-level wrappers around Databricks operations.
Main modules:
- `sql/` - SQL execution, warehouse management, table statistics
- `jobs/` - job CRUD, run management
- `unity_catalog/` - Unity Catalog operations
- `compute/` - cluster management, remote execution
- `spark_declarative_pipelines/` - SDP operations
### databricks-mcp-server/
A FastMCP-based MCP server that exposes 50+ tools to AI assistants via the `@mcp.tool` decorator.
### databricks-skills/
19 markdown skill documents, covering:
- AI & Agents: agent-bricks, genie, model-serving, vector-search
- MLflow: agent-evaluation, analyze-mlflow-chat-session, analyze-mlflow-trace, instrumenting-with-mlflow-tracing
- Analytics: aibi-dashboards, unity-catalog
- Data Engineering: spark-declarative-pipelines, jobs, synthetic-data-generation
- Development: asset-bundles, app-apx, app-python, python-sdk, config
- Reference: docs, lakebase-provisioned
### databricks-builder-app/
A full-stack web application with Claude Code integration.
## Feature Matrix
### SQL Operations
- `execute_sql` - execute a single SQL query, with parameter binding
- `execute_sql_multi` - execute multiple SQL statements with dependency-aware parallel execution
- `list_warehouses`, `get_best_warehouse` - list warehouses / auto-select the best warehouse
- `get_table_details` - fetch table statistics (NONE/SIMPLE/DETAILED)
### Jobs Management
- Lifecycle: `create_job`, `get_job`, `list_jobs`, `update_job`, `delete_job`
- Run control: `run_job_now`, `get_run`, `list_runs`, `cancel_run`, `wait_for_run`
- Serverless compute is used by default
### Unity Catalog Operations
`catalogs.list_catalogs()`, `schemas.create_schema()`, `tables.create_table()`
### Compute Operations
`list_clusters`, `get_best_cluster`, `execute_databricks_command`, `run_python_file_on_databricks`
### File Operations
- `upload_folder` - upload a folder in parallel
- `upload_file` - upload a single file
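A parallel folder upload along these lines can be sketched with a thread pool. `upload_folder_parallel` and its `upload` callback are illustrative stand-ins, not the toolkit's actual signatures; in practice something like `upload_file` would be passed as the callback:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

def upload_folder_parallel(local_dir: str, remote_dir: str,
                           upload: Callable[[str, str], None],
                           max_workers: int = 8) -> int:
    """Upload every file under local_dir concurrently.

    `upload` is the single-file primitive; errors from any worker are
    re-raised when its future is resolved.
    """
    files = [p for p in Path(local_dir).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload, str(p), f"{remote_dir}/{p.relative_to(local_dir)}")
            for p in files
        ]
        for f in futures:
            f.result()  # propagate any upload error
    return len(files)
```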
### Spark Declarative Pipelines (SDP)
- `create_pipeline`, `get_pipeline`, `update_pipeline`, `delete_pipeline`
- `start_update`, `get_update`, `stop_pipeline`
- `create_or_update_pipeline`, `find_pipeline_by_name`
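As a sketch of how these tools might fit together, the dict below shows a plausible pipeline spec. Every key name and the commented-out calls are assumptions; the exact signatures of `create_or_update_pipeline` and `start_update` are not documented here:

```python
# Hypothetical pipeline spec; the key names mirror common Databricks
# pipeline settings but are assumptions, not the toolkit's documented API.
pipeline_settings = {
    "name": "sales_sdp",
    "catalog": "my_catalog",   # target catalog for materialized tables
    "schema": "my_schema",     # target schema
    "serverless": True,        # SDP on serverless compute
    "libraries": [
        {"notebook": {"path": "/Workspace/SDP/transformations"}},
    ],
}

# pipeline = create_or_update_pipeline(**pipeline_settings)
# update = start_update(pipeline_id=pipeline["pipeline_id"])
```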
### AI & Agents
- Knowledge Assistants (KA): `manage_ka` - RAG-based document Q&A assistants
- Genie Spaces: `create_or_update_genie`, `get_genie`, `find_genie_by_name`, `delete_genie` - natural-language data exploration
- Supervisor Agent (MAS): `manage_mas` - multi-agent system orchestration
- AI/BI Dashboards: `create_or_update_dashboard`, `get_dashboard`, `list_dashboards`
- Model Serving: `get_serving_endpoint_status`, `query_serving_endpoint`, `list_serving_endpoints`
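For Model Serving, chat endpoints commonly accept an OpenAI-style request body. The helper below only builds that payload; the commented-out call and its parameter names are assumptions, since `query_serving_endpoint`'s exact signature is not shown here:

```python
def chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload, the request shape that
    chat-model serving endpoints commonly accept."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical call; endpoint name and keyword arguments are assumptions:
# response = query_serving_endpoint(
#     endpoint_name="my-chat-endpoint",
#     payload=chat_payload("Summarize last week's sales"),
# )
```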
## Architecture
```
AI Coding Assistant (Claude Code/Cursor/Windsurf)
 │
 ├── Skills (.claude/skills/) ←→ MCP Tools (.claude/mcp.json)
 │
 └── MCP Protocol (stdio)
       │
       ▼
databricks-mcp-server (FastMCP)
  tools/sql.py | compute.py | file.py | jobs.py | pipelines.py | agent_bricks.py | aibi_dashboards.py | serving.py
       │
       └── Python imports
             │
             ▼
databricks-tools-core
  sql/ | compute/ | jobs/ | pipelines/ | unity_catalog/
       │
       └── Databricks SDK
             │
             ▼
Databricks Workspace
```
## Code Examples
### SQL Execution
```python
from databricks_tools_core.sql import execute_sql

# Simple query
result = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Query with an explicit warehouse and default namespace
result = execute_sql(
    sql_query="SELECT COUNT(*) as cnt FROM customers",
    warehouse_id="abc123def456",
    catalog="my_catalog",
    schema="my_schema",
)
```
### Jobs Management
```python
from databricks_tools_core.jobs import create_job, run_job_now, wait_for_run

# Create a job (serverless by default)
tasks = [
    {
        "task_key": "etl_task",
        "notebook_task": {
            "notebook_path": "/Workspace/ETL/process_data",
            "source": "WORKSPACE",
        },
    }
]
job = create_job(name="my_etl_job", tasks=tasks)

# Run the job and wait for completion
run_id = run_job_now(job_id=job["job_id"])
result = wait_for_run(run_id=run_id, timeout=3600)
```
### Multi-user Authentication
```python
from databricks_tools_core.auth import (
    set_databricks_auth,
    clear_databricks_auth,
    get_workspace_client,
)
from databricks_tools_core.sql import execute_sql

async def handle_request(user_host: str, user_token: str):
    # Bind this request's credentials to the current context
    set_databricks_auth(user_host, user_token)
    try:
        result = execute_sql("SELECT current_user()")
        client = get_workspace_client()
        warehouses = client.warehouses.list()
    finally:
        # Always clear the context so credentials never leak across requests
        clear_databricks_auth()
```
## Authentication
### Authentication Priority
1. Context variables (`set_databricks_auth`) - for multi-user applications
2. Environment variables (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`)
3. Config profile (`DATABRICKS_CONFIG_PROFILE` or `~/.databrickscfg`)
### Environment Variables
```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"
```
## Key Implementation Details
### Parallel Execution
- SQL dependency analysis: statement dependencies are parsed with sqlglot
- Dependency-aware parallelism: independent statements run concurrently; dependent statements run in order
- Example: `t1` and `t2` are created in parallel, while `t3` waits until both have completed
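The `t1`/`t2`/`t3` example can be reproduced with the standard library's `graphlib`. This sketch only shows the scheduling waves; the real implementation derives the dependency map from the SQL text with sqlglot:

```python
from graphlib import TopologicalSorter

def execution_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group statements into waves: all statements in one wave are
    mutually independent and can run in parallel; each wave waits for
    the previous one to finish."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = set(ts.get_ready())  # everything whose deps are satisfied
        waves.append(ready)
        ts.done(*ready)
    return waves

# t1 and t2 have no dependencies; t3 depends on both
waves = execution_waves({"t1": set(), "t2": set(), "t3": {"t1", "t2"}})
```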
### Core Dependencies
- `FastMCP` - MCP server framework
- `databricks-sdk>=0.20.0` - official Python SDK
- `pydantic>=2.0.0` - data validation
- `sqlglot>=20.0.0` - SQL parsing
- `contextvars` - context management for multi-user authentication
## Use Cases
- Data engineering: Spark Declarative Pipelines, streaming tables, CDC, SCD Type 2
- Machine learning: MLflow experiments, model evaluation, trace analysis
- AI agent development: Knowledge Assistants, Genie Spaces, Supervisor Agent
- Data analytics: AI/BI Dashboards, natural-language data exploration
- Platform governance: Unity Catalog resource management
- Application development: full-stack web apps on Databricks Apps
## Installation
### Project-level Install (Recommended)
Mac/Linux:
```bash
bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)
```
Windows PowerShell:
```powershell
irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex
```
### MCP Server Configuration
Create `.mcp.json` in the project root:
```json
{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ai-dev-kit", "python", "databricks-mcp-server/run_server.py"],
      "defer_loading": true
    }
  }
}
```
### Skills Installation
```bash
# Install all skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

# Install specific skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash -s -- databricks-asset-bundles agent-evaluation
```