
Databricks AI Dev Kit

Listed on February 23, 2026
Category: Agents & Application Tools · Open Source
Tags: Python, Workflow Automation, MCP, AI Agents, Agent Frameworks, SDK, CLI, Agents & Application Tools, Models & Inference Frameworks, Developer Tools/Code, Protocols/APIs/Integrations, Data Analytics/BI/Visualization

A development kit that gives AI coding assistants (Claude Code, Cursor, Windsurf) a toolchain for the Databricks platform, comprising 50+ MCP tools, 19 skill modules, and a Python core library. It supports SQL execution, Jobs management, Unity Catalog, Spark Declarative Pipelines, AI agent development (Knowledge Assistants, Genie Spaces), and Model Serving.

Core Components#

databricks-tools-core/#

The Python core library, providing high-level wrappers around Databricks functionality.

Main modules

  • sql/ - SQL execution, warehouse management, table statistics
  • jobs/ - job CRUD, run management
  • unity_catalog/ - Unity Catalog operations
  • compute/ - cluster management, remote execution
  • spark_declarative_pipelines/ - SDP operations

databricks-mcp-server/#

An MCP server built on FastMCP that exposes 50+ tools to AI assistants via the @mcp.tool decorator.
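The decorator-registration pattern can be illustrated with a plain-Python sketch. This is not FastMCP's actual implementation, only a minimal stand-in showing how a `@mcp.tool`-style decorator collects functions into a dispatchable registry:

```python
from typing import Callable, Dict


class ToolRegistry:
    """Minimal sketch of decorator-based tool registration (FastMCP-style)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}

    def tool(self, func: Callable) -> Callable:
        # Register the function under its own name and return it unchanged,
        # so it remains directly callable in normal Python code.
        self._tools[func.__name__] = func
        return func

    def call(self, name: str, **kwargs):
        # Dispatch a named tool invocation, as an MCP server does for
        # requests arriving over stdio.
        return self._tools[name](**kwargs)


mcp = ToolRegistry()


@mcp.tool
def execute_sql(sql_query: str) -> str:
    # Placeholder body; the real tool runs the query on a warehouse.
    return f"executed: {sql_query}"
```

With this sketch, `mcp.call("execute_sql", sql_query="SELECT 1")` routes the request to the registered function; the real server adds schema generation and protocol handling on top of the same idea.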

databricks-skills/#

19 markdown skill documents, covering:

  • AI & Agents: agent-bricks, genie, model-serving, vector-search
  • MLflow: agent-evaluation, analyze-mlflow-chat-session, analyze-mlflow-trace, instrumenting-with-mlflow-tracing
  • Analytics: aibi-dashboards, unity-catalog
  • Data Engineering: spark-declarative-pipelines, jobs, synthetic-data-generation
  • Development: asset-bundles, app-apx, app-python, python-sdk, config
  • Reference: docs, lakebase-provisioned

databricks-builder-app/#

A full-stack web application with Claude Code integration.

Feature Matrix#

SQL Operations#

  • execute_sql - execute a single SQL query, with parameter binding
  • execute_sql_multi - execute multiple SQL statements with dependency-aware parallel execution
  • list_warehouses, get_best_warehouse - list warehouses / auto-select the best one
  • get_table_details - fetch table statistics (NONE/SIMPLE/DETAILED)

Jobs Management#

  • Lifecycle: create_job, get_job, list_jobs, update_job, delete_job
  • Run control: run_job_now, get_run, list_runs, cancel_run, wait_for_run
  • Uses serverless compute by default

Unity Catalog Operations#

  • catalogs.list_catalogs(), schemas.create_schema(), tables.create_table()

Compute Operations#

  • list_clusters, get_best_cluster, execute_databricks_command, run_python_file_on_databricks

File Operations#

  • upload_folder - upload a folder in parallel
  • upload_file - upload a single file
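Parallel folder upload can be sketched with a thread pool: since individual files have no ordering constraints, they can be pushed concurrently. This is an illustrative sketch, not the library's code; the `upload_file` callback here is a hypothetical stand-in for the real per-file upload to a workspace path:

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def upload_folder(local_dir: str, upload_file: Callable[[str], str]) -> List[str]:
    """Upload every file under local_dir concurrently.

    `upload_file` is a hypothetical callback taking a local path; the real
    library uploads each file to a corresponding Databricks workspace path.
    """
    paths = [
        os.path.join(root, name)
        for root, _, files in os.walk(local_dir)
        for name in files
    ]
    # Files are independent of one another, so a thread pool uploads
    # them in parallel instead of one at a time.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(upload_file, paths))
```

A thread pool (rather than a process pool) fits here because uploads are I/O-bound, so the GIL is not a bottleneck.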

Spark Declarative Pipelines (SDP)#

  • create_pipeline, get_pipeline, update_pipeline, delete_pipeline
  • start_update, get_update, stop_pipeline
  • create_or_update_pipeline, find_pipeline_by_name

AI & Agents#

  • Knowledge Assistants (KA): manage_ka - RAG-based document Q&A assistants
  • Genie Spaces: create_or_update_genie, get_genie, find_genie_by_name, delete_genie - natural-language data exploration
  • Supervisor Agent (MAS): manage_mas - multi-agent system coordination
  • AI/BI Dashboards: create_or_update_dashboard, get_dashboard, list_dashboards
  • Model Serving: get_serving_endpoint_status, query_serving_endpoint, list_serving_endpoints

Architecture#

AI Coding Assistant (Claude Code/Cursor/Windsurf)
    │
    ├── Skills (.claude/skills/) ←→ MCP Tools (.claude/mcp.json)
    │
    └── MCP Protocol (stdio)
            │
            ▼
    databricks-mcp-server (FastMCP)
    tools/sql.py | compute.py | file.py | jobs.py | pipelines.py | agent_bricks.py | aibi_dashboards.py | serving.py
            │
            └── Python imports
                    │
                    ▼
            databricks-tools-core
            sql/ | compute/ | jobs/ | pipelines/ | unity_catalog/
                    │
                    └── Databricks SDK
                            │
                            ▼
                    Databricks Workspace

Code Examples#

SQL Execution#

from databricks_tools_core.sql import execute_sql

# Simple query
result = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Parameterized query
result = execute_sql(
    sql_query="SELECT COUNT(*) as cnt FROM customers",
    warehouse_id="abc123def456",
    catalog="my_catalog",
    schema="my_schema",
)

Jobs Management#

from databricks_tools_core.jobs import create_job, run_job_now, wait_for_run

# Create a job (serverless by default)
tasks = [
    {
        "task_key": "etl_task",
        "notebook_task": {
            "notebook_path": "/Workspace/ETL/process_data",
            "source": "WORKSPACE",
        },
    }
]
job = create_job(name="my_etl_job", tasks=tasks)

# Run and wait for completion
run_id = run_job_now(job_id=job["job_id"])
result = wait_for_run(run_id=run_id, timeout=3600)

Multi-User Authentication#

from databricks_tools_core.auth import (
    set_databricks_auth,
    clear_databricks_auth,
    get_workspace_client,
)

async def handle_request(user_host: str, user_token: str):
    set_databricks_auth(user_host, user_token)
    try:
        result = execute_sql("SELECT current_user()")
        client = get_workspace_client()
        warehouses = client.warehouses.list()
    finally:
        clear_databricks_auth()

Authentication#

Authentication Priority#

  1. Context variables (set_databricks_auth) - multi-user applications
  2. Environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN)
  3. Config profile (DATABRICKS_CONFIG_PROFILE or ~/.databrickscfg)
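The three-level priority can be sketched as a resolver that checks each source in order. This is illustrative only; the resolver name and the internal context variable are assumptions, and the real library delegates profile handling to the Databricks SDK rather than stubbing it:

```python
import os
from contextvars import ContextVar
from typing import Optional

# Context-local credentials set per request in multi-user apps
# (illustrative internals; the real library's may differ).
_auth_ctx: ContextVar[Optional[dict]] = ContextVar("databricks_auth", default=None)


def set_databricks_auth(host: str, token: str) -> None:
    _auth_ctx.set({"host": host, "token": token})


def clear_databricks_auth() -> None:
    _auth_ctx.set(None)


def resolve_auth() -> dict:
    # 1. Context variables win: per-request credentials in shared servers.
    ctx = _auth_ctx.get()
    if ctx is not None:
        return ctx
    # 2. Then environment variables.
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token:
        return {"host": host, "token": token}
    # 3. Finally a config profile (~/.databrickscfg); stubbed here.
    return {"profile": os.environ.get("DATABRICKS_CONFIG_PROFILE", "DEFAULT")}
```

Because `ContextVar` values are isolated per asyncio task, concurrent requests in one server process each see only their own credentials.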

Environment Variable Configuration#

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"

Key Implementation Details#

Parallel Execution#

  • SQL dependency analysis: uses sqlglot to parse dependencies between statements
  • Dependency-aware parallelism: independent statements run in parallel, dependent statements run in order
  • Example: t1 and t2 are created in parallel; t3 waits for both to finish before executing
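The scheduling side of this mechanism can be sketched as grouping statements into "waves": given the dependency graph (which the library derives with sqlglot), each wave contains statements whose dependencies have all completed, so its members can run in parallel. This is an illustrative sketch, not the library's code:

```python
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group statements into parallel-executable waves.

    deps maps each statement name to the set of statements it depends on.
    Each returned wave can run concurrently; waves run in order.
    """
    done: Set[str] = set()
    waves: List[List[str]] = []
    remaining = dict(deps)
    while remaining:
        # A statement is ready once all of its dependencies have completed.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("circular dependency among statements")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves
```

For the example above, `execution_waves({"t1": set(), "t2": set(), "t3": {"t1", "t2"}})` yields `[["t1", "t2"], ["t3"]]`: t1 and t2 form the first parallel wave, and t3 runs only after both complete.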

Core Dependencies#

  • FastMCP - MCP server framework
  • databricks-sdk>=0.20.0 - official Python SDK
  • pydantic>=2.0.0 - data validation
  • sqlglot>=20.0.0 - SQL parsing
  • contextvars - context management for multi-user authentication

Use Cases#

  1. Data engineering: Spark Declarative Pipelines, streaming tables, CDC, SCD Type 2
  2. Machine learning: MLflow experiments, model evaluation, trace analysis
  3. AI agent development: Knowledge Assistants, Genie Spaces, Supervisor Agent
  4. Data analytics: AI/BI Dashboards, natural-language data exploration
  5. Platform governance: Unity Catalog resource management
  6. Application development: full-stack web apps on Databricks Apps

Installation#

Project-Level Installation (Recommended)#

Mac/Linux:

bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)

Windows PowerShell:

irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex

MCP Server Configuration#

Create .mcp.json in the project root:

{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ai-dev-kit", "python", "databricks-mcp-server/run_server.py"],
      "defer_loading": true
    }
  }
}

Skills Installation#

# Install all skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

# Install specific skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash -s -- databricks-asset-bundles agent-evaluation
