A development toolkit that equips AI coding assistants (Claude Code, Cursor, Windsurf) with Databricks platform capabilities, featuring 50+ MCP tools, 19 skill modules, and a Python core library. Supports SQL execution, Jobs management, Unity Catalog, Spark Declarative Pipelines, AI Agent development (Knowledge Assistants, Genie Spaces), and Model Serving.
## Core Components
### databricks-tools-core/

Python core library providing high-level Databricks function abstractions.

**Main Modules:**

- `sql/` - SQL execution, warehouse management, table statistics
- `jobs/` - Job CRUD, run management
- `unity_catalog/` - Unity Catalog operations
- `compute/` - Cluster management, remote execution
- `spark_declarative_pipelines/` - SDP operations
### databricks-mcp-server/

FastMCP-based MCP server exposing 50+ tools to AI assistants via `@mcp.tool` decorators.
### databricks-skills/
19 markdown skill documents covering:
- AI & Agents: agent-bricks, genie, model-serving, vector-search
- MLflow: agent-evaluation, analyze-mlflow-chat-session, analyze-mlflow-trace, instrumenting-with-mlflow-tracing
- Analytics: aibi-dashboards, unity-catalog
- Data Engineering: spark-declarative-pipelines, jobs, synthetic-data-generation
- Development: asset-bundles, app-apx, app-python, python-sdk, config
- Reference: docs, lakebase-provisioned
### databricks-builder-app/
Full-stack web application with Claude Code integration.
## Feature Matrix

### SQL Operations

- `execute_sql` - Execute a single SQL query with parameter binding
- `execute_sql_multi` - Execute multiple SQL statements with dependency-aware parallelism
- `list_warehouses`, `get_best_warehouse` - List warehouses / auto-select the best warehouse
- `get_table_details` - Get table statistics (NONE/SIMPLE/DETAILED)
### Jobs Management

- Lifecycle: `create_job`, `get_job`, `list_jobs`, `update_job`, `delete_job`
- Run control: `run_job_now`, `get_run`, `list_runs`, `cancel_run`, `wait_for_run`
- Defaults to serverless compute
### Unity Catalog Operations

`catalogs.list_catalogs()`, `schemas.create_schema()`, `tables.create_table()`
### Compute Operations

`list_clusters`, `get_best_cluster`, `execute_databricks_command`, `run_python_file_on_databricks`
### File Operations

- `upload_folder` - Parallel folder upload
- `upload_file` - Single file upload
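The parallel upload pattern behind `upload_folder` can be sketched as follows. This is a minimal illustration under assumed semantics, not the library's actual implementation; the real function's signature and remote-path handling may differ:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def upload_folder(local_dir, remote_prefix, upload_file, max_workers=8):
    """Upload every file under local_dir concurrently via the given upload_file callable."""
    files = [p for p in Path(local_dir).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_file, p, f"{remote_prefix}/{p.relative_to(local_dir).as_posix()}")
            for p in files
        ]
        for f in futures:
            f.result()  # re-raise any upload error
    return len(files)
```

Injecting the single-file `upload_file` callable keeps the parallelism concern separate from the transport concern.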
### Spark Declarative Pipelines (SDP)

- `create_pipeline`, `get_pipeline`, `update_pipeline`, `delete_pipeline`
- `start_update`, `get_update`, `stop_pipeline`
- `create_or_update_pipeline`, `find_pipeline_by_name`
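`create_or_update_pipeline` pairs naturally with `find_pipeline_by_name` in an idempotent upsert pattern. A hedged sketch of that pattern with stand-in callables (the real helpers call the Databricks Pipelines API; all names below are illustrative):

```python
def create_or_update(name, spec, find_by_name, create, update):
    """Upsert: update the pipeline if one with this name exists, otherwise create it."""
    existing = find_by_name(name)
    if existing is None:
        return create(name, spec)
    return update(existing["pipeline_id"], spec)
```

The upsert makes pipeline deployment safe to re-run: the first call creates, every later call with the same name updates in place.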
### AI & Agents

- Knowledge Assistants (KA): `manage_ka` - RAG-based document Q&A assistant
- Genie Spaces: `create_or_update_genie`, `get_genie`, `find_genie_by_name`, `delete_genie` - Natural language data exploration
- Supervisor Agent (MAS): `manage_mas` - Multi-agent system coordination
- AI/BI Dashboards: `create_or_update_dashboard`, `get_dashboard`, `list_dashboards`
- Model Serving: `get_serving_endpoint_status`, `query_serving_endpoint`, `list_serving_endpoints`
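A tool like `query_serving_endpoint` ultimately issues a POST to the endpoint's REST `invocations` route. A sketch of assembling such a request; the payload shape depends on the served model, and the host, endpoint name, and token placeholder below are illustrative:

```python
def build_invocation_request(host: str, endpoint_name: str, payload: dict) -> dict:
    """Assemble the pieces of a POST to a serving endpoint's invocations route."""
    return {
        "url": f"{host}/serving-endpoints/{endpoint_name}/invocations",
        "headers": {
            "Authorization": "Bearer <token>",  # substitute a real PAT or OAuth token
            "Content-Type": "application/json",
        },
        "json": payload,
    }


req = build_invocation_request(
    "https://your-workspace.cloud.databricks.com",
    "my-chat-endpoint",
    {"messages": [{"role": "user", "content": "Hello"}]},
)
```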
## Architecture Design
```
AI Coding Assistant (Claude Code/Cursor/Windsurf)
│
├── Skills (.claude/skills/)  ←→  MCP Tools (.claude/mcp.json)
│
└── MCP Protocol (stdio)
    │
    ▼
databricks-mcp-server (FastMCP)
    tools/sql.py | compute.py | file.py | jobs.py | pipelines.py | agent_bricks.py | aibi_dashboards.py | serving.py
    │
    └── Python imports
        │
        ▼
databricks-tools-core
    sql/ | compute/ | jobs/ | pipelines/ | unity_catalog/
    │
    └── Databricks SDK
        │
        ▼
Databricks Workspace
```
## Code Examples

### SQL Execution

```python
from databricks_tools_core.sql import execute_sql

# Simple query
result = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Query with an explicit warehouse and default catalog/schema
result = execute_sql(
    sql_query="SELECT COUNT(*) as cnt FROM customers",
    warehouse_id="abc123def456",
    catalog="my_catalog",
    schema="my_schema",
)
```
### Jobs Management

```python
from databricks_tools_core.jobs import create_job, run_job_now, wait_for_run

# Create a job (defaults to serverless compute)
tasks = [
    {
        "task_key": "etl_task",
        "notebook_task": {
            "notebook_path": "/Workspace/ETL/process_data",
            "source": "WORKSPACE",
        },
    }
]
job = create_job(name="my_etl_job", tasks=tasks)

# Run and wait for completion
run_id = run_job_now(job_id=job["job_id"])
result = wait_for_run(run_id=run_id, timeout=3600)
```
### Multi-user Authentication

```python
from databricks_tools_core.auth import (
    set_databricks_auth,
    clear_databricks_auth,
    get_workspace_client,
)
from databricks_tools_core.sql import execute_sql


async def handle_request(user_host: str, user_token: str):
    # Bind this request's credentials to the current context
    set_databricks_auth(user_host, user_token)
    try:
        result = execute_sql("SELECT current_user()")
        client = get_workspace_client()
        warehouses = client.warehouses.list()
    finally:
        # Always clear credentials so they cannot leak into other requests
        clear_databricks_auth()
```
## Authentication Mechanism

### Authentication Priority

1. Context variables (`set_databricks_auth`) - Multi-user applications
2. Environment variables (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`)
3. Config profile (`DATABRICKS_CONFIG_PROFILE` or `~/.databrickscfg`)
### Environment Variable Configuration

```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"
```
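The priority chain can be sketched with `contextvars`, which the core library lists among its dependencies for multi-user context management. This is a simplified illustration of the resolution order, not the library's actual code:

```python
import contextvars
import os

# Per-request credential override (highest priority)
_auth_ctx = contextvars.ContextVar("databricks_auth", default=None)


def set_databricks_auth(host, token):
    _auth_ctx.set((host, token))


def clear_databricks_auth():
    _auth_ctx.set(None)


def resolve_auth():
    """Resolve credentials: context variable, then env vars, then config profile."""
    ctx = _auth_ctx.get()
    if ctx is not None:
        return ctx
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token:
        return (host, token)
    return None  # fall through to DATABRICKS_CONFIG_PROFILE / ~/.databrickscfg
```

Because `ContextVar` values are isolated per async task, one request's `set_databricks_auth` never bleeds into another request running concurrently.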
## Key Technical Implementations

### Parallel Execution Mechanism

- SQL Dependency Analysis: Uses `sqlglot` to parse statement dependencies
- Dependency-aware Parallelism: Independent statements execute in parallel; dependent statements execute in order
- Example: `t1` and `t2` are created in parallel, while `t3` waits for both to complete
### Core Dependencies

- `FastMCP` - MCP server framework
- `databricks-sdk>=0.20.0` - Official Python SDK
- `pydantic>=2.0.0` - Data validation
- `sqlglot>=20.0.0` - SQL parsing
- `contextvars` - Multi-user authentication context management
## Use Cases
- Data Engineering: Spark Declarative Pipelines, streaming tables, CDC, SCD Type 2
- Machine Learning: MLflow experiments, model evaluation, trace analysis
- AI Agent Development: Knowledge Assistants, Genie Spaces, Supervisor Agent
- Data Analytics: AI/BI Dashboards, natural language data exploration
- Platform Governance: Unity Catalog resource management
- Application Development: Databricks Apps full-stack web applications
## Installation

### Project-level Installation (Recommended)

Mac/Linux:

```bash
bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)
```

Windows PowerShell:

```powershell
irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex
```
### MCP Server Configuration

Create `.mcp.json` in the project root:
```json
{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ai-dev-kit", "python", "databricks-mcp-server/run_server.py"],
      "defer_loading": true
    }
  }
}
```
### Skills Installation

```bash
# Install all skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

# Install specific skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash -s -- databricks-asset-bundles agent-evaluation
```