
Databricks AI Dev Kit

Added: Feb 23, 2026
Category: Agent & Tooling · Open Source
Tags: Python · Workflow Automation · Model Context Protocol · AI Agents · Agent Framework · SDK · CLI · Agent & Tooling · Model & Inference Framework · Developer Tools & Coding · Protocol, API & Integration · Data Analytics, BI & Visualization

A development toolkit that equips AI coding assistants (Claude Code, Cursor, Windsurf) with Databricks platform capabilities, featuring 50+ MCP tools, 19 skill modules, and a Python core library. Supports SQL execution, Jobs management, Unity Catalog, Spark Declarative Pipelines, AI Agent development (Knowledge Assistants, Genie Spaces), and Model Serving.

Core Components#

databricks-tools-core/#

Python core library providing high-level Databricks function abstractions.

Main Modules:

  • sql/ - SQL execution, warehouse management, table statistics
  • jobs/ - Job CRUD, run management
  • unity_catalog/ - Unity Catalog operations
  • compute/ - Cluster management, remote execution
  • spark_declarative_pipelines/ - SDP operations

databricks-mcp-server/#

FastMCP-based MCP server exposing 50+ tools to AI assistants via @mcp.tool decorators.
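The `@mcp.tool` pattern turns plain Python functions into tools the server can dispatch by name. As a rough illustration of the mechanism (a pure-Python sketch, not the FastMCP API; FastMCP additionally derives JSON schemas for tool arguments from type hints):

```python
from typing import Callable, Dict

# Hypothetical mini-registry illustrating the decorator pattern.
TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function under its name, in the spirit of @mcp.tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def execute_sql(sql_query: str) -> str:
    # A real tool would delegate to databricks-tools-core; stubbed here.
    return f"ran: {sql_query}"

# The server dispatches an incoming tool call by name:
result = TOOLS["execute_sql"]("SELECT 1")
```

The same registry idea is why the server can expose 50+ tools from a handful of modules: each module just decorates its public functions.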

databricks-skills/#

19 markdown skill documents covering:

  • AI & Agents: agent-bricks, genie, model-serving, vector-search
  • MLflow: agent-evaluation, analyze-mlflow-chat-session, analyze-mlflow-trace, instrumenting-with-mlflow-tracing
  • Analytics: aibi-dashboards, unity-catalog
  • Data Engineering: spark-declarative-pipelines, jobs, synthetic-data-generation
  • Development: asset-bundles, app-apx, app-python, python-sdk, config
  • Reference: docs, lakebase-provisioned

databricks-builder-app/#

Full-stack web application with Claude Code integration.

Feature Matrix#

SQL Operations#

  • execute_sql - Execute single SQL query with parameter binding
  • execute_sql_multi - Execute multiple SQL statements with dependency-aware parallelism
  • list_warehouses, get_best_warehouse - List/auto-select best warehouse
  • get_table_details - Get table statistics (NONE/SIMPLE/DETAILED)

Jobs Management#

  • Lifecycle: create_job, get_job, list_jobs, update_job, delete_job
  • Run control: run_job_now, get_run, list_runs, cancel_run, wait_for_run
  • Defaults to serverless compute

Unity Catalog Operations#

  • catalogs.list_catalogs(), schemas.create_schema(), tables.create_table()
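Unity Catalog addresses every object through a three-level namespace (`catalog.schema.table`). A small helper (hypothetical, not part of the kit) makes the convention explicit when composing SQL:

```python
def fq_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Unity Catalog table name.

    Backtick-quote each part so names containing special
    characters survive when spliced into SQL text.
    """
    return ".".join(f"`{p}`" for p in (catalog, schema, table))

fq = fq_name("my_catalog", "my_schema", "customers")
# Usable as: execute_sql(f"SELECT * FROM {fq} LIMIT 10")
```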

Compute Operations#

  • list_clusters, get_best_cluster, execute_databricks_command, run_python_file_on_databricks

File Operations#

  • upload_folder - Parallel folder upload
  • upload_file - Single file upload

Spark Declarative Pipelines (SDP)#

  • create_pipeline, get_pipeline, update_pipeline, delete_pipeline
  • start_update, get_update, stop_pipeline
  • create_or_update_pipeline, find_pipeline_by_name

AI & Agents#

  • Knowledge Assistants (KA): manage_ka - RAG-based document Q&A assistant
  • Genie Spaces: create_or_update_genie, get_genie, find_genie_by_name, delete_genie - Natural language data exploration
  • Supervisor Agent (MAS): manage_mas - Multi-agent system coordination
  • AI/BI Dashboards: create_or_update_dashboard, get_dashboard, list_dashboards
  • Model Serving: get_serving_endpoint_status, query_serving_endpoint, list_serving_endpoints

Architecture Design#

AI Coding Assistant (Claude Code/Cursor/Windsurf)
    │
    ├── Skills (.claude/skills/) ←→ MCP Tools (.claude/mcp.json)
    │
    └── MCP Protocol (stdio)
            │
            ▼
    databricks-mcp-server (FastMCP)
    tools/sql.py | compute.py | file.py | jobs.py | pipelines.py | agent_bricks.py | aibi_dashboards.py | serving.py
            │
            └── Python imports
                    │
                    ▼
            databricks-tools-core
            sql/ | compute/ | jobs/ | pipelines/ | unity_catalog/
                    │
                    └── Databricks SDK
                            │
                            ▼
                    Databricks Workspace

Code Examples#

SQL Execution#

from databricks_tools_core.sql import execute_sql

# Simple query
result = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Query against an explicit warehouse, with default catalog/schema
result = execute_sql(
    sql_query="SELECT COUNT(*) AS cnt FROM customers",
    warehouse_id="abc123def456",
    catalog="my_catalog",
    schema="my_schema",
)

Jobs Management#

from databricks_tools_core.jobs import create_job, run_job_now, wait_for_run

# Create job (defaults to serverless)
tasks = [
    {
        "task_key": "etl_task",
        "notebook_task": {
            "notebook_path": "/Workspace/ETL/process_data",
            "source": "WORKSPACE",
        },
    }
]
job = create_job(name="my_etl_job", tasks=tasks)

# Run and wait for completion
run_id = run_job_now(job_id=job["job_id"])
result = wait_for_run(run_id=run_id, timeout=3600)

Multi-user Authentication#

from databricks_tools_core.auth import (
    set_databricks_auth,
    clear_databricks_auth,
    get_workspace_client,
)
from databricks_tools_core.sql import execute_sql

async def handle_request(user_host: str, user_token: str):
    set_databricks_auth(user_host, user_token)
    try:
        result = execute_sql("SELECT current_user()")
        client = get_workspace_client()
        warehouses = client.warehouses.list()
    finally:
        clear_databricks_auth()

Authentication Mechanism#

Authentication Priority#

  1. Context variables (set_databricks_auth) - Multi-user applications
  2. Environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN)
  3. Config profile (DATABRICKS_CONFIG_PROFILE or ~/.databrickscfg)
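The priority chain above can be sketched with the standard library's `contextvars`, which lets per-request credentials override process-wide environment settings without leaking across concurrent users. This is a simplified sketch, not the kit's actual code; `_auth_ctx` and `resolve_auth` are illustrative names:

```python
import os
from contextvars import ContextVar
from typing import Optional, Tuple

# Per-context credentials; isolated between concurrent asyncio tasks.
_auth_ctx: ContextVar[Optional[Tuple[str, str]]] = ContextVar(
    "databricks_auth", default=None
)

def set_databricks_auth(host: str, token: str) -> None:
    _auth_ctx.set((host, token))

def clear_databricks_auth() -> None:
    _auth_ctx.set(None)

def resolve_auth() -> Optional[Tuple[str, str]]:
    """Priority 1: context variable; priority 2: environment.

    (A real implementation would also fall through to
    DATABRICKS_CONFIG_PROFILE / ~/.databrickscfg.)
    """
    ctx = _auth_ctx.get()
    if ctx is not None:
        return ctx
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token:
        return (host, token)
    return None
```

Because each asyncio task gets its own copy of the context, two users' requests handled concurrently never see each other's tokens.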

Environment Variable Configuration#

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"

Key Technical Implementations#

Parallel Execution Mechanism#

  • SQL Dependency Analysis: Uses sqlglot to parse statement dependencies
  • Dependency-aware Parallelism: Independent statements execute in parallel, dependent statements execute in order
  • Example: t1 and t2 created in parallel, t3 waits for both to complete
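The scheduling idea can be sketched without sqlglot: extract each statement's created table and referenced tables, then group statements into waves, where every statement in a wave only reads tables created in earlier waves. This is a simplified illustration of the approach (crude regex "parsing" in place of sqlglot), not the kit's implementation:

```python
import re
from typing import Dict, List, Set

def plan_waves(statements: List[str]) -> List[List[int]]:
    """Group statement indices into waves: statements within a wave
    can run in parallel; a wave starts after all earlier waves finish."""
    # Which statement creates which table.
    creates = {}
    for i, s in enumerate(statements):
        m = re.search(r"CREATE\s+TABLE\s+(\w+)", s, re.I)
        if m:
            creates[m.group(1).lower()] = i
    # Dependencies: tables referenced via FROM/JOIN.
    deps: Dict[int, Set[int]] = {}
    for i, s in enumerate(statements):
        refs = {r.lower() for r in re.findall(r"(?:FROM|JOIN)\s+(\w+)", s, re.I)}
        deps[i] = {creates[r] for r in refs if r in creates and creates[r] != i}
    # Topological layering.
    waves: List[List[int]] = []
    done: Set[int] = set()
    while len(done) < len(statements):
        wave = [i for i in range(len(statements))
                if i not in done and deps[i] <= done]
        if not wave:
            raise ValueError("circular dependency between statements")
        waves.append(wave)
        done.update(wave)
    return waves

stmts = [
    "CREATE TABLE t1 AS SELECT 1 AS x",
    "CREATE TABLE t2 AS SELECT 2 AS y",
    "CREATE TABLE t3 AS SELECT * FROM t1 JOIN t2",
]
# t1 and t2 land in the same wave; t3 runs only after both.
```

Running `plan_waves(stmts)` reproduces the example above: wave one holds the independent `t1` and `t2` statements, wave two holds `t3`.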

Core Dependencies#

  • FastMCP - MCP server framework
  • databricks-sdk>=0.20.0 - Official Python SDK
  • pydantic>=2.0.0 - Data validation
  • sqlglot>=20.0.0 - SQL parsing
  • contextvars (Python standard library) - Multi-user authentication context management

Use Cases#

  1. Data Engineering: Spark Declarative Pipelines, streaming tables, CDC, SCD Type 2
  2. Machine Learning: MLflow experiments, model evaluation, trace analysis
  3. AI Agent Development: Knowledge Assistants, Genie Spaces, Supervisor Agent
  4. Data Analytics: AI/BI Dashboards, natural language data exploration
  5. Platform Governance: Unity Catalog resource management
  6. Application Development: Databricks Apps full-stack web applications

Installation#

Mac/Linux:

bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)

Windows PowerShell:

irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex

MCP Server Configuration#

Create .mcp.json in project root:

{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ai-dev-kit", "python", "databricks-mcp-server/run_server.py"],
      "defer_loading": true
    }
  }
}

Skills Installation#

# Install all skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

# Install specific skills
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash -s -- databricks-asset-bundles agent-evaluation
