Databricks and Unity Catalog from a DevOps perspective

At the end of January I spoke at Microsoft Azure User Group Poland in Wrocław. The talk, on 29 January, was titled "Azure Databricks and Unity Catalog from a DevOps perspective". The meetup was held in English, and the audience was mostly platform and application engineers rather than data scientists.

I had joined a Databricks engagement from the Azure infrastructure side — Terraform, identity, pipelines — rather than from notebooks. The meetup was a compressed version of that delivery work. I am expanding the same material into a longer whitepaper with diagrams; this post is the short form from right after the talk.

If you come from application DevOps, Databricks terminology can feel unfamiliar at first. In the talk I mapped it to something closer to home: notebook ≈ pipeline YAML, cluster ≈ hosted agent, job run ≈ pipeline run, workflow ≈ pipeline definition. The model is the same (shared compute, repeatable jobs, permissions on resources), except that the resources are tables and catalogs instead of repositories and service connections.

The first time Unity Catalog showed up in our Terraform pipeline, the workspace apply succeeded and the metastore assignment step failed with a permission error that did not mention metastores at all. Workspace admin on the deployment service principal was not enough — metastore admin rights were required for databricks_metastore_assignment. Once we knew that, the fix was straightforward; until then it cost an afternoon of reading provider issues and Databricks docs.
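For reference, the step that failed looks roughly like this in Terraform. This is a minimal sketch, not our actual module: the variable names are hypothetical, and it assumes the Databricks provider is already configured with an identity that holds the rights described above.

```hcl
# Hypothetical inputs; real values come from your own state or data sources.
variable "metastore_id" {
  type        = string
  description = "ID of an existing Unity Catalog metastore"
}

variable "workspace_id" {
  type        = number
  description = "Numeric ID of the Databricks workspace to attach"
}

# Attaches the workspace to the metastore. In our case, workspace admin
# on the deployment service principal was not enough for this resource.
resource "databricks_metastore_assignment" "this" {
  metastore_id = var.metastore_id
  workspace_id = var.workspace_id
}
```

If the apply fails here with a generic permission error, the identity behind the provider is the first thing I would check.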

That gap is typical of how Unity Catalog lands in platform work: the data team talks about catalogs and schemas, while the DevOps side discovers account-level providers, Access Connectors, and a second layer of RBAC that does not show up in workspace-only tutorials. It was the hook I opened the meetup with — several people in the room had hit the same wall.

Without Unity Catalog, permissions are configured per workspace — users, groups, service principals, cluster creation, notebook ACLs — and that configuration is repeated for dev, test, and production. That can work for a time. As the number of workspaces grows, groups and permissions drift, and it becomes harder to answer simple questions such as who can read a given table in the gold layer.

Unity Catalog (open-sourced by Databricks in June 2024) adds account-level metastores and a hierarchy of catalog → schema → table. Multiple workspaces in a region can attach to one metastore. The Account Console handles account-wide users, groups, and metastore administration, while workspace admins remain important for compute and workspace-local settings.
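The catalog → schema part of the hierarchy maps directly onto Terraform resources. The sketch below shows one possible layout, with a catalog per business domain and a schema per layer; the names `sales` and `gold` are invented for illustration, and the provider is assumed to point at a workspace already attached to a metastore.

```hcl
# Top level of the hierarchy: a catalog (hypothetical name).
resource "databricks_catalog" "sales" {
  name    = "sales"
  comment = "Example catalog for one business domain"
}

# Second level: a schema inside that catalog.
resource "databricks_schema" "gold" {
  catalog_name = databricks_catalog.sales.name
  name         = "gold"
  comment      = "Business-ready outputs"
}
```

Tables created in that schema are then addressed with the three-part name `sales.gold.<table>`.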

On Azure I often see data arrive through Azure Data Factory or similar tooling, sometimes with ETL outside Databricks, then medallion layers inside: bronze for raw data, silver for cleaned and conformed data, gold for business-ready outputs. Those tiers also shape access — some users only need gold, others need silver or bronze for exploration. Governance works better when it follows the data tiers, not only workspace membership.
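Tier-based access can be expressed as grants on the schema that represents a tier. The following is a sketch under assumptions: the catalog name variable and the `analysts` group are hypothetical, and the group must already exist at account level.

```hcl
variable "catalog_name" {
  type        = string
  description = "Existing catalog that contains a gold schema (hypothetical)"
}

# Readers need USE_CATALOG on the parent catalog before schema grants matter.
resource "databricks_grants" "catalog_usage" {
  catalog = var.catalog_name

  grant {
    principal  = "analysts" # hypothetical account-level group
    privileges = ["USE_CATALOG"]
  }
}

# Read-only access to the gold tier only; no grants on silver or bronze.
resource "databricks_grants" "gold_read" {
  schema = "${var.catalog_name}.gold"

  grant {
    principal  = "analysts"
    privileges = ["USE_SCHEMA", "SELECT"]
  }
}
```

Note that `databricks_grants` is authoritative for the securable it targets, so grants made by hand on the same catalog or schema will be overwritten on the next apply.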

Entra group sync into Databricks runs on a schedule, so access changes are not always immediate. Fine for most day-to-day work; worth remembering if you depend on just-in-time access patterns.

Secret scopes for sensitive values can be Databricks-backed or Key Vault-backed. On Azure, Key Vault is usually the better fit. Creating a scope from the UI goes through a dedicated page, reached by appending #secrets/createScope to the workspace URL; we automated scopes with Terraform and the CLI for repeatability.
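A Key Vault-backed scope in Terraform can be sketched as follows. The scope name and variables are placeholders, and the provider is assumed to authenticate with an Azure identity, since Key Vault-backed scopes cannot be created with a Databricks personal access token.

```hcl
variable "key_vault_id" {
  type        = string
  description = "Azure resource ID of the Key Vault (hypothetical input)"
}

variable "key_vault_uri" {
  type        = string
  description = "Vault URI of the Key Vault (hypothetical input)"
}

# Scope whose secrets live in Key Vault rather than in Databricks.
resource "databricks_secret_scope" "kv" {
  name = "kv-backed-example"

  keyvault_metadata {
    resource_id = var.key_vault_id
    dns_name    = var.key_vault_uri
  }
}
```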

Key Vault-backed scopes record retrievals against the identity that created the scope. If that was a personal admin account during a one-off test, audit logs may not reflect the service that actually uses the secret in production. A dedicated deployment service principal for Terraform keeps that clearer — that detail got nods in the Q&A.

Unity Catalog governs data access. Compute — who can create or use clusters — is still managed through cluster policies and workspace permissions. Both layers matter. Access modes (single-user vs shared) add further rules; the current Databricks documentation is the right place to check when designing notebooks and jobs.
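On the compute side, a cluster policy plus a permission on that policy is the usual building block. The sketch below pins the access mode and caps auto-termination; the policy name, limits, and the `data-engineers` group are assumptions for illustration, not values from our project.

```hcl
# Policy that fixes the access mode and limits idle time.
resource "databricks_cluster_policy" "shared_small" {
  name = "shared-small-example"

  definition = jsonencode({
    "data_security_mode" = {
      type  = "fixed"
      value = "USER_ISOLATION" # API value for the shared access mode
    }
    "autotermination_minutes" = {
      type         = "range"
      maxValue     = 60
      defaultValue = 30
    }
  })
}

# Lets a (hypothetical) group create clusters from this policy only.
resource "databricks_permissions" "policy_use" {
  cluster_policy_id = databricks_cluster_policy.shared_small.id

  access_control {
    group_name       = "data-engineers"
    permission_level = "CAN_USE"
  }
}
```

None of this grants access to data; table and schema privileges still come from Unity Catalog.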

The AzureRM provider provisions the Databricks workspace resource and its managed resource group, similar in spirit to the node resource group that AKS creates. Related Azure resources include the access connector, VNet peering, and customer-managed keys for root storage.
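A minimal workspace definition might look like this. All names are placeholders, and the Premium SKU is chosen on the assumption that the workspace will be used with Unity Catalog.

```hcl
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

resource "azurerm_databricks_workspace" "this" {
  name                = "dbw-example-dev" # placeholder name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "premium"

  # Azure creates and locks this group; Terraform only names it.
  managed_resource_group_name = "rg-dbw-example-dev-managed"
}
```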

The databricks/databricks provider manages resources inside the workspace and at account level. On Azure I typically use two provider configurations: host = workspace_url for workspace resources, and host = https://accounts.azuredatabricks.net with account_id for metastores and account-level groups. Aliases such as workspace and accounts help when both appear in one Terraform state.
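The two-provider setup can be sketched like this. Variable names and the group are hypothetical, and authentication settings are left out on the assumption that they come from the environment.

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

variable "workspace_url" {
  type        = string
  description = "Workspace URL, e.g. taken from the azurerm workspace resource"
}

variable "databricks_account_id" {
  type = string
}

# Workspace-scoped resources: clusters, policies, secret scopes, and so on.
provider "databricks" {
  alias = "workspace"
  host  = var.workspace_url
}

# Account-scoped resources: metastores, account-level groups.
provider "databricks" {
  alias      = "accounts"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
}

# Each resource then picks its provider explicitly.
resource "databricks_group" "platform" {
  provider     = databricks.accounts
  display_name = "platform-engineers" # hypothetical group
}
```

Because neither configuration is the default, a resource that omits `provider` falls back to an unconfigured default provider and usually fails with an authentication error, which at least makes the mistake visible.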

The Access Connector for Azure Databricks is an Azure resource whose managed identity Databricks uses to reach storage and other Azure services. You can define more than one connector, for example a default connector with limited RBAC and another scoped to a specific group and storage account. Metastore deployment requires a connector with access to the backing storage account. Databricks generally recommends one metastore per region, with regional workspaces attached to it.
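In Terraform, the connector and its storage access are two AzureRM resources. This is a sketch with placeholder names; the role shown is a common choice for metastore storage, not necessarily the narrowest one that works in your setup.

```hcl
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

variable "storage_account_id" {
  type        = string
  description = "Resource ID of the storage account backing the metastore"
}

# Connector with a system-assigned managed identity.
resource "azurerm_databricks_access_connector" "metastore" {
  name                = "dbac-metastore-example"
  resource_group_name = var.resource_group_name
  location            = var.location

  identity {
    type = "SystemAssigned"
  }
}

# Gives that identity data-plane access to the storage account.
resource "azurerm_role_assignment" "metastore_storage" {
  scope                = var.storage_account_id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.metastore.identity[0].principal_id
}
```

A second, more narrowly scoped connector follows the same pattern with a different `scope`.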

One limitation worth verifying in your environment: Azure DevOps service connections using workload identity federation did not authenticate successfully to the Databricks account provider when we last tested. Workspace-level federation may work while account-level operations still require a service principal with a secret or another supported auth path. Check current provider guidance if federation is part of your design.
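The fallback we used for account-level operations was a service principal with a client secret. A sketch of that provider configuration follows, with hypothetical variable names; in a pipeline the secret would come from a secure variable or Key Vault, never from a committed file.

```hcl
variable "databricks_account_id" {
  type = string
}

variable "tenant_id" {
  type = string
}

variable "client_id" {
  type = string
}

variable "client_secret" {
  type      = string
  sensitive = true
}

# Account-level provider authenticating as an Entra service principal.
provider "databricks" {
  alias               = "accounts"
  host                = "https://accounts.azuredatabricks.net"
  account_id          = var.databricks_account_id
  azure_tenant_id     = var.tenant_id
  azure_client_id     = var.client_id
  azure_client_secret = var.client_secret
}
```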

Adding metastores to mature workspaces later tends to be a larger migration than teams expect; that came up in audience questions after the talk. Catalogs modeled around medallion layers or business domains age better than catalogs named after individual contributors. Using separate deployment service principals for workspace Terraform and account Terraform, each with least privilege, keeps audit trails and permission scopes from mixing.

The longer whitepaper will go further on metastore diagrams and provider examples. When it is published I will link it from Talks.