Version: 2.4.3 (latest)

Configure Databricks as a target

note

Databricks support is currently available as a preview feature and is under development. If you use Databricks as a target metastore with Data Migrator and have feedback to share, contact Support.

The feature is automatically enabled. See Preview features.

Configure a Databricks metadata agent

Use Data Migrator to integrate with Databricks and migrate structured data from Hadoop to Databricks tables, including automatic conversion from source Hive formats to the Delta Lake format used in Databricks.

Configure a Databricks metadata agent in Data Migrator using the UI or CLI, and connect it to your Databricks cluster.

Prerequisites

caution

Review and complete the prerequisites section linked here before attempting to add a Databricks Metastore Agent.

Configure Databricks as a target with the UI

note

When you use Unity Catalog as the Metastore Type, migrated Delta tables are created as external tables in Databricks. Tables in other source formats are created as managed Delta tables, and their data is converted and copied into the table.
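The Unity Catalog rule in the note above comes down to a single branch on the source table format. The following is an illustrative Python sketch only, not Data Migrator's implementation: the enum and function names are hypothetical, and format labels such as "parquet" are assumed names for non-Delta Hive source formats.

```python
from enum import Enum


class UnityTargetTable(Enum):
    """How a migrated table is created when the agent uses Unity Catalog."""

    EXTERNAL = "external table"
    MANAGED_DELTA = "managed Delta table (data converted and copied)"


def unity_catalog_target(source_format: str) -> UnityTargetTable:
    """Illustrative model of the Unity Catalog rule (hypothetical helper).

    Delta source tables become external tables; every other source format
    becomes a managed Delta table whose data is converted and copied.
    """
    if source_format.strip().lower() == "delta":
        return UnityTargetTable.EXTERNAL
    return UnityTargetTable.MANAGED_DELTA
```

For example, unity_catalog_target("delta") returns UnityTargetTable.EXTERNAL, while unity_catalog_target("orc") returns UnityTargetTable.MANAGED_DELTA.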

To add Databricks from your Dashboard:

  1. From the Dashboard, select an instance under Instances.

  2. Under Filesystems & Agents, select Metastore Agents.

  3. Select Connect to Metastore.

  4. Select the filesystem.

  5. Select Databricks as the Metastore Type.

  6. Enter a Display Name.

  7. Enter the JDBC Server Hostname, Port, and HTTP Path.

  8. Enter the Databricks Access Token.

    note

    You’ll need to re-enter the access token whenever you update this agent.

  9. Select a Metastore Type: either Unity Catalog or Workspace Hive Metastore (Legacy).

Unity Catalog
  1. Enter the name of your Databricks Unity Catalog under Catalog.

  2. Under External Location, specify the external location by appending your path to the pre-populated URI.

    info

    Ensure the external location you specify already exists in Databricks. Learn more in the Azure, AWS, and GCP documentation.

    Example: External location path
    abfss://file_system@account_name.dfs.core.windows.net/dir/subdir

  3. Select Delete after conversion to delete the raw data after it has been converted to Delta Lake format and migrated to Databricks.

    info

    Only use this option if you're performing one-time migrations for the underlying table data. The Databricks agent doesn't support continuous (live) updates of table data if you're converting to Delta Lake in Databricks.

Workspace Hive Metastore (Legacy)
  1. Enter the Filesystem Mount Point. The filesystem that contains the data you want to migrate must be mounted onto your DBFS. Enter the mounted container's path on the DBFS.

    Example: Mounted container's path
    /mounted/container/path

  2. Select Convert to Delta format if you want to convert your tables to Delta Lake format.

  3. Select Delete after conversion to delete the underlying table data and metadata from the Filesystem Mount Point location after it has been converted to Delta Lake in Databricks.

    info

    Only use this option if you're performing one-time migrations for the underlying table data. The Databricks agent doesn't support continuous (live) updates of table data if you're converting to Delta Lake in Databricks.

  4. Enter a path for Default Filesystem Override.

    1. If you select Convert to Delta format, enter the DBFS location in which to store the tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point followed by the path on the cloud storage.

      Example: Location on the DBFS to store tables converted to Delta Lake
      dbfs:<location>
      Example: Cloud storage location
      dbfs:/mnt/adls2/storage_account/delta_tables
    2. If you don't select Convert to Delta format, enter the mount point.

      Example: Filesystem mount point
      dbfs:<value of Fs mount point>
  10. Select Save to add the Metastore Agent.
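Before selecting Save for a Unity Catalog agent, you might sanity-check the External Location value. This minimal Python sketch assumes an Azure ADLS Gen2 location in the abfss://<container>@<account>.dfs.core.windows.net/<path> form used in the External Location example; AWS and GCP locations use other URI schemes and would need their own checks. The function name is hypothetical, and the check does not confirm that the external location exists in Databricks.

```python
from typing import Dict
from urllib.parse import urlparse

ADLS_SUFFIX = ".dfs.core.windows.net"


def parse_abfss_location(uri: str) -> Dict[str, str]:
    """Split an ADLS Gen2 abfss:// URI into container, storage account and path.

    Assumes the abfss://<container>@<account>.dfs.core.windows.net/<path> form.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "abfss":
        raise ValueError(f"expected an abfss:// URI, got scheme {parsed.scheme!r}")
    # urlparse puts the container in the "user" part and the storage host in hostname.
    if not parsed.username or not parsed.hostname:
        raise ValueError("expected <container>@<account>.dfs.core.windows.net")
    if not parsed.hostname.endswith(ADLS_SUFFIX):
        raise ValueError(f"storage host must end with {ADLS_SUFFIX}")
    account = parsed.hostname[: -len(ADLS_SUFFIX)]
    if not account:
        raise ValueError("storage account name is empty")
    return {"container": parsed.username, "account": account, "path": parsed.path}
```

For the example location, the function returns container "file_system", account "account_name", and path "/dir/subdir".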

Next steps

If you have already added Metadata Rules, create a Metadata Migration.

tip

Databricks caching can prevent migrated data from being visible on the target. To refresh the cache, run a REFRESH TABLE command on the target table. See the Databricks guide to learn more.
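If you refresh many tables, you can generate the statements rather than typing them. This Python sketch builds REFRESH TABLE statements with backtick-quoted identifiers, which is how Databricks SQL quotes names. The optional refresh_tables helper assumes the databricks-sql-connector package is installed and that you reuse the same server hostname, HTTP path, and access token you gave the agent; all helper names here are hypothetical.

```python
from typing import Iterable, Sequence


def quote_identifier(part: str) -> str:
    # Databricks SQL quotes identifiers with backticks; an embedded backtick is doubled.
    return "`" + part.replace("`", "``") + "`"


def refresh_table_sql(*name_parts: str) -> str:
    """Build REFRESH TABLE for a name given as parts, e.g. catalog, schema, table."""
    if not name_parts:
        raise ValueError("at least one name part is required")
    return "REFRESH TABLE " + ".".join(quote_identifier(p) for p in name_parts)


def refresh_tables(server_hostname: str, http_path: str, access_token: str,
                   tables: Iterable[Sequence[str]]) -> None:
    """Run REFRESH TABLE for each table (assumes databricks-sql-connector is installed)."""
    from databricks import sql  # imported lazily so the helpers above work without it

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as connection:
        with connection.cursor() as cursor:
            for parts in tables:
                cursor.execute(refresh_table_sql(*parts))
```

For example, refresh_table_sql("cat1", "default", "sales") returns REFRESH TABLE `cat1`.`default`.`sales`.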

info

Under certain conditions with a Databricks target, source truncate operations may take longer than expected. See the related Knowledge Base article for more information.

Configure Databricks as a target with the CLI

To add Databricks as a metadata agent with the CLI, use the hive agent add databricks command to add either a Unity Catalog or a Workspace Hive Metastore (Legacy) Metastore Agent.

Examples

Example for Unity Catalog Databricks agent
hive agent add databricks --file-system-id adls-target --name UnityExample --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token *** --catalog cat1 --external-location abfss://file_system@account_name.dfs.core.windows.net/dir/subdir --delete-after-conversion
Example for Workspace Hive Metastore (Legacy) Databricks agent
hive agent add databricks --name LegacyExample --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token *** --fs-mount-point /mnt/mybucketname --convert-to-delta --delete-after-conversion --default-fs-override dbfs:/mnt/mybucketname --file-system-id aws-target
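If you script agent creation, assembling the command from named options can keep the Unity Catalog and Legacy variants consistent. This minimal Python sketch only renders the command text and does not run the Data Migrator CLI. The function name and option mapping are hypothetical, the flags you pass should be the ones shown in the examples above, and whether the CLI accepts POSIX-style quoting (as produced by shlex.join) for values with special characters is an assumption. Avoid hard-coding the access token in scripts.

```python
import shlex
from typing import Any, Dict


def hive_agent_add_databricks(options: Dict[str, Any]) -> str:
    """Render a 'hive agent add databricks' command line (does not run it).

    Keys are flag names without the leading '--'. True emits a bare flag such as
    --convert-to-delta; False or None leaves the flag out; other values are
    rendered with str().
    """
    parts = ["hive", "agent", "add", "databricks"]
    for flag, value in options.items():
        if value is None or value is False:
            continue
        parts.append(f"--{flag}")
        if value is not True:
            parts.append(str(value))
    # shlex.join applies POSIX shell quoting to any value with special characters.
    return shlex.join(parts)
```

For example, passing {"name": "UnityExample", "catalog": "cat1", "delete-after-conversion": True} returns hive agent add databricks --name UnityExample --catalog cat1 --delete-after-conversion.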
note

To ensure you see all of your migrated data in Databricks, set --default-fs-override to dbfs:/path/, replacing /path/ with the value of the --fs-mount-point parameter. For example:

--default-fs-override dbfs:/mnt/mybucketname
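The rule in this note can be checked mechanically when you script agent creation. In this minimal Python sketch with hypothetical function names, default_fs_override applies the dbfs:/path/ substitution from the note, and override_mismatch flags a --default-fs-override value that does not point at the --fs-mount-point. It encodes only the note's guidance; if you deliberately store converted Delta Lake tables in another DBFS location, a mismatch is expected.

```python
from typing import Optional


def default_fs_override(fs_mount_point: str) -> str:
    """Build the --default-fs-override value from an --fs-mount-point value.

    Follows the note: dbfs:/path/ with /path/ replaced by the mount point.
    """
    if not fs_mount_point.startswith("/"):
        raise ValueError("expected an absolute mount point such as /mnt/mybucketname")
    return "dbfs:" + fs_mount_point


def override_mismatch(fs_mount_point: str, override: str) -> Optional[str]:
    """Return a warning if the override does not point at the mount point, else None."""
    expected = default_fs_override(fs_mount_point)
    # Compare without trailing slashes so /mnt/x and /mnt/x/ are treated as equal.
    if override.rstrip("/") != expected.rstrip("/"):
        return f"--default-fs-override {override} differs from {expected}"
    return None
```

For example, default_fs_override("/mnt/mybucketname") returns dbfs:/mnt/mybucketname, and override_mismatch("/mnt/mybucket", "dbfs:/mnt/mybucketname") returns a warning string.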

Next steps

Create a metadata migration with the CLI using the Databricks target agent you just configured.