
Configure Databricks as a target

note

Databricks is currently available as a preview feature and is under development. If you use Databricks as a target metastore with Data Migrator and have feedback to share, contact WANdisco Support.

The feature is automatically enabled.

See Preview features.

Configure a Databricks metadata agent

Use Data Migrator to integrate with Databricks and migrate structured data from Hadoop to Databricks tables, including automatic conversion from source Hive formats to the Delta Lake format used in Databricks.

Configure a Databricks metadata agent in Data Migrator using the UI or CLI, and connect it to your Databricks cluster.

Prerequisites

To ensure a successful migration to Databricks, the source tables must be in one of the following formats:

  • CSV
  • JSON
  • AVRO
  • ORC
  • PARQUET
  • Text
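
If you're unsure of a source table's storage format, one way to check is to run DESCRIBE FORMATTED from a Spark session on the source cluster. The following is a minimal, illustrative sketch; <database> and <table> are placeholders for your own names.

from pyspark.sql import SparkSession

# Attach to the source cluster's Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# The SerDe Library and InputFormat rows identify the storage format
# (for example, Parquet, ORC, or Avro).
spark.sql("DESCRIBE FORMATTED <database>.<table>").show(truncate=False)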

Ensure you have the following before you start:

  • A running Databricks cluster.
  • The Databricks JDBC driver installed on the Data Migrator host. See Install Databricks driver below.
  • A Databricks access token.
  • The filesystem that contains your source data mounted on the Databricks File System (DBFS). See the example script below.

Example: Script to mount ADLS Gen2 or blob storage with Azure Blob File System

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add example-directory-name to the source URI of your mount point.
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs)

Replace:

  • <application-id> with the Application (client) ID for the Azure Active Directory application.
  • <scope-name> with the Databricks secret scope name.
  • <service-credential-key-name> with the name of the key containing the client secret.
  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
  • <container-name> with the name of a container in the ADLS Gen2 storage account.
  • <storage-account-name> with the ADLS Gen2 storage account name.
  • <mount-name> with the name of the intended mount point in DBFS.
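
After running the mount script, you can confirm the mount from the same notebook. This is a minimal check; dbutils is available in Databricks notebooks by default.

# List active mounts and confirm the new mount point appears.
display(dbutils.fs.mounts())

# List the mounted container's contents to confirm access.
display(dbutils.fs.ls("/mnt/<mount-name>"))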

Install Databricks driver

To install the JDBC driver:

  1. Download the Databricks JDBC driver.

    note

    Data Migrator only supports JDBC driver version 2.6.25 or higher.

  2. Unzip the package and upload the DatabricksJDBC42.jar file to the Data Migrator host machine.

  3. Move the DatabricksJDBC42.jar file to the Data Migrator directory:

    /opt/wandisco/hivemigrator/agent/databricks
  4. Change ownership of the jar file to the Hive Migrator system user and group:

    chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar
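
As an optional sanity check, a short script like the following, run on the Data Migrator host, confirms the jar is in place with the expected owner. This is an illustrative sketch, not part of the product.

import grp
import pwd
from pathlib import Path

jar = Path("/opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar")

# Confirm the driver jar exists where Hive Migrator expects it.
assert jar.is_file(), f"missing driver: {jar}"

# Confirm ownership matches the Hive Migrator system user and group.
info = jar.stat()
owner = pwd.getpwuid(info.st_uid).pw_name
group = grp.getgrgid(info.st_gid).gr_name
print(f"{jar} is owned by {owner}:{group}")
assert (owner, group) == ("hive", "hadoop"), "run the chown step above"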

Configure Databricks as a target with the UI

To add Databricks from your Dashboard:

  1. From the Dashboard, select a product under Products.

  2. Under Metastore Agents, select Add metastore agent.

  3. Select the filesystem.

  4. Select Databricks as the Metastore Type.

  5. Enter a Display Name.

  6. Enter the JDBC Server Hostname, Port, and HTTP Path. These values combine into a single JDBC URL; see the sketch after these steps.

  7. Enter the Databricks Access Token.

    note

You’ll need to re-enter the access token when updating this agent.

  8. Select Convert to Delta Lake if you want to convert your tables.

  9. Enter the Filesystem Mount Point.
    The filesystem that contains the data you want to migrate must be mounted on your DBFS.
    Enter the mounted container's path on the DBFS.

    Example: Mounted container's path
    /mounted/container/path
  10. Enter a path for Default Filesystem Override.

    1. If you select Convert to Delta Lake, enter the location on the DBFS to store the tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point and the path on the cloud storage.

      Example: Location on the DBFS to store tables converted to Delta Lake
      dbfs:<location>
      Example: Cloud storage location
      dbfs:/mnt/adls2/storage_account/delta_tables
    2. If you don't select Convert to Delta Lake, enter the mount point.

      Example: Filesystem mount point
      dbfs:<value of Fs mount point>
  11. Select Save.
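
The values from steps 6 and 7 are the components of a standard Databricks JDBC URL. The sketch below only illustrates how they fit together; the exact URL scheme and parameters depend on your driver version (jdbc:databricks:// applies to driver 2.6.25 and later), so verify them against the driver documentation.

# Illustrative only: how the agent's JDBC fields map onto a Databricks JDBC URL.
# Placeholders follow this page's <angle-bracket> convention.
server_hostname = "<jdbc-server-hostname>"
port = 443
http_path = "<http-path>"
access_token = "<databricks-access-token>"

jdbc_url = (
    f"jdbc:databricks://{server_hostname}:{port}/default;"
    f"transportMode=http;ssl=1;AuthMech=3;"
    f"httpPath={http_path};UID=token;PWD={access_token}"
)
print(jdbc_url)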

Next steps

Create a metadata migration.
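
Once a metadata migration has run, you can confirm the results from a Databricks notebook. This is a minimal sketch; <database> and <table> are placeholders, and spark is available in notebooks by default.

# List the migrated databases and tables in the Databricks metastore.
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES IN <database>").show()

# If Convert to Delta Lake was selected, migrated tables read back as Delta tables.
spark.sql("DESCRIBE DETAIL <database>.<table>").show(truncate=False)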

info

Under certain conditions with a Databricks target, source truncate operations may take longer than expected. See the related Knowledge base article for more information.