
Create a Databricks metadata target

LiveData Migrator for Azure supports metadata migration to Azure Databricks.

note

Databricks is currently available as a preview feature and is under development. If you use Databricks as a target metastore and have feedback to share, please contact Support.

Use LiveData Migrator to integrate with Databricks and migrate structured data from Hadoop to Databricks tables, including automatic conversion from source Hive formats to the Delta Lake format used in Databricks.

Configure a Databricks metadata agent using the Azure Portal or CLI, and connect it to your Databricks cluster.

Prerequisites

To ensure a successful migration to Databricks, the source tables must be in one of the following formats:

  • CSV
  • JSON
  • AVRO
  • ORC
  • PARQUET
  • Text

Ensure you have the following before you start:

  • A Databricks cluster.
  • The JDBC connection details and an access token for that cluster (used when you create the metadata target).
  • The Databricks JDBC driver on your LiveData Migrator host (see Install Databricks driver below).
  • Your ADLS Gen2 or Blob storage (the data target) mounted on the Databricks File System (DBFS). For example, you can mount it with the following script.

Example: Script to mount ADLS Gen2 or blob storage with Azure Blob File System

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add example-directory-name to the source URI of your mount point.
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)

Replace:

  • <application-id> with the Application (client) ID for the Azure Active Directory application.
  • <scope-name> with the Databricks secret scope name.
  • <service-credential-key-name> with the name of the key containing the client secret.
  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
  • <container-name> with the name of a container in the ADLS Gen2 storage account.
  • <storage-account-name> with the ADLS Gen2 storage account name.
  • <mount-name> with the name of the intended mount point in DBFS.
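
After running the mount script, you can confirm that the mount works from a Databricks notebook. This is a minimal check, using the same <mount-name> placeholder as above.

Example: Verify the mount from a Databricks notebook
# List the contents of the mount point; this fails if the mount or its credentials are misconfigured.
display(dbutils.fs.ls("/mnt/<mount-name>"))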

Install Databricks driver

To use a Databricks metadata target, you need to supply the DatabricksJDBC42.jar file to the cluster that LiveData Migrator for Azure is deployed on:

  1. Download the Databricks JDBC driver to your LiveData Migrator host system.

    info

    LiveData Migrator for Azure only supports JDBC driver version 2.6.25 or higher.

  2. Unzip the package to gain access to the DatabricksJDBC42.jar file.

  3. Move the DatabricksJDBC42.jar file to the following directory:

    /opt/wandisco/hivemigrator/agent/databricks
  4. Change the owner of the .jar file to the system user and group that run the Hive Migrator service. By default, these are "hive" and "hadoop" respectively:

    Example for hive:hadoop
    chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar

LiveData Migrator will detect the .jar file automatically in this location. You're now ready to create a Databricks metadata target.

Create a Databricks metadata target in the Azure Portal

  1. In the Azure Portal, navigate to the LiveData Migrator resource page.

  2. From the LiveData Migrator menu on the left, select Metadata Targets.

  3. Select Create.

  4. Enter a Name for the metadata target as you want it to appear in your resource list.

  5. Under the Basics tab, select Databricks Target in the Type dropdown list.

  6. Complete the Databricks details:

    • JDBC Server Hostname: The domain name or IP address to use when connecting to the JDBC server. For example, the server hostname of your Azure Databricks workspace, such as adb-1010101010101010.10.azuredatabricks.net.
    • JDBC Port: The port to use when accessing the JDBC server. For example, 443.
    • JDBC Http Path: The path to the Compute resource on Databricks. For example, sql/protocolv1/o/1010101010101010/1010-101010-eXaMpLe1.
    • Access Token: The access token to use when authenticating with the JDBC server. For example, s8Fjs823JdkeXaMpLeKeYWoSd82WjD23kSd8.

    To find your JDBC server details, see the Databricks documentation. To generate an access token, see the Microsoft documentation.
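
    If you want to sanity-check these values before creating the metadata target, one option is the Databricks SQL Connector for Python (pip install databricks-sql-connector), which takes the same hostname, HTTP path, and access token. This check is not part of LiveData Migrator; it's a minimal sketch with placeholder values.

    Example: Check the connection details with the Databricks SQL Connector for Python
    from databricks import sql

    # Connect with the same values you enter in the portal.
    with sql.connect(server_hostname="<jdbc-server-hostname>",
                     http_path="<jdbc-http-path>",
                     access_token="<access-token>") as connection:
        with connection.cursor() as cursor:
            # A successful trivial query confirms the hostname, HTTP path, and token are valid.
            cursor.execute("SELECT 1")
            print(cursor.fetchall())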

  7. Select the Filesystem Details tab and complete the details:

    • (Optional) Catalog: Enter the name of your Databricks Unity Catalog.

    • Convert to delta format: Convert metadata sent to your Databricks cluster to Delta Lake format.

    • FS Mount Point: The DBFS location in your Databricks cluster where you've mounted your cloud filesystem. For example, /mnt/mybucketname.

    • Data Target: Set a previously created data target for this metadata target.

    • DefaultFS Override: Enter the path for the default filesystem override:

      1. If you select Convert to delta format, enter the location on the DBFS to store the tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point and the path on the cloud storage.

        Example: Location on the DBFS to store tables converted to Delta Lake
        dbfs:<location>
        Example: Cloud storage location
        dbfs:/mnt/adls2/storage_account/
      2. If you don't select Convert to delta format, enter the mount point.

        Example: Filesystem mount point
        dbfs:<value of fs mount point>
  8. Select Review and create.

  9. Select Create.

note

See Update authentication settings for information on updating your authentication settings in the UI.

Next steps

You can now migrate metadata to your Databricks target.
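
After a metadata migration completes, one way to confirm that a table was converted to Delta Lake format is to inspect it from a Databricks notebook. This is a minimal sketch; the database and table names are placeholders for a table you have migrated.

Example: Check the format and location of a migrated table
# For a converted table, the format column shows "delta"; location shows where the table data is stored.
spark.sql("DESCRIBE DETAIL <database>.<table>").select("format", "location").show(truncate=False)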