Create a Databricks metadata target
LiveData Migrator for Azure supports metadata migration to Azure Databricks.
Databricks is currently available as a preview feature and is under development. If you use Databricks as a target metastore and have feedback to share, please contact Support.
Use LiveData Migrator to integrate with Databricks and migrate structured data from Hadoop to Databricks tables, including automatic conversion from source Hive formats to the Delta Lake format used by Databricks.
Configure a Databricks metadata agent using the Azure Portal or CLI, and connect it to your Databricks cluster.
Prerequisites
To ensure a successful migration to Databricks, the source tables must be in one of the following formats:
- CSV
- JSON
- AVRO
- ORC
- PARQUET
- Text
Ensure you have the following before you start:
- A Databricks cluster
- A Databricks File System (DBFS)
- Cloud storage mounted onto the DBFS
- The Databricks JDBC driver installed on the LiveData Migrator host (see Install Databricks driver below)
Example: Script to mount ADLS Gen2 or Blob storage with Azure Blob File System

```python
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add example-directory-name to the source URI of your mount point.
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)
```
Replace:
- <application-id> with the Application (client) ID for the Azure Active Directory application.
- <scope-name> with the Databricks secret scope name.
- <service-credential-key-name> with the name of the key containing the client secret.
- <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
- <container-name> with the name of a container in the ADLS Gen2 storage account.
- <storage-account-name> with the ADLS Gen2 storage account name.
- <mount-name> with the name of the intended mount point in DBFS.
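After the script runs, you can confirm that the storage is visible to the cluster before starting a migration. The following is a minimal check, run in a Databricks notebook where dbutils is predefined; the mount name is the same placeholder used above.

```python
# Run in a Databricks notebook (dbutils and display are predefined there).
# Replace <mount-name> with the mount point you created above.

# List all current mount points and the storage URIs behind them.
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)

# List the contents of the new mount to confirm it is readable.
display(dbutils.fs.ls("/mnt/<mount-name>"))
```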
Install Databricks driver
To use a Databricks metadata target, you need to supply a DatabricksJDBC42.jar file to the cluster that LiveData Migrator for Azure is deployed on:

1. Download the Databricks JDBC driver to your LiveData Migrator host system.

   Note: LiveData Migrator for Azure only supports JDBC driver version 2.6.25 or higher.

2. Unzip the package to gain access to the DatabricksJDBC42.jar file.

3. Move the DatabricksJDBC42.jar file to the following directory: /opt/wandisco/hivemigrator/agent/databricks

4. Change the owner of the .jar file to the system user and group that run the Hive Migrator service. By default, these are "hive" and "hadoop" respectively.

   Example for hive:hadoop:

   ```
   chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar
   ```
LiveData Migrator will detect the .jar file automatically in this location. You're now ready to create a Databricks metadata target.
Create a Databricks metadata target
You can create the metadata target using either the Azure Portal or the Azure CLI.

Azure Portal
In the Azure Portal, navigate to the LiveData Migrator resource page.
From the LiveData Migrator menu on the left, select Metadata Targets.
Select Create.
Enter a Name for the metadata target as you want it to appear in your resource list.
Under the Basics tab, select Databricks Target in the Type dropdown list.
Complete the Databricks details:

- JDBC Server Hostname: The domain name or IP address to use when connecting to the JDBC server. For example, hostname.
- JDBC Port: The port to use when accessing the JDBC server. For example, 1433.
- JDBC Http Path: The path to the Compute resource on Databricks. For example, sql/protocolv1/o/1010101010101010/1010-101010-eXaMpLe1.
- Access Token: The access token to use when authenticating with the JDBC server. For example, s8Fjs823JdkeXaMpLeKeYWoSd82WjD23kSd8.
To find your JDBC server details, see the Databricks documentation. To generate an access token, see the Microsoft documentation. The sketch below shows one way to sanity-check these connection values.
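The following is a minimal sketch for checking the hostname, HTTP path, and access token before you enter them. It assumes the databricks-sql-connector Python package is installed (pip install databricks-sql-connector) on whatever machine you run it from; it uses Databricks' Python SQL connector rather than the JDBC driver itself, so it validates the connection values but not the driver installation.

```python
from databricks import sql

# Placeholders: use the same values you enter in the form above.
SERVER_HOSTNAME = "<jdbc-server-hostname>"
HTTP_PATH = "<jdbc-http-path>"
ACCESS_TOKEN = "<access-token>"

# Open a connection and run a trivial query; success means the hostname,
# HTTP path, and access token are valid.
with sql.connect(server_hostname=SERVER_HOSTNAME,
                 http_path=HTTP_PATH,
                 access_token=ACCESS_TOKEN) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())
```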
Select the Filesystem Details tab and complete the details:

- (Optional) Catalog: Enter the name of your Databricks Unity Catalog.
- Convert to delta format: Convert metadata sent to your Databricks cluster to Delta Lake format.
- FS Mount Point: The location in your Databricks cluster where you've mounted your cloud filesystem, such as ADLS. For example, /mnt/mybucketname.
- Data Target: Set a previously created data target for this metadata target.
- DefaultFS Override: Enter a path for the default filesystem override (see the sketch after this list for how the mount point and override values relate).

  If you select Convert to delta format, enter the location on the DBFS to store the tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point and the path on the cloud storage.

  Example: Location on the DBFS to store tables converted to Delta Lake: dbfs:<location>

  Example: Cloud storage location: dbfs:/mnt/adls2/storage_account/

  If you don't select Convert to delta format, enter the mount point.

  Example: Filesystem mount point: dbfs:<value of fs mount point>
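The relationship between the FS Mount Point and DefaultFS Override values can be easier to see with concrete paths. The following is a small illustrative sketch to run in a Databricks notebook; the paths are placeholders taken from the examples above, not values required by LiveData Migrator.

```python
# Run in a Databricks notebook (dbutils is predefined there).
# All paths below are illustrative placeholders.
fs_mount_point = "/mnt/mybucketname"

# Convert to delta format selected: the override points at the DBFS or mounted
# cloud storage location where converted Delta Lake tables will be stored.
delta_override = "dbfs:/mnt/adls2/storage_account/"

# Convert to delta format not selected: the override is simply the mount point.
plain_override = "dbfs:" + fs_mount_point

# Confirm the location behind each override is reachable from the cluster.
for path in (delta_override, plain_override):
    print(path, "->", len(dbutils.fs.ls(path)), "entries")
```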
Select Review and create.
Select Create.
See Update authentication settings for information on updating your authentication settings in the UI.
Azure CLI

Run the following command to set up a Databricks cluster as a metadata target:

```
az livedata migrator metadata-target databricks-agent create \
  --resource-group <resource_group> \
  --metadata-target-name <target_name> \
  --migrator-name <migrator_name> \
  --data-target <data_target> \
  --convert-to-delta-lake \
  --delete-after-convert \
  --fs-mount-point <mount_point> \
  --default-fs-override <fs_override> \
  --jdbc-server-hostname <server_hostname> \
  --jdbc-http-path <jdbc_http_path> \
  --jdbc-port <port> \
  --access-token <access_token> \
  --catalog <unity_catalog_name>
```
See the Databricks Metadata Migration Target section of the Azure CLI LiveData Extension page for help with these parameters.
Next steps
You can migrate metadata to your Databricks target.
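Once a migration has run, one way to confirm that the metadata arrived (and, if you enabled Convert to delta format, that tables were converted) is to query them from a Databricks notebook. A minimal sketch, assuming a migrated database and table of your own; the names below are placeholders.

```python
# Run in a Databricks notebook ('spark' is predefined there).
# Replace the placeholders with a database and table included in your migration.
database = "<migrated-database>"
table = "<migrated-table>"

# List the tables that arrived in the migrated database.
spark.sql(f"SHOW TABLES IN {database}").show(truncate=False)

# For tables migrated with Convert to delta format enabled, DESCRIBE DETAIL
# should report format = delta and the storage location you configured.
spark.sql(f"DESCRIBE DETAIL {database}.{table}") \
    .select("format", "location") \
    .show(truncate=False)
```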