Create a Databricks metadata target
LiveData Migrator for Azure supports metadata migration to Azure Databricks.
Databricks is currently available as a preview feature and is under development. If you use Databricks as a target metastore and have feedback to share, please contact Support.
Use LiveData Migrator to integrate with Databricks and migrate structured data from Hadoop to Databricks tables, including automatic conversion from source Hive formats to the Delta Lake format used by Databricks.
Configure a Databricks metadata agent using the Azure Portal or CLI, and connect it to your Databricks cluster.
Prerequisites
To ensure a successful migration to Databricks, the source tables must be in one of the following formats:
- CSV
- JSON
- AVRO
- ORC
- PARQUET
- Text
Ensure you have the following before you start:
- A Databricks cluster
- A Databricks File System (DBFS)
- Cloud storage mounted onto the DBFS
- The Databricks JDBC driver installed on the LiveData Migrator host (see Install Databricks driver below)
Example: Script to mount ADLS Gen2 or Blob storage with Azure Blob File System

```python
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add example-directory-name to the source URI of your mount point.
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)
```
Replace:
- <application-id> with the Application (client) ID for the Azure Active Directory application.
- <scope-name> with the Databricks secret scope name.
- <service-credential-key-name> with the name of the key containing the client secret.
- <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
- <container-name> with the name of a container in the ADLS Gen2 storage account.
- <storage-account-name> with the ADLS Gen2 storage account name.
- <mount-name> with the name of the intended mount point in DBFS.
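After the script runs, you can confirm that the storage is visible to the cluster before starting a migration. The following is a minimal check, run in a Databricks notebook where dbutils is predefined; the mount name is the same placeholder used above.

```python
# Run in a Databricks notebook (dbutils and display are predefined there).
# Replace <mount-name> with the mount point you created above.

# List all current mount points and the storage URIs behind them.
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)

# List the contents of the new mount to confirm it is readable.
display(dbutils.fs.ls("/mnt/<mount-name>"))
```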
Install Databricks driver
To use a Databricks metadata target, you need to supply a DatabricksJDBC42.jar file to the cluster that LiveData Migrator for Azure is deployed on:

1. Download the Databricks JDBC driver to your LiveData Migrator host system.

   Note: LiveData Migrator for Azure only supports JDBC driver version 2.6.25 or higher.

2. Unzip the package to gain access to the DatabricksJDBC42.jar file.

3. Move the DatabricksJDBC42.jar file to the following directory: /opt/wandisco/hivemigrator/agent/databricks

4. Change the owner of the .jar file to the system user and group that run the Hive Migrator service. By default, these are "hive" and "hadoop" respectively.

   Example for hive:hadoop:

   ```
   chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar
   ```
LiveData Migrator will detect the .jar file automatically in this location. You're now ready to create a Databricks metadata target.
Create a Databricks metadata target
You can create the metadata target using either the Azure Portal or the Azure CLI.

Azure Portal
In the Azure Portal, navigate to the LiveData Migrator resource page.
From the LiveData Migrator menu on the left, select Metadata Targets.
Select Create.
Enter a Name for the metadata target as you want it to appear in your resource list.
Under the Basics tab, select Databricks Target in the Type dropdown list.
Complete the Databricks details:

- JDBC Server Hostname: The domain name or IP address to use when connecting to the JDBC server. For example, hostname.
- JDBC Port: The port to use when accessing the JDBC server. For example, 1433.
- JDBC Http Path: The path to the Compute resource on Databricks. For example, sql/protocolv1/o/1010101010101010/1010-101010-eXaMpLe1.
- Access Token: The access token to use when authenticating with the JDBC server. For example, s8Fjs823JdkeXaMpLeKeYWoSd82WjD23kSd8.
To find your JDBC server details, see the Databricks documentation. To generate an access token, see the Microsoft documentation. The sketch below shows one way to sanity-check these connection values.
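The following is a minimal sketch for checking the hostname, HTTP path, and access token before you enter them. It assumes the databricks-sql-connector Python package is installed (pip install databricks-sql-connector) on whatever machine you run it from; it uses Databricks' Python SQL connector rather than the JDBC driver itself, so it validates the connection values but not the driver installation.

```python
from databricks import sql

# Placeholders: use the same values you enter in the form above.
SERVER_HOSTNAME = "<jdbc-server-hostname>"
HTTP_PATH = "<jdbc-http-path>"
ACCESS_TOKEN = "<access-token>"

# Open a connection and run a trivial query; success means the hostname,
# HTTP path, and access token are valid.
with sql.connect(server_hostname=SERVER_HOSTNAME,
                 http_path=HTTP_PATH,
                 access_token=ACCESS_TOKEN) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())
```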
Select the Filesystem Details tab and complete the details:

- (Optional) Catalog: Enter the name of your Databricks Unity Catalog.
- Convert to delta format: Convert metadata sent to your Databricks cluster to Delta Lake format.
- FS Mount Point: The location in your Databricks cluster where you've mounted your cloud filesystem, such as ADLS. For example, /mnt/mybucketname.
- Data Target: Set a previously created data target for this metadata target.
- DefaultFS Override: Enter a path for the default filesystem override (see the sketch after this list for how the mount point and override values relate).

  If you select Convert to delta format, enter the location on the DBFS to store the tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point and the path on the cloud storage.

  Example: Location on the DBFS to store tables converted to Delta Lake: dbfs:<location>

  Example: Cloud storage location: dbfs:/mnt/adls2/storage_account/

  If you don't select Convert to delta format, enter the mount point.

  Example: Filesystem mount point: dbfs:<value of fs mount point>
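The relationship between the FS Mount Point and DefaultFS Override values can be easier to see with concrete paths. The following is a small illustrative sketch to run in a Databricks notebook; the paths are placeholders taken from the examples above, not values required by LiveData Migrator.

```python
# Run in a Databricks notebook (dbutils is predefined there).
# All paths below are illustrative placeholders.
fs_mount_point = "/mnt/mybucketname"

# Convert to delta format selected: the override points at the DBFS or mounted
# cloud storage location where converted Delta Lake tables will be stored.
delta_override = "dbfs:/mnt/adls2/storage_account/"

# Convert to delta format not selected: the override is simply the mount point.
plain_override = "dbfs:" + fs_mount_point

# Confirm the location behind each override is reachable from the cluster.
for path in (delta_override, plain_override):
    print(path, "->", len(dbutils.fs.ls(path)), "entries")
```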
Select Review and create.
Select Create.
See Update authentication settings for information on updating your authentication settings in the UI.
Azure CLI

Run the following command to set up a Databricks cluster as a metadata target:

```
az livedata migrator metadata-target databricks-agent create \
  --resource-group <resource_group> \
  --metadata-target-name <target_name> \
  --migrator-name <migrator_name> \
  --data-target <data_target> \
  --convert-to-delta-lake \
  --delete-after-convert \
  --fs-mount-point <mount_point> \
  --default-fs-override <fs_override> \
  --jdbc-server-hostname <server_hostname> \
  --jdbc-http-path <jdbc_http_path> \
  --jdbc-port <port> \
  --access-token <access_token> \
  --catalog <unity_catalog_name>
```
See the Databricks Metadata Migration Target section of the Azure CLI LiveData Extension page for help with these parameters.
Next steps
You can migrate metadata to your Databricks target.
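Once a migration has run, one way to confirm that the metadata arrived (and, if you enabled Convert to delta format, that tables were converted) is to query them from a Databricks notebook. A minimal sketch, assuming a migrated database and table of your own; the names below are placeholders.

```python
# Run in a Databricks notebook ('spark' is predefined there).
# Replace the placeholders with a database and table included in your migration.
database = "<migrated-database>"
table = "<migrated-table>"

# List the tables that arrived in the migrated database.
spark.sql(f"SHOW TABLES IN {database}").show(truncate=False)

# For tables migrated with Convert to delta format enabled, DESCRIBE DETAIL
# should report format = delta and the storage location you configured.
spark.sql(f"DESCRIBE DETAIL {database}.{table}") \
    .select("format", "location") \
    .show(truncate=False)
```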