Connect metastores for metadata migrations
LiveData Migrator replicates metadata between Databricks (target only), Apache Hive, AWS Glue and Azure SQL.
Ready to migrate metadata? First, connect to your Metastores by adding Hive agents.
Note: Databricks agents are currently available as a preview feature.
Info: The source table format must be Parquet to ensure a successful migration to Databricks Delta Lake.
Connect Metastores With The UI
Apache Hive
LiveData Migrator will attempt to auto-discover Apache Hive and create an agent for your source environment. Check whether an existing agent is listed under the Agents panel.
If Kerberos is enabled on your cluster and HDFS is configured as your source filesystem, choose to configure the existing agent and provide the Kerberos credentials.
Click Connect To Metastore.
Provide a Display Name.
Select Hive as the Agent type.
Provide an Override Default Hadoop Configuration Path.
Caution: If you use a local Hive agent for a target filesystem, copy hive-site.xml from the target cluster to the local cluster, into the location specified by the Override Default Hadoop Configuration Path. Alternatively, use a remote agent for the target filesystem (not currently supported in the UI).
Select the Filesystem.
Specify DefaultFs Override (optional).
Click Save.
AWS Glue Data Catalog
Click Connect To Metastore.
Select AWS Glue as the Agent type.
Provide a Display Name.
Select the AWS Catalog Credentials Provider.
Enter the AWS Glue Service Endpoint.
Enter the AWS Region.
Select the Filesystem.
Specify DefaultFs Override (optional).
Click Save.
Azure SQL
Click Connect To Metastore.
Select Azure SQL DB as the Agent type.
Provide a Display Name.
Enter the Azure SQL Server Name.
Enter the ADLS Gen2 Storage Account Name and Container Name.
Specify the Root Folder.
Select the Authentication Method.
Select the HDI version.
Select the Filesystem.
Specify DefaultFs Override (optional).
Click Save.
Databricks Delta Lake (Target Only)
Databricks Delta Lake Metastores are supported as a target only. LiveData Migrator can convert tables to Delta format during migration.
Note: Version 2.6.25 of the Databricks JDBC driver isn't compatible with Hive Migrator. If you download this version, you'll receive an error message stating that you haven't downloaded a driver. Download version 2.6.22 to create your agent.
Click Connect To Metastore.
Select Databricks as the Agent type.
Provide a Display Name.
Enter the JDBC Server Hostname, Port and HTTP Path.
Enter the Access Token.
Enter the FS Mount Point.
Select the Filesystem.
Specify DefaultFs Override (optional).
Click Save.
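To show how the JDBC Server Hostname, Port and HTTP Path entered above relate to one another, here is a minimal sketch that combines them into a single connection string. It assumes the URL shape documented for the legacy Databricks JDBC driver (the 2.6.22 generation). The function name `databricks_jdbc_url` is hypothetical, and LiveData Migrator builds its own connection internally, so treat this only as an illustration of what the three fields represent.

```python
def databricks_jdbc_url(hostname: str, port: int, http_path: str) -> str:
    """Compose a legacy-style Databricks JDBC URL (illustrative helper only)."""
    # ASSUMPTION: URL layout of the legacy Databricks JDBC driver.
    # AuthMech=3 with UID=token selects access-token authentication;
    # the token itself (PWD=...) is deliberately left out of this string.
    return (
        f"jdbc:spark://{hostname}:{port}/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};"
        "AuthMech=3;UID=token"
    )


if __name__ == "__main__":
    # Placeholder values, not a real workspace.
    print(databricks_jdbc_url("adb-1.example.net", 443, "sql/protocolv1/o/1/abc"))
```

The Access Token is kept out of the string on purpose so that it is not echoed to a terminal or log.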
Google Cloud Dataproc
Click Connect To Metastore.
Select Google Cloud Dataproc as the Agent type.
Provide a Display Name.
Provide the Hostname or IP Address.
Provide the Port.
Select the Filesystem.
Specify DefaultFs Override (optional).
Click Save.
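Before opening the UI, it can help to gather the values each agent type asks for. The sketch below collects the field names from the steps above into a checklist and reports which ones are still missing. The field names come from the UI labels in this section. Treating every field except DefaultFs Override as required is an assumption, and `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, not part of LiveData Migrator.

```python
# Field names copied from the UI steps above, keyed by Agent type.
# ASSUMPTION: everything except "DefaultFs Override" is required.
REQUIRED_FIELDS = {
    "Hive": [
        "Display Name",
        "Override Default Hadoop Configuration Path",
        "Filesystem",
    ],
    "AWS Glue": [
        "Display Name",
        "AWS Catalog Credentials Provider",
        "AWS Glue Service Endpoint",
        "AWS Region",
        "Filesystem",
    ],
    "Azure SQL DB": [
        "Display Name",
        "Azure SQL Server Name",
        "ADLS Gen2 Storage Account Name",
        "Container Name",
        "Root Folder",
        "Authentication Method",
        "HDI version",
        "Filesystem",
    ],
    "Databricks": [
        "Display Name",
        "JDBC Server Hostname",
        "Port",
        "HTTP Path",
        "Access Token",
        "FS Mount Point",
        "Filesystem",
    ],
    "Google Cloud Dataproc": [
        "Display Name",
        "Hostname or IP Address",
        "Port",
        "Filesystem",
    ],
}


def missing_fields(agent_type: str, values: dict) -> list:
    """Return the checklist entries that have no non-blank value yet."""
    required = REQUIRED_FIELDS[agent_type]
    return [name for name in required if not str(values.get(name, "")).strip()]


if __name__ == "__main__":
    draft = {"Display Name": "dataproc-target", "Port": 9083}
    print(missing_fields("Google Cloud Dataproc", draft))
```

An empty list means every value in the checklist for that agent type has been collected.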
Connect Metastores With The CLI
Connect To Metastores
Use the following commands to connect your source and target Metastores.
Command | Action |
---|---|
hive agent add azure | Add a Hive agent for an Azure SQL connection |
hive agent add filesystem | Add a Hive agent for a local filesystem |
hive agent add glue | Add a Hive agent for an AWS Glue Data Catalog |
hive agent add hive | Add a Hive agent for a local or remote Apache Hive Metastore |
hive agent add databricks | Add a Hive agent for a Databricks Delta Lake Metastore |
hive agent add dataproc | Add a Hive agent for a Google Cloud Dataproc Metastore |
Configure Existing Hive Agents
Command | Action |
---|---|
hive agent configure azure | Change the configuration of an existing Hive agent for the Azure SQL database server |
hive agent configure filesystem | Change the configuration of an existing Hive agent for the local filesystem |
hive agent configure glue | Change the configuration of an existing Hive agent for the AWS Glue Data Catalog |
hive agent configure hive | Change the configuration of an existing Hive agent for the Apache Hive Metastore |
hive agent configure databricks | Change the configuration of an existing Hive agent for the Databricks Delta Lake Metastore |
hive agent configure dataproc | Change the configuration of an existing Hive agent for the Google Cloud Dataproc Metastore |
Manage Hive Agents
Command | Action |
---|---|
hive agent check | Check whether the Hive agent can connect to the Metastore |
hive agent delete | Delete a Hive agent |
hive agent list | List all configured Hive agents |
hive agent show | Show the configuration for a Hive agent |
hive agent types | List supported Hive agent types |
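The command names in the tables above follow a regular pattern: `add` and `configure` take an agent type, while the management commands do not. The sketch below encodes that pattern and rejects names that are not listed, which can catch a mistyped agent type in a script. It composes command names only. Any options a command needs are not covered in this section and are left out, and `agent_command` is a hypothetical helper rather than part of the CLI.

```python
# Names taken from the command tables above.
AGENT_TYPES = ("azure", "filesystem", "glue", "hive", "databricks", "dataproc")
TYPED_ACTIONS = ("add", "configure")
UNTYPED_ACTIONS = ("check", "delete", "list", "show", "types")


def agent_command(action: str, agent_type: str = None) -> str:
    """Return the command name for an action, without any options."""
    if action in TYPED_ACTIONS:
        # add/configure always name the kind of Metastore they act on.
        if agent_type not in AGENT_TYPES:
            raise ValueError(f"unknown agent type: {agent_type!r}")
        return f"hive agent {action} {agent_type}"
    if action in UNTYPED_ACTIONS:
        if agent_type is not None:
            raise ValueError(f"'{action}' does not take an agent type")
        return f"hive agent {action}"
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(agent_command("add", "glue"))
    print(agent_command("list"))
```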
Next Steps
Connected to your Metastores? Define metadata rules for your metadata migrations.