Version: 1.17.1

Connect metastores

Ready to migrate metadata? Hive Migrator, which comes bundled with LiveData Migrator, lets you transfer metadata from a source metastore to target metastores. Connect to metastores by creating local or remote metadata agents. Remote agents are currently unsupported on the UI.

  • Supported metadata sources are: Apache Hive and AWS Glue Data Catalog.
  • Supported metadata targets are: Apache Hive, Azure SQL DB, AWS Glue Data Catalog, Databricks, Google Dataproc, and Snowflake.

Connect to metastores with the UI#

Remote agent

A remote agent is a service deployed on a remote host that connects to LiveData Migrator. A remote agent must be deployed on the target cluster if:

  • The source and target run different major Hive versions.
  • Transactional tables are migrated.

When deploying a remote agent on an environment where Hive uses MySQL, the JDBC Driver for MySQL must be copied into /opt/wandisco/hivemigrator and made executable on the remote server.
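A sketch of that driver setup, assuming the remote host name and the MySQL connector jar filename shown here (both are placeholders; use the driver version that matches your MySQL deployment):

```shell
# Copy the MySQL JDBC driver into the Hive Migrator directory on the remote server
# and make it executable. Hostname and jar version are examples only.
scp mysql-connector-java-8.0.33.jar root@myRemoteHost.example.com:/opt/wandisco/hivemigrator/
ssh root@myRemoteHost.example.com \
  'chmod +x /opt/wandisco/hivemigrator/mysql-connector-java-8.0.33.jar'
```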

  1. From the Dashboard, click a product under Products.

    info

    LiveData Migrator will attempt to auto-discover Apache Hive and create a metadata agent for your Hadoop source filesystem. Check whether an existing agent is listed under the Agents panel.

    Auto-discovery will fail if Kerberos is enabled. In that case, configure the existing agent manually and provide the Kerberos credentials.

  2. Click Connect To Metastore.

  3. Select Hive as the Agent Type.

  4. Provide a Display Name.

  5. (Optional) - Provide Kerberos Configuration. Use the Hive service principal hive/hostname@REALM or a principal with equivalent permissions. The keytab must be readable by the user running the Hive Migrator process and must contain the appropriate principal.
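You can verify both requirements before saving. A minimal check, assuming the keytab path from the examples in this page and a service user named `hive` (the user name is an assumption; substitute the account that runs Hive Migrator):

```shell
# List the principals contained in the keytab to confirm it includes the
# Hive service principal for this host and realm.
klist -kt /etc/security/keytabs/hive.service.keytab

# Confirm the keytab is readable by the user running the Hive Migrator
# process ("hive" is an example user).
sudo -u hive klist -kt /etc/security/keytabs/hive.service.keytab
```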

  6. (Optional) - Provide a value for Override Default Hive Configuration Path. The default path will be used if left blank.

  7. Select the Filesystem in which the data associated with the metadata is held. For Hive agents, this will likely be the Hadoop Distributed File System (HDFS) which contains the data for your tables.

note

If using a local Hive agent for a target filesystem, copy hive-site.xml from the target cluster to the local cluster, into the location specified by the Override Default Hive Configuration Path. Alternatively, use a remote agent for the target filesystem.
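For example, assuming the target cluster's Hive configuration lives in the usual `/etc/hive/conf` directory and a local destination of your choosing (both paths and the hostname are placeholders):

```shell
# Copy hive-site.xml from the target cluster to the local cluster, into the
# directory you will supply as the Override Default Hive Configuration Path.
scp myTargetHost.example.com:/etc/hive/conf/hive-site.xml \
  /etc/wandisco/hivemigrator/target-conf/
```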

  8. (Optional) - Specify DefaultFS Override to override the default filesystem URI. Recommended for complex use cases only.

  9. Click Save.

Connect to metastores with the CLI#

Connect to the LiveData Migrator CLI.

| Command | Action |
| --- | --- |
| `hive agent add hive` | Add a Hive agent for a local or remote Apache Hive metastore |
| `hive agent configure hive` | Change the configuration of an existing Hive agent for the Apache Hive metastore |
| `hive agent check` | Check whether the Hive agent can connect to the metastore |
| `hive agent delete` | Delete a Hive agent |
| `hive agent list` | List all configured Hive agents |
| `hive agent show` | Show the configuration for a Hive agent |
| `hive agent types` | List supported Hive agent types |
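A typical local workflow might look like the following sketch. The agent name and filesystem ID are illustrative, and the exact flags accepted by your version may differ; run each command with `--help` to confirm:

```shell
# Add a local Hive agent for the source metastore (names are examples).
hive agent add hive --name sourceAgent --file-system-id mysourcehdfs

# Verify the new agent can reach the metastore, then review its configuration.
hive agent check --name sourceAgent
hive agent show --name sourceAgent
```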

Connect to remote metastores with the CLI#

Follow these steps to deploy a remote Hive agent for Apache Hive:

  1. On your local host, run the hive agent add hive command with the following parameters to configure your remote Hive agent.

    • --host The host where the remote Hive agent will be deployed.
    • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local LiveData Migrator server.
    • --no-ssl (Optional) TLS encryption and certificate authentication is enabled by default between LiveData Migrator and the remote agent. Use this parameter to disable it.
Example for remote Apache Hive deployment - automated
    hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5552 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
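In other words, the directory passed as `--config-path` should hold the Hadoop and Hive client configuration files, e.g. (directory path is illustrative):

```shell
# The --config-path directory should contain these three files.
ls /etc/hivemigrator/remote-conf/
# core-site.xml  hdfs-site.xml  hive-site.xml
```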

Example for remote Apache Hive deployment - manual
    hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5552 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

  2. Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host
    scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

         ./hivemigrator-remote-server-installer.sh -- --silent
  4. On your remote host, start the remote server service:

         service hivemigrator-remote-server start
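After starting the service, you can confirm it is running and listening on the port you passed with `--port` (5552 in the examples above). The `ss` check is a generic Linux sketch, not a Hive Migrator command:

```shell
# Check the service status and confirm the agent port is listening.
service hivemigrator-remote-server status
ss -tln | grep 5552
```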
info

If specifying Kerberos and config path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

Next steps#

Connected to your Metastores? Define metadata rules for your metadata migrations.