Version: 1.19.1

Connect to source and target metastores

Ready to migrate metadata? Hive Migrator, which comes bundled with LiveData Migrator, lets you transfer metadata from a source metastore to any number of target metastores. Connect to metastores by creating local or remote metadata agents.

  • Supported metadata sources are: Apache Hive and AWS Glue Data Catalog.

  • Supported metadata targets are: Apache Hive, Azure SQL DB, AWS Glue Data Catalog, Databricks, Google Dataproc, and Snowflake.

    To configure Databricks as a target, see Configure Databricks as a target.

    To configure Snowflake as a target, see Configure Snowflake as a target.

Migrating transactional tables

Transactional tables may take longer to appear on the target cluster than expected. Hive Migrator uses a cautious approach to ensure data integrity. The following conditions must be met for table data to appear on the target:

  • All corresponding data files are migrated.
  • The table's transaction writeId is updated, confirming that all data files are on the target.

Hive Migrator uses migration gates to confirm that all data files are in place before the second condition is met. Planned improvements to migration gates will relax these conditions so that table migrations can proceed without a live data migration, reducing migration times.

Connect to metastores with the UI

Remote agent

A remote agent is a service deployed on a remote host that connects to LiveData Migrator. A remote agent must be deployed on the target cluster if:

  • The source and target run different major Hive versions.
  • Transactional tables are migrated.

When deploying a remote agent on an environment where Hive uses MySQL, the JDBC Driver for MySQL must be copied into /opt/wandisco/hivemigrator and made executable on the remote server.
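The driver step above can be sketched as a small shell helper, run on the remote server. This is a minimal sketch, not an official procedure: the function name `install_jdbc_driver` and the JAR file name in the example call are hypothetical, and only the destination directory comes from the text above.

```shell
# Copy a MySQL JDBC driver JAR into the Hive Migrator directory and make it executable.
# Usage: install_jdbc_driver <driver-jar> [dest-dir]
install_jdbc_driver() {
  local jar="$1"
  local dest="${2:-/opt/wandisco/hivemigrator}"   # default is the directory named in the text
  if [ ! -f "$jar" ]; then
    echo "driver JAR not found: $jar" >&2
    return 1
  fi
  cp "$jar" "$dest/" || return 1
  chmod +x "$dest/$(basename "$jar")"
}

# Example call (hypothetical file name; adjust to the driver you downloaded):
# install_jdbc_driver /tmp/mysql-connector-java.jar
```

The optional second argument exists only so the helper can be tried against a scratch directory before touching the real installation.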

  1. From the Dashboard, select a product under Products.

    Info: LiveData Migrator will attempt to auto-discover Apache Hive and create a metadata agent for your Hadoop source filesystem. Check whether an existing agent is listed under the Agents panel.

    Auto-discovery will fail if Kerberos is enabled. In that case, select the existing agent to configure it and enter the Kerberos credentials.

  2. Select Connect To Metastore.

  3. Select the Filesystem in which the data associated with the metadata is held. For Hive agents, this will likely be the Hadoop Distributed File System (HDFS) which contains the data for your tables.

    Note: If you use a local Hive agent for a target filesystem, copy hive-site.xml from the target cluster to the local cluster, into the location specified by Override Default Hive Configuration Path. Alternatively, use a remote agent for the target filesystem.

  4. Select Hive as the Metastore Type.

  5. Enter a Display Name.

  6. (Optional) - Enter a value for Configuration Path. The default path will be used if left blank.

    Note: This should be the path to a directory containing core-site.xml, hdfs-site.xml, and hive-site.xml. It is required when using a local agent for a target filesystem.

  7. (Optional) - Enter Kerberos Configuration. Use the Hive service principal hive/hostname@REALM or a principal of similar permission. The keytab must be readable by the user running the Hive Migrator process and contain the appropriate principal.

  8. (Optional) - Enter Default Filesystem Override to override the default filesystem URI. Recommended for complex use cases only.

  9. Select Save.
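Before saving, it can help to confirm the two prerequisites the steps above describe: that the configuration directory holds all three site files, and that the keytab is readable and contains the intended principal. The following is a minimal sketch under stated assumptions: the function names `check_config_dir` and `check_keytab` are hypothetical, `klist` is assumed to come from the MIT Kerberos client tools, and the paths in the example calls are placeholders.

```shell
# check_config_dir <dir>
# Succeeds only if the directory contains core-site.xml, hdfs-site.xml, and hive-site.xml.
check_config_dir() {
  local dir="$1" f missing=0
  for f in core-site.xml hdfs-site.xml hive-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# check_keytab <keytab> <principal>
# Succeeds only if the keytab is readable by the current user and lists the principal.
check_keytab() {
  local keytab="$1" principal="$2"
  if [ ! -r "$keytab" ]; then
    echo "keytab not readable by $(id -un): $keytab" >&2
    return 1
  fi
  # klist -k prints the principals stored in a keytab
  klist -k "$keytab" | grep -qF "$principal"
}

# Example calls (placeholder paths and principal):
# check_config_dir /etc/hive/conf
# check_keytab /etc/security/keytabs/hive.service.keytab hive/hostname@REALM
```

Run `check_keytab` as the user that runs the Hive Migrator process, since the readability test applies to whoever invokes it.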

Connect to metastores with the CLI

Connect to the LiveData Migrator CLI.

Command                      Action
hive agent add hive          Add a Hive agent for a local or remote Apache Hive Metastore
hive agent configure hive    Change the configuration of an existing Hive agent for the Apache Hive Metastore
hive agent check             Check whether the Hive agent can connect to the Metastore
hive agent delete            Delete a Hive agent
hive agent list              List all configured Hive agents
hive agent show              Show the configuration for a Hive agent
hive agent types             List supported Hive agent types

Connect to remote metastores with the CLI

Follow these steps to deploy a remote Hive agent for Apache Hive:

  1. On your local host, run the hive agent add hive command with the following parameters to configure your remote Hive agent.

    • --host The host where the remote Hive agent will be deployed.
    • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local LiveData Migrator server.
    • --no-ssl (Optional) TLS encryption and certificate authentication are enabled by default between LiveData Migrator and the remote agent. Use this parameter to disable them.

    Example for remote Apache Hive deployment - automated
        hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path /<example directory path> --file-system-id mytargethdfs

    Example for remote Apache Hive deployment - manual
        hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path /<example directory path> --file-system-id mytargethdfs

  2. Transfer the remote server installer to your remote host:

    Example of secure transfer from local to remote host
         scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  3. On your remote host, run the installer in silent mode as the root user (or with sudo):

         ./hivemigrator-remote-server-installer.sh -- --silent --config <example config string here>
  4. On your remote host, start the remote server service:

         service hivemigrator-remote-server start
Info: If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).
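Once the remote server service is started, a quick way to confirm that the agent port is reachable from the local host is a plain TCP connection test. This is a rough sketch rather than a product feature: the function name `port_open` is hypothetical, it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and the host and port in the example call are the ones used in the examples above.

```shell
# port_open <host> <port>
# Succeeds if a TCP connection to host:port can be opened within 5 seconds.
port_open() {
  local host="$1" port="$2"
  # /dev/tcp/<host>/<port> is a bash-specific redirection that opens a TCP socket
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example call, run from the local LiveData Migrator host:
# port_open myRemoteHost.example.com 5052 && echo "remote agent port reachable"
```

A successful connection only shows that something is listening on the port. Whether the agent can actually reach the metastore is what the `hive agent check` command listed earlier reports.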

Next steps

Connected to your metastores? Define metadata rules for your metadata migrations.