Version: 2.3

Configure CDP target for Hive metadata migrations

Hive Migrator can migrate metadata to a Hive or other metastore service in Apache Hadoop, operating as part of a Hadoop deployment. Metadata migrations make the metadata from a source environment available in a target environment, where tools like Apache Hive can query the data.

Use the following options to set up appropriate security credentials and connection settings to perform metadata migrations successfully.

Supported metadata types:

AWS Glue Data Catalog
Databricks
Google Cloud Dataproc (an Azure SQL Database used for Azure HDInsight's Hive-compatible metastore)
Apache Hive metastore
Snowflake

note

Support for more metadata types will be added in future releases.

Metadata migration to a Hive metastore target

Data Migrator interacts with a metastore using an "agent". Agents hold the information needed to communicate with metastores and allow metadata migrations to be defined in Data Migrator. The migrations don't need to know anything about the specific type of the underlying metastore or how to communicate with it.

Deploy each agent locally or remotely. Deploy a local agent on the host as Hive Migrator. A remote agent, currently used for Hive and Google Cloud Dataproc, runs as a separate service and can be deployed on a separate host, not running Data Migrator.

Deployment with local agents

Agent Local

Deployment with remote agents

Agent Remote

Remote agents let you migrate metadata between different versions of Apache Hive. They also give you complete control over the network communication between source and target environments instead of relying on the network interfaces directly exposed by your metadata target.

For a list of agent types available for each supported platform, see Supported metadata agents.

Configure Hive agents

Configure agents through your preferred interface: UI, Data Migrator CLI, or the REST API.

This information includes:

Agent names
Path to the Hadoop files that contain Hive and Hadoop configuration properties
File system location information that holds content referenced by your metadata, for example, a HDFS instance
[Optional] Kerberos credentials in the form of principal and keytab information

The Apache Thrift API and its underlying database allow migration metadata for Hive ACID transactional tables.

The main Hadoop configuration file hive-site.xml provides the information required for agents to communicate with the Hive metastore. In a Cloudera Data Platform deployment, it's found in /etc/hive/conf/hive-site.xml.

Runtime environment needed by Hive agents

Hive agents need access to the JDBC driver/connector used to communicate with the Hive metastore's underlying database. For example, a Cloudera Data Platform environment typically uses MySQL or Postgres databases.

Download the appropriate JDBC driver/connector JAR file, MySQL or Postgres.
note
You only need the JAR file, if you're asked to specify an OS, choose platform independent. If the download is provided as an archive, extract the JAR file from the archive.
Copy the JDBC driver JAR to /opt/wandisco/hivemigrator/agent/hive on your metadata agent host machine.
Set the ownership of the file to the Hive Migrator system user and group for your Hivemigrator instance.
Example
```
chown {hive:hadoop} mysql*
```

Temporarily turn off redaction

If you don't know the password to connect to your Hive metastore, get it from your Hadoop environment. If your Hadoop platform redacts the password, follow the steps below on the node running your cluster manager:

Open /etc/default/cloudera-scm-server in a text editor.

Edit the following line, changing "true" to "false". For example:

$ export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.api.redaction=false"

Save the change.
Restart the manager using the command:
```
$ sudo service cloudera-scm-server restart
```
This may take a few minutes.
The hive_metastore_database_password will no longer be redacted. Open and view the file to confirm. For example:
```
```
},{
  "name" : "hive_metastore_database_password",
  "value" : "The-true-password-value",
  "sensitive" : true
}, {
```
```
info
Enable redaction after you confirm your system's credentials

Configure the CDP target with the UI

To configure a CDP target, when adding Metastore Database Credentials in the Agents tab in the UI, select the Override JDBC Connection Properties checkbox to enable the following fields:

Connection URL - JDBC URL for the database
Connection Driver Name - Full class name of JDBC driver
Connection Username - The username for your metastore database
Connection Password - The password for your metastore database

Configure the CDP target with the CLI

Get the required JDBC configuration using the following API call:

{clusterHost}:{clusterPort}/api/v19/clusters/{clusterName}/services/{serviceName}/config

For example:

abcd01-vm0.domain.name.com:7180/api/v19/clusters/ABCD-01/services/hive1/config

Open the /etc/hive/conf/hive-site.xml file with a text editor and save it as /etc/hive/alt-conf/hive-site.xml.
note
Saving the changes to the existing hive-site.xml file will work but the following changes will be lost if the Hive service is restarted.
Add the following configuration to the new file.

`hive-site.xml` configuration	Description
`--jdbc-url`	JDBC URL for the database
`--jdbc-driver-name`	Full class name of JDBC driver
`--jdbc-username`	User name for connecting to database
`--jdbc-password`	Password for connecting to database

For example:

hive agent add hive --name targetautoAgent5 --host test.cirata.com --port 5052 --no-ssl --jdbcUrl jdbc:postgresql://test.bdauto.cirata.com:7432/hive --jdbc-DriverName org.postgresql.Driver --jdbc-username admin --jdbc--password *** --file-system-id 'testfs'

The correct formatting/syntax for these values:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:{database type}://{Host}:{Port used by database type}/hive</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value> See Guidance for javax.jdo.option.ConnectionDriverName, below </value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>{hive_metastore_database_password}</value>
</property>

For more information, see Command reference.

Guidance for javax.jdo.option.ConnectionDriverName

If you're using a MySQL database:
- ConnectionDriverName is com.mysql.jdbc.Driver
- Port used is 3306
- Database type is mysql
  If you're using a Postgres database:
- ConnectionDriverName is org.postgresql.Driver
- Port used is 7432
- Database is postgresql
Save the changes to the file.
Go to the Hive Connection screen of the targetAgent. Enter the path to the alternate hive-site.xml file into the Override Default Hadoop Configuration Path.
Select Save.
Restart the Hive service in Cloudera Manager.

Additional steps

Use the JDBC driver and Hadoop configuration information to create an agent instance:

Save the changes to the hive-site.xml file.
Go to the Hive Connection screen of the targetAgent. Enter the path to the alternative hive-site.xml file into the Override Default Hadoop Configuration Path.
Select Save.
Restart the Hive service in Cloudera Manager.

Deploy the remote agent from source (MySQL)

On the source cluster, open Data Migrator through the CLI.

Create a new remote agent using the command below:

hive agent add hive --autodeploy --file-system-id fsId --host example.host.name --port 5052 --ignore-host-checking --name remoteAgent --ssh-key /root/.ssh/id_rsa --ssh-user root

Navigate to the Hive Migrator directory:

cd /opt/wandisco/hivemigrator-remote-server

Run the following command to copy both of the appropriate jars into this directory:
```
cp /usr/share/java/mysql*
```
Set the appropriate permissions with the following command:
```
chown {user:group} mysql*
```
After a few seconds, the agent will appear healthy on the UI. Check its status with the following command:
```
hive agent check --name remoteAgent
```

Deploy the remote agent from source (Postgres)

On the source cluster, open Data Migrator through the CLI.

Create a new remote agent with the following command:

hive agent add hive --autodeploy --file-system-id fsId --host example.host.name --port 5052 --ignore-host-checking --name remoteAgent --ssh-key /root/.ssh/id_rsa --ssh-user root

After a few seconds, the agent will appear healthy on the UI. Check its status with the following command:
```
hive agent check --name remoteAgent
```

Create a metadata migration

Create a metadata migration.

Metadata migration to a Hive metastore target​

Configure Hive agents​

Runtime environment needed by Hive agents​

Configure the CDP target with the UI​

Configure the CDP target with the CLI​

Guidance for javax.jdo.option.ConnectionDriverName​

Additional steps​

Deploy the remote agent from source (MySQL)​

Deploy the remote agent from source (Postgres)​

Create a metadata migration​