# Configure CDP target for Hive metadata migrations
To migrate metadata to a Hadoop target, some additional configuration changes are needed before starting. This section covers Cloudera Data Platform (CDP) as an example.
## Assumptions

- For brevity, these steps don't include Kerberos. However, they can be applied to a Kerberized environment.
- A section is included in case the `hive_metastore_database_password` credentials are not known.
- [CDP] These steps assume that there's no policy restriction on briefly turning off CDP's redaction.
- Both the source and the target require changes.
## Confirm database credentials

There are two extra sections for getting the required database credentials:

- Turning off redaction on the target cluster, so that the credentials can be read from file.
- [Postgres] Creating and configuring new Postgres credentials, following the instructions in the applicable CDP documentation. For example, Set up a PostgreSQL database (CDP 7.1.4).
## Turn off redaction on CDP (on the target)

**Note:** Ignore this section if the Hive metastore database credentials are already known, or if CDP's redaction feature isn't enabled.

This section covers capturing the Hive metastore database credentials when they are not known. CDP's redaction feature is briefly turned off so that the credentials can be viewed.
If redaction is enabled, credential values such as `hive_metastore_database_password` are replaced with "REDACTED". For example:

```json
}, {
  "name" : "hive_metastore_database_password",
  "value" : "REDACTED",
  "sensitive" : true
}, {
```
To make the password visible, use the following steps on the node on which Cloudera Manager is installed:

- Open `/etc/default/cloudera-scm-server` in a text editor.
- Edit the following line, changing "true" to "false". For example:

  ```shell
  export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.api.redaction=false"
  ```

- Save the change.
- Restart the manager using the command:

  ```shell
  sudo service cloudera-scm-server restart
  ```

  This may take a few minutes.
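The edit above can also be scripted rather than made by hand. The following is a minimal sketch that works on a local copy of the file for illustration; on a real node the target would be `/etc/default/cloudera-scm-server` itself, and the file is assumed to already contain the redaction flag set to "true":

```shell
# Work on an illustrative copy; on a real node the file is /etc/default/cloudera-scm-server
cat > ./cloudera-scm-server.example <<'EOF'
export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.api.redaction=true"
EOF

# Flip the redaction flag from true to false in place
sed -i 's/api\.redaction=true/api.redaction=false/' ./cloudera-scm-server.example

# Show the edited line to confirm the change
cat ./cloudera-scm-server.example
```

After running a change like this on the real file, the manager still needs the restart shown in the steps above before the new setting takes effect.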
The `hive_metastore_database_password` value will no longer be redacted. Open and view the file to confirm. For example:

```json
}, {
  "name" : "hive_metastore_database_password",
  "value" : "42jJrRlep1v3veaw",
  "sensitive" : true
}, {
```
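With redaction off, the password can also be pulled out of the configuration response programmatically instead of by eye. The following is a minimal sketch, assuming the JSON response has already been saved to a local file; the filename and the password value are illustrative:

```shell
# Illustrative sample standing in for the saved API response
cat > config-response.json <<'EOF'
{ "items" : [
  { "name" : "hive_metastore_database_password", "value" : "42jJrRlep1v3veaw", "sensitive" : true },
  { "name" : "hive_metastore_database_host", "value" : "abcd01-vm0.domain.name.com", "sensitive" : false }
] }
EOF

# Extract the value for the password entry from the "items" array
python3 -c '
import json
items = json.load(open("config-response.json"))["items"]
print(next(i["value"] for i in items
           if i["name"] == "hive_metastore_database_password"))
'
```

Parsing the JSON this way is more robust than grepping, since the layout of the response can vary.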
## Configure the CDP target

There are a number of configuration properties that must be added to hive-site.xml. These can all be found by accessing the API call:

```
{clusterHost}:{clusterPort}/api/v19/clusters/{clusterName}/services/{serviceName}/config
```

For example:

```
abcd01-vm0.domain.name.com:7180/api/v19/clusters/ABCD-01/services/hive1/config
```
Open the `/etc/hive/conf/hive-site.xml` file with a text editor and save a copy as `/etc/hive/alt-conf/hive-site.xml`.

**Note:** Saving the changes to the existing hive-site.xml file will work, but the following changes will be lost if the Hive service is restarted.

Add the following configuration to the new file:
| hive-site.xml configuration | Description |
|---|---|
| javax.jdo.option.ConnectionURL | JDBC URL for the database |
| javax.jdo.option.ConnectionDriverName | Full class name of the JDBC driver |
| javax.jdo.option.ConnectionUserName | User name for connecting to the database |
| javax.jdo.option.ConnectionPassword | Password for connecting to the database |

The correct formatting/syntax for these values:
```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:{database type}://{host}:{port used by database type}/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>SEE GUIDANCE BELOW</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>{hive_metastore_database_password}</value>
</property>
```
Guidance for javax.jdo.option.ConnectionDriverName:

If using a MySQL database:

- ConnectionDriverName will be `com.mysql.jdbc.Driver`
- Port used will be `3306`
- Database type will be `mysql`

If using a Postgres database:

- ConnectionDriverName will be `org.postgresql.Driver`
- Port used will be `7432`
- Database type will be `postgresql`
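Putting the guidance together, a completed set of properties for a MySQL metastore could look like the following; the host name and password are illustrative placeholders, not values from a real cluster:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://abcd01-vm0.domain.name.com:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>42jJrRlep1v3veaw</value>
</property>
```

For a Postgres metastore, swap in `org.postgresql.Driver`, port `7432`, and a `jdbc:postgresql://` URL as described above.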
Save the changes to the file.

Go to the Hive Connection screen of the target agent and enter the path to the alternate hive-site.xml file into the Override Default Hadoop Configuration Path field.

Select **Save**.

Restart the Hive service in Cloudera Manager.
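Before restarting Hive, it can be worth confirming that the alternate file is well-formed XML, since a syntax error would stop the metastore from reading it. A minimal sketch, run here against a small illustrative sample rather than the real `/etc/hive/alt-conf/hive-site.xml`:

```shell
# Small sample standing in for /etc/hive/alt-conf/hive-site.xml
cat > hive-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
</configuration>
EOF

# Parse the file; a syntax error raises an exception instead of printing OK
python3 -c 'import xml.dom.minidom as m; m.parse("hive-site-sample.xml"); print("OK")'
```

On a real node, point the parse call at the alternate configuration path instead of the sample file.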
## Turn CDP redaction on again (if applicable)

Follow the steps in this section if the redaction feature was turned off earlier.

- After getting the credentials, edit `/etc/default/cloudera-scm-server` and restore "redaction=true". For example:

  ```shell
  export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.api.redaction=true"
  ```

- Save the change.
- Restart the manager using the command:

  ```shell
  sudo service cloudera-scm-server restart
  ```
## Deploy the remote agent from source (MySQL)

On the source cluster, open Data Migrator through the CLI.

Create a new remote agent using the command below:

```shell
hive agent add hive --autodeploy --file-system-id fsId --host example.host.name --port 5052 --ignore-host-checking --name remoteAgent --ssh-key /root/.ssh/id_rsa --ssh-user root
```

Navigate to the Hive Migrator directory:

```shell
cd /opt/wandisco/hivemigrator-remote-server
```

Copy the MySQL connector jars into this directory:

```shell
cp /usr/share/java/mysql* .
```

Set the appropriate ownership with the following command:

```shell
chown {user:group} mysql*
```

After a few seconds, the agent will appear healthy in the UI. Check its status with the following command:

```shell
hive agent check --name remoteAgent
```
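If the agent doesn't come up healthy, a quick check is whether the connector jars actually landed in the directory. A minimal sketch using a scratch directory and a placeholder jar, since the real path and file names depend on the installation:

```shell
# Scratch directory standing in for /opt/wandisco/hivemigrator-remote-server
mkdir -p ./hm-scratch
touch ./hm-scratch/mysql-connector-java.jar   # stands in for the copied MySQL connector jar

# Confirm at least one mysql* file is present before re-checking the agent
ls ./hm-scratch/mysql*
```

On a real node, run the listing inside `/opt/wandisco/hivemigrator-remote-server` and also confirm the ownership set by the chown step.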
## Deploy the remote agent from source (Postgres)

On the source cluster, open Data Migrator through the CLI.

Create a new remote agent with the following command:

```shell
hive agent add hive --autodeploy --file-system-id fsId --host example.host.name --port 5052 --ignore-host-checking --name remoteAgent --ssh-key /root/.ssh/id_rsa --ssh-user root
```

After a few seconds, the agent will appear healthy in the UI. Check its status with the following command:

```shell
hive agent check --name remoteAgent
```
## Create a metadata migration

The next step is to follow the instructions to Create a metadata migration.