
Configure CDP target for Hive metadata migrations

To migrate metadata to a Hadoop target, some additional configuration changes are needed before starting. This section covers Cloudera Data Platform (CDP) as an example.

Assumptions#

  • For brevity, these steps don't include Kerberos. However, the steps may be applied to a Kerberized environment.
  • A section is included in case the "hive_metastore_database_password" credentials are not known.
    • [CDP] This step assumes that there's no policy restriction on briefly turning off CDP's redaction.
  • Both the source and the target require changes.

Confirm database credentials#

There are two extra sections covering how to get the required database credentials:

  • Turning off redaction on the target cluster, so that the existing credentials can be read.
  • [Postgres] Create and configure new Postgres credentials, following the instructions provided in the applicable CDP documentation. For example: Set up a PostgreSQL database (CDP 7.1.4).
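    As a rough illustration of that Postgres setup, a minimal sketch follows. The role name, database name, and password are hypothetical placeholders; follow the CDP documentation for the authoritative steps.

      # Run on the target metastore database host; all names below are placeholders.
      sudo -u postgres psql -c "CREATE ROLE hive LOGIN PASSWORD 'examplePassword';"
      sudo -u postgres psql -c "CREATE DATABASE metastore OWNER hive ENCODING 'UTF8';"
      sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE metastore TO hive;"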

Turning off redaction on CDP (On the Target)#

note

Ignore this section if the Hive metastore database credentials are already known, or if CDP's redaction feature isn't enabled.

This section covers the capture of Hive metastore database credentials if they are not known. CDP's redaction feature is briefly turned off to allow the credentials to be viewed.

If redaction is enabled, credential values like hive_metastore_database_password are replaced with "REDACTED", for example:

},{  "name" : "hive_metastore_database_password",  "value" : "REDACTED",  "sensitive" : true}, {

To make the password visible, use the following steps on the node on which Cloudera Manager is installed:

  1. Open /etc/default/cloudera-scm-server in a text editor.

  2. Edit the following line, changing "true" to "false". For example:

     export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.api.redaction=false"
  3. Save the change.

  4. Restart Cloudera Manager using the command:

    $ sudo service cloudera-scm-server restart

    This may take a few minutes.
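    To confirm that the restart has completed, the service status and the server log can be checked. The log path below is the default location and may differ in a customized install:

    $ sudo service cloudera-scm-server status
    $ sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log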

  5. The hive_metastore_database_password will no longer be redacted. Open and view the file to confirm. For example:

    },{  "name" : "hive_metastore_database_password",  "value" : "42jJrRlep1v3veaw",  "sensitive" : true}, {

Configure the CDP target#

There are a number of configuration values that must be added to hive-site.xml. These can all be found by calling the following API endpoint:

{clusterHost}:{clusterPort}/api/v19/clusters/{clusterName}/services/{serviceName}/config

For example:

abcd01-vm0.domain.name.com:7180/api/v19/clusters/ABCD-01/services/hive1/config
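
The endpoint can be queried with any HTTP client. As a minimal sketch, assuming Cloudera Manager administrator credentials (the username, password, and host below are placeholders), the configuration can be fetched with curl; the view=full query parameter asks Cloudera Manager to include values that are still at their defaults:

$ curl -s -u admin:admin "http://abcd01-vm0.domain.name.com:7180/api/v19/clusters/ABCD-01/services/hive1/config?view=full"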
  1. Open the "/etc/hive/conf/hive-site.xml" file with a text editor and save it as "/etc/hive/alt-conf/hive-site.xml".

    note

    Saving the changes to the existing hive-site.xml file will work, but any changes made there will be lost if the Hive service is restarted.

  2. Add the following configuration to the new file.

    hive-site.xml configuration             Description
    javax.jdo.option.ConnectionURL          JDBC URL for the database
    javax.jdo.option.ConnectionDriverName   Full class name of the JDBC driver
    javax.jdo.option.ConnectionUserName     User name for connecting to the database
    javax.jdo.option.ConnectionPassword     Password for connecting to the database

    The correct formatting/syntax for these values:

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:{database type}://{Host}:{Port used by database type}/hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>SEE GUIDANCE BELOW</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>{hive_metastore_database_password}</value>
      </property>

    Guidance for javax.jdo.option.ConnectionDriverName

    If using a MySQL database:

    • ConnectionDriverName will be com.mysql.jdbc.Driver
    • Port used will be 3306
    • Database type will be mysql

    If using a Postgres database:

    • ConnectionDriverName will be org.postgresql.Driver
    • Port used will be 7432
    • Database type will be postgresql
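
    For example, with a MySQL metastore the URL and driver properties might look like the following. The database host name here is a placeholder, not a value from this guide:

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://db-host.domain.name.com:3306/hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>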
  3. Save the changes to the file.

  4. Go to the Hive Connection screen of the target agent and enter the path to the alternate hive-site.xml file into the Override Default Hadoop Configuration Path field.

  5. Select Save.

  6. Restart the Hive service in Cloudera Manager.
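
Optionally, the metastore database credentials can also be sanity-checked directly from a target node with the database client. This is a minimal sketch assuming a MySQL metastore on the placeholder host used above; substitute the values for your environment:

$ mysql -h db-host.domain.name.com -P 3306 -u hive -p hive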

Turn CDP redaction on again (if applicable)#

Follow the steps in this section if the redaction feature was turned off.

  1. After getting the credentials, edit /etc/default/cloudera-scm-server and set redaction back to "true". For example:
    export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.api.redaction=true"
  2. Save the change.
  3. Restart Cloudera Manager using the command:
    $ sudo service cloudera-scm-server restart

Deploy the remote agent from source (MySQL)#

  1. On the source cluster, open LiveData Migrator through the CLI.

  2. Create a new remote agent using the command below:

    hive agent add hive --autodeploy --file-system-id fsId --host example.host.name --ignore-host-checking --name remoteAgent --ssh-key /root/.ssh/id_rsa --ssh-user root
  3. Navigate to the Hive Migrator directory:

    cd /opt/wandisco/hivemigrator-remote-server
  4. Run the following command to copy the appropriate MySQL jars into this directory:

    cp /usr/share/java/mysql* .
  5. Set the appropriate ownership using the following command:

    chown {user:group} mysql*
  6. After a few seconds, the agent will appear healthy on the UI. Check it using the command below:

    hive agent check --name remoteAgent

Deploy the remote agent from source (Postgres)#

  1. On the source cluster, open LiveData Migrator through the CLI.

  2. Create a new remote agent using the command below:

    hive agent add hive --autodeploy --file-system-id fsId --host example.host.name --ignore-host-checking --name remoteAgent --ssh-key /root/.ssh/id_rsa --ssh-user root
  3. After a few seconds, the agent will appear healthy on the UI. Check it using the command below:

    hive agent check --name remoteAgent

Creating a metadata migration#

The next step is to follow the instructions to Create a metadata migration.