Version: 1.19.1

Configure Hive Migrator

The Hive Migrator service migrates metadata and handles communication between agents. Use the following steps to configure it.

note

LiveData Migrator 1.19 Hive Migrator configuration changes

Hive Migrator configuration is now stored in /etc/wandisco/hivemigrator/application.properties.

The following configuration files have been removed:

  • /etc/wandisco/hivemigrator/application.yaml
  • /etc/wandisco/hivemigrator/hive-migrator.yaml

This update is automatically applied when upgrading from earlier product versions. There's no need to make any manual changes.

Security

Basic authentication

Enable Micronaut framework

Follow these steps to enable basic authentication on the Hive Migrator REST API:

  1. Open /etc/wandisco/hivemigrator/application.properties.

  2. Ensure micronaut.security.enabled is set to true. For example:

    Ensure the enabled parameter is true
    micronaut.security.enabled=true
  3. Save and close the file.

Add authentication credentials

Apply the following steps if basic authentication is enabled on the LiveData Migrator REST API.

  1. Open /etc/wandisco/hivemigrator/application.properties.

  2. Add the hivemigrator.integration.livedataMigrator.username and hivemigrator.integration.livedataMigrator.password properties to the end of the hivemigrator.integration section. Hive Migrator uses these credentials to communicate with LiveData Migrator, so they must match those used for LiveData Migrator core.

    Example
      hivemigrator.integration.liveDataMigrator.port=18080
      hivemigrator.integration.liveDataMigrator.useSsl=false
      hivemigrator.integration.livedataMigrator.username="admin"
      hivemigrator.integration.livedataMigrator.password="password"
    important

    The property hivemigrator.integration.livedataMigrator.password must be in plain text.
    Support for an encrypted password will be added in a future release.

  3. Add the user credentials hivemigrator.username and hivemigrator.password immediately after micronaut.security.enabled=true.

    Example
    micronaut.security.enabled=true
    hivemigrator.username=admin
    hivemigrator.password=$2aPASSWORDSTRING.eMyOt67yEM6TVkz1qeIxDMfaCnI8SjFaRUy

    The hivemigrator.password string needs to be encrypted using a bcrypt generator that provides a "2a" prefix at the beginning of the encrypted password.

  4. Restart the Hive Migrator service to enable the new configuration:

    service hivemigrator restart
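
Before saving the file, it can help to confirm that the hashed value really has the "2a" form. The following is a minimal sketch, assuming the standard bcrypt layout ("$2a$", a two-digit cost, "$", then 53 characters of salt and digest). The function name has_2a_bcrypt_format is hypothetical and is not part of Hive Migrator; the sketch checks the shape of the string only and doesn't verify any password.

```python
import re

# Standard bcrypt layout with the "2a" version prefix:
# "$2a$" + two-digit cost + "$" + 53 characters (22 salt + 31 digest).
BCRYPT_2A = re.compile(r"\$2a\$\d{2}\$[./A-Za-z0-9]{53}")


def has_2a_bcrypt_format(hashed: str) -> bool:
    """Return True if the string has the shape of a "$2a$" bcrypt hash."""
    return BCRYPT_2A.fullmatch(hashed) is not None


if __name__ == "__main__":
    # Made-up strings of the right length; not real credentials.
    print(has_2a_bcrypt_format("$2a$10$" + "A" * 53))
    print(has_2a_bcrypt_format("$2y$10$" + "A" * 53))
```

Generators that emit a different version prefix (for example "2y" or "2b") fail this check, which is the case the step above warns about.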

Connect to Hive Migrator with basic authentication

note

Follow these steps if you used different credentials for LiveData Migrator and Hive Migrator, or if basic authentication isn't enabled on LiveData Migrator.

This step isn't required if you use the same credentials for both services.

When basic authentication is enabled, enter the username and password when prompted to connect to Hive Migrator with the CLI:

Example
  connect hivemigrator localhost: trying to connect...
  Username: admin
  Password: ***********
  Connected to hivemigrator v1.<VERSION-NUMBER> on http://localhost:6780.

The username and password are required for direct access to the Hive Migrator REST API.
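
For direct REST access, a client sends the credentials in an HTTP Basic Authorization header. The sketch below shows how that header is built using the Python standard library. The helper names are hypothetical, the URL passed in is whatever endpoint you intend to call (this page doesn't list one for this purpose), and the request is only constructed, not sent.

```python
import base64
from urllib.request import Request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authenticated_request(url: str, username: str, password: str) -> Request:
    """Create (but do not send) a request carrying Basic credentials."""
    request = Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Base URL taken from the CLI example above; credentials are placeholders.
    req = authenticated_request("http://localhost:6780", "admin", "password")
    print(req.get_header("Authorization"))
```

Basic authentication only encodes the credentials, it doesn't encrypt them, so pair it with TLS on the API where possible.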

important

If you enable basic authentication on Hive Migrator, ensure you update the LiveData UI with the credentials to maintain functionality.

TLS certificates

When you deploy a remote agent (for example, Azure SQL or AWS Glue), Hive Migrator establishes a Transport Layer Security (TLS) connection to the agent.

Certificates (and keys) are automatically generated for this connection for both Hive Migrator and the remote agent. These are placed in the following directories:

Hive Migrator - Client and Root CA certificates
/etc/wandisco/hivemigrator/client-key.pem
/etc/wandisco/hivemigrator/client-cert.pem
/etc/wandisco/hivemigrator/ca-cert.pem
/etc/wandisco/hivemigrator/ca-key.pem
/etc/wandisco/hivemigrator/ca-cert.srl
Remote agent - Server and Root CA certificates
/etc/wandisco/hivemigrator-remote-server/server-key.pem
/etc/wandisco/hivemigrator-remote-server/server-cert.pem
/etc/wandisco/hivemigrator-remote-server/ca-cert.pem

You can generate new certificates at any time or upload your own.

Generate new certificates

important

Generate new certificates for Hive Migrator and all remote agents that are connected.

Generating certificates for just one of these components breaks existing connections.

Generate new certificates and keys by using the following Hive Migrator REST API endpoints:

Hive Migrator
POST /config/certificates/generate
Remote agent
POST /agents/{name}/certificates/generate

The remote agent service automatically restarts when you generate new certificates this way. You don't need to restart the Hive Migrator service to use the new certificates.
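
Because certificates need to be regenerated for Hive Migrator and every connected remote agent together, it can be convenient to prepare all of the calls in one place. The following is a rough sketch that builds the POST requests without sending them. The base URL and agent names are assumptions supplied by the caller, the function name is hypothetical, and any request body or authentication the endpoints may need isn't covered on this page.

```python
from urllib.parse import quote
from urllib.request import Request


def certificate_generate_requests(base_url: str, agent_names: list) -> list:
    """Build one POST request for Hive Migrator and one per remote agent."""
    base = base_url.rstrip("/")
    requests = [Request(f"{base}/config/certificates/generate", method="POST")]
    for name in agent_names:
        # Percent-encode the agent name so it is safe inside the URL path.
        path = f"/agents/{quote(name, safe='')}/certificates/generate"
        requests.append(Request(base + path, method="POST"))
    return requests


if __name__ == "__main__":
    # "glue-target" is a made-up agent name used only for illustration.
    for req in certificate_generate_requests("http://localhost:6780", ["glue-target"]):
        print(req.get_method(), req.full_url)
```

Each prepared request could then be passed to urllib.request.urlopen once credentials and TLS settings are in place.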

SSL authentication and encryption

Secure Sockets Layer (SSL) for remote agents is enabled by default. To enable SSL/TLS encryption on the Hive Migrator API, add the following properties to /etc/wandisco/hivemigrator/application.properties:

TLS configuration in /etc/wandisco/hivemigrator/application.properties
hivemigrator.integration.liveDataMigrator.useSsl=true
hivemigrator.integration.liveDataMigrator.trust-store.path=/etc/wandisco/hivemigrator/tls/keystore.p12
hivemigrator.integration.liveDataMigrator.trust-store.password=password
hivemigrator.integration.liveDataMigrator.trust-store.type=PKCS12
micronaut.ssl.enabled=true
micronaut.ssl.buildSelfSigned=false
micronaut.ssl.port=6781
micronaut.ssl.key-store.path=file:/etc/wandisco/hivemigrator/tls/keystore.p12
micronaut.ssl.key-store.password=password
micronaut.ssl.key-store.type=PKCS12
micronaut.ssl.key.alias=hivemigrator
micronaut.server.port=6780
micronaut.server.dualProtocol=false

Each configuration property is described in the following table:

Certificate property | Description
hivemigrator.integration.liveDataMigrator.useSsl=true | Whether Hive Migrator uses TLS to talk to LiveData Migrator.
hivemigrator.integration.liveDataMigrator.trust-store.path=/etc/wandisco/hivemigrator/tls/keystore.p12 | The truststore used to decide whether Hive Migrator trusts the certificate presented by LiveData Migrator.
hivemigrator.integration.liveDataMigrator.trust-store.password=examplepassword | The password for the truststore defined in the trust-store.path parameter above.
hivemigrator.integration.liveDataMigrator.trust-store.type=PKCS12 | The file type of the truststore defined in the trust-store.path parameter above.
micronaut.ssl.enabled=true | Whether Hive Migrator uses TLS on its own API.
micronaut.ssl.buildSelfSigned=false | Whether Hive Migrator builds its own self-signed certificates. We recommend setting this to false when using a custom truststore and keystore.
micronaut.ssl.port=6781 | The port used to access the Hive Migrator API over TLS.
micronaut.ssl.key-store.path=file:/etc/wandisco/hivemigrator/tls/keystore.p12 | The path to the keystore that Hive Migrator uses.
micronaut.ssl.key-store.password=examplepassword | The password for the above keystore.
micronaut.ssl.key-store.type=PKCS12 | The file type of the above keystore.
micronaut.ssl.key.alias=hivemigrator | The alias of the certificate in the keystore that Hive Migrator supplies to clients.
micronaut.server.port=6780 | The non-TLS port used to access Hive Migrator.
micronaut.server.dualProtocol=false | Whether Hive Migrator accepts non-TLS connections on the non-TLS port in addition to TLS connections on the TLS port.
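
A quick way to catch an incomplete TLS setup is to read the properties file and list any keystore settings that are absent while micronaut.ssl.enabled is true. The sketch below is illustrative only: the simplified parser handles plain key=value lines, and the set of keys treated as required is an assumption drawn from the table above, not a rule stated by the product.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Assumption: keys worth checking when TLS is enabled on the API.
KEYSTORE_KEYS = (
    "micronaut.ssl.port",
    "micronaut.ssl.key-store.path",
    "micronaut.ssl.key-store.password",
    "micronaut.ssl.key-store.type",
    "micronaut.ssl.key.alias",
)


def missing_keystore_keys(props: dict) -> list:
    """List keystore settings that are absent while TLS is enabled."""
    if props.get("micronaut.ssl.enabled", "false").lower() != "true":
        return []
    return [key for key in KEYSTORE_KEYS if not props.get(key)]


if __name__ == "__main__":
    sample = "micronaut.ssl.enabled=true\nmicronaut.ssl.port=6781\n"
    print(missing_keystore_keys(parse_properties(sample)))
```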

Using default truststore/keystore with generated certificates

To use SSL for authentication and encryption, complete the following steps:

  1. Generate self-signed certificates, assigning Hive Migrator as the client and the Hive Migrator remote agent as the server.
    Use the following file names:

    • ca-cert.pem
    • client-cert.pem
    • client-key.pem
    • server-cert.pem
    • server-key.pem
  2. Copy the following files to the Hive Migrator directory /etc/wandisco/hivemigrator/:

    • ca-cert.pem
    • client-cert.pem
    • client-key.pem
  3. Copy the following files to the Hive Migrator remote server directory /etc/wandisco/hivemigrator-remote-server:

    • ca-cert.pem
    • server-cert.pem
    • server-key.pem
  4. Restart the service for the Hive Migrator remote server by running the command:

    service hivemigrator-remote-server restart
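
Steps 2 and 3 above amount to a fixed mapping from file name to destination directory. The following sketch lists that mapping as source and destination pairs so it can be reviewed before any files are moved. The function name and the /tmp/certs source directory are hypothetical, and the sketch performs no copying; Hive Migrator and the remote agent may run on different hosts, so the actual transfer method is left open.

```python
from pathlib import PurePosixPath

HIVE_MIGRATOR_DIR = PurePosixPath("/etc/wandisco/hivemigrator")
REMOTE_SERVER_DIR = PurePosixPath("/etc/wandisco/hivemigrator-remote-server")

# File names from the steps above.
CLIENT_FILES = ("ca-cert.pem", "client-cert.pem", "client-key.pem")
SERVER_FILES = ("ca-cert.pem", "server-cert.pem", "server-key.pem")


def plan_copies(source_dir: str) -> list:
    """Return (source, destination) pairs for every certificate file."""
    src = PurePosixPath(source_dir)
    plan = [(src / name, HIVE_MIGRATOR_DIR / name) for name in CLIENT_FILES]
    plan += [(src / name, REMOTE_SERVER_DIR / name) for name in SERVER_FILES]
    return plan


if __name__ == "__main__":
    for source, destination in plan_copies("/tmp/certs"):
        print(f"{source} -> {destination}")
```

Note that ca-cert.pem appears in both destinations, since both sides need the same root CA certificate.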

Using a custom truststore (CLI)

Use the following steps to create a custom truststore for the CLI. This secures connections between Hive Migrator and LiveData Migrator.

  1. Create a new truststore for the CLI containing the certificates for LiveData Migrator and Hive Migrator.

  2. Create a new vars.env file for the LiveData Migrator CLI.

    vi /opt/wandisco/livedata-migrator-cli/vars.env
  3. Add the following line to the vars.env file:

    LIVEDATA_MIGRATOR_OPTS="-Djavax.net.ssl.trustStore=/path/to/trust/store -Djavax.net.ssl.trustStorePassword=password"
  4. Save the change.

  5. Open /opt/wandisco/livedata-migrator-cli/bin/livedata-migrator in a text editor.

  6. Add the line: source /opt/wandisco/livedata-migrator-cli/vars.env.

    For example:

    Example edit
    #!/usr/bin/env sh
    ##############################################################################
    ##
    ##  livedata-migrator start up script for UN*X
    ##
    ##############################################################################
    source /opt/wandisco/livedata-migrator-cli/vars.env
    # Attempt to set APP_HOME
    important

    After you upgrade LiveData Migrator, check the change is still in place and reapply it if necessary.

  7. Run the CLI.

    livedata-migrator

Using a custom truststore (UI)

The default SSL keystore configuration for the LiveData UI is stored in /etc/wandisco/ui/application-prod.properties:

server.ssl.port=8443
server.ssl.enabled=true
server.ssl.key-store=/etc/wandisco/ui/tls/keystore.p12
server.ssl.key-store-password=password
server.ssl.key-store-type=PKCS12
server.ssl.key-alias=livedata-ui
note

If you define a custom keystore using these configuration parameters, the truststore still defaults to the one in the Java home directory.

For more information, see Transport Layer Security (TLS).

Use the following steps to enter a custom truststore:

  1. Open /etc/wandisco/ui/vars.env in a text editor.
  2. Add the following line:
    LDUI_EXTRA_JVM_ARGS="-Djavax.net.ssl.trustStore=/etc/wandisco/ui/tls/keystore.p12 -Djavax.net.ssl.trustStorePassword=password"
    This LiveData UI extra Java argument sets the following:
    • -Djavax.net.ssl.trustStore - Path to the custom truststore file.
    • -Djavax.net.ssl.trustStorePassword - The custom truststore password.
  3. Save the change.
  4. Restart the Hive Migrator service using the command:
    service hivemigrator restart

Upload your own certificates

important

Ensure the correct certificates and keys are uploaded for Hive Migrator and all remote agents that are connected.

Existing connections will break if the trust relationship isn't established between Hive Migrator and remote agents.

Upload certificates and keys by using the following Hive Migrator REST API endpoints:

Hive Migrator
POST /config/certificates/upload
Remote agent
POST /agents/{name}/certificates/upload

The remote agent service restarts automatically when new certificates are uploaded this way. The Hive Migrator service doesn't require a restart to start using new certificates.

Migration batch size

Use the following parameter to limit the number of objects that are sent in a single request to a target metastore:

Batch size parameter:
hivemigrator.migrationBatchSize=1000

The default is 1000. To change the parameter:

  1. Open /etc/wandisco/hivemigrator/application.properties in a text editor.
  2. Add the line hivemigrator.migrationBatchSize=<integer>, where <integer> is the maximum number of objects per request.
  3. Save the change.
  4. Restart the Hive Migrator service using the command: service hivemigrator restart.
important

AWS Glue Data Catalog allows a maximum of 100 objects per request.
When it's used as a target, set hivemigrator.migrationBatchSize to 100.
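
To get a feel for what the batch size means in practice, the sketch below works out how many requests a given number of objects needs at a given batch size, and shows a generic way to split a list into batches. It illustrates the arithmetic only: how Hive Migrator groups objects internally isn't described here, and both function names are hypothetical.

```python
def requests_needed(object_count: int, batch_size: int) -> int:
    """Number of requests needed to send object_count objects."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Ceiling division without floating point.
    return -(-object_count // batch_size)


def split_into_batches(items: list, batch_size: int) -> list:
    """Split items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    # 2,500 objects: default batch size versus the AWS Glue limit.
    print(requests_needed(2500, 1000))
    print(requests_needed(2500, 100))
```

With the default of 1000, 2,500 objects fit in 3 requests; at the AWS Glue limit of 100, the same objects take 25 requests.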

Hive Migrator logging

Configuration for the Hive Migrator log file is stored in /etc/wandisco/hivemigrator/log4j2.yaml.

  • Example log4j2.yaml file contents:

    Default log4j2.yaml:
    
    Configuration:
      packages: "com.wandisco.hivemigrator.configuration.log"
      monitorInterval: 300
      properties:
        property:
        - name: log.dir
          value: "/var/log/wandisco/hivemigrator"
        - name: log.level
          value: "info"
        - name: httpserver.level
          value: "info"
        - name: hivemetastore.level
          value: "warn"
        - name: metrics.level
          value: "off"
        - name: log.max.size
          value: 200
        - name: log.max.files
          value: 100
      appenders:
        Routing:
          name: routingAppender
          Routes:
            pattern: "${ctx:migrationId}"
            Route:
              -
                RollingFile:
                  name: hiveMigratorLog
                  fileName: "${log.dir}/hivemigrator.log"
                  filePattern: "${log.dir}/hivemigrator.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
                  PatternLayout:
                    pattern: "%d{DEFAULT} %5p [%-15.15t] %-45c{1.} %15X{agentName}: %maskpass%n"
                    charset: "UTF-8"
                  Policies:
                    OnStartupTriggeringPolicy: {}
                    SizeBasedTriggeringPolicy:
                      size: "${log.max.size} MB"
                  DefaultRolloverStrategy:
                    Delete:
                      basePath: "${log.dir}"
                      IfFileName:
                        glob: "hivemigrator*.log.gz"
                      IfAccumulatedFileCount:
                        exceeds: ${log.max.files}
                key: "${ctx:migrationId}"
              -
                RollingFile:
                  name: "migration-${ctx:migrationId}"
                  fileName: "${log.dir}/migration-audit-${ctx:migrationId}.log"
                  filePattern: "${log.dir}/migration-audit-${ctx:migrationId}.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
                  PatternLayout:
                    pattern: "%d{DEFAULT} %5p [%-15.15t] %-45c{1.} %15X{agentName}: %maskpass%n"
                    charset: "UTF-8"
                  Policies:
                    OnStartupTriggeringPolicy: {}
                    SizeBasedTriggeringPolicy:
                      size: "${log.max.size} MB"
                  DefaultRolloverStrategy:
                    Delete:
                      basePath: "${log.dir}"
                      IfFileName:
                        glob: "migration-audit-${ctx:migrationId}*.log.gz"
                        IfAccumulatedFileCount:
                          exceeds: ${log.max.files}
        RollingFile:
          - name: metricsLog
            fileName: "${log.dir}/metrics.log"
            filePattern: "${log.dir}/metrics.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
            createOnDemand: "true"
            PatternLayout:
              pattern: "%d{DEFAULT} - %maskpass%n"
              charset: "UTF-8"
            Policies:
              OnStartupTriggeringPolicy: {}
              SizeBasedTriggeringPolicy:
                size: "${log.max.size} MB"
            DefaultRolloverStrategy:
              Delete:
                basePath: "${log.dir}"
                IfFileName:
                  glob: "metrics*.log.gz"
                IfAccumulatedFileCount:
                  exceeds: ${log.max.files}
          - name: httpAuditLog
            fileName: "${log.dir}/http-audit.log"
            filePattern: "${log.dir}/http-audit.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
            PatternLayout:
              pattern: "%d{DEFAULT} %5p [%-15.15t] %-45c{1.}: %maskpass%n"
              charset: "UTF-8"
            Policies:
              OnStartupTriggeringPolicy: {}
              SizeBasedTriggeringPolicy:
                size: "${log.max.size} MB"
            DefaultRolloverStrategy:
              Delete:
                basePath: "${log.dir}"
                IfFileName:
                  glob: "http-audit*.log.gz"
                IfAccumulatedFileCount:
                  exceeds: ${log.max.files}
      Loggers:
        logger:
          - name: io.micrometer
            level: "${metrics.level}"
            additivity: false
            AppenderRef:
              ref: metricsLog
          - name: com.wandisco.hivemigrator.rest.filters.HttpLoggingFilter
            level: "${httpserver.level}"
            additivity: false
            AppenderRef:
              ref: httpAuditLog
    #      - name: "migration-audit"
    #        level: "${log.level}"
    #        additivity: false
    #        AppenderRef:
    #          ref: migrationAudit
          - name: org.apache.hadoop.hive.metastore
            level: "${hivemetastore.level}"
        Root:
          level: "${log.level}"
          AppenderRef:
            ref: routingAppender
    

Change the log file name for migration audit logs for Hive Migrator

  1. Open /etc/wandisco/hivemigrator/log4j2.yaml.
  2. Update the fileName parameter from fileName: "${log.dir}/migration-audit-${ctx:migrationId}.log" to your preferred file name. For example: fileName: "${log.dir}/YOUR-FILE-NAME-${ctx:migrationId}.log"
  3. Save the change.
  4. Restart Hive Migrator.

Enable Hive Migrator debugging mode

Use the following steps to investigate authentication/authorization problems when using an HDFS source or target:

  1. Open /etc/wandisco/hivemigrator/vars.sh.

  2. Add the following JVM argument:

    HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"
  3. Add the following log location parameter:

    LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out
  4. Save the changes.

  5. Restart the Hive Migrator service:

    • Linux distributions with systemd
      systemctl restart hivemigrator
    • Linux distributions without systemd
      service hivemigrator restart
  6. IMPORTANT: Once you've completed your investigation, reverse the changes. The debug mode generates massive log files that may exhaust your available storage.

Read more about adding additional JVM arguments in the knowledge base.

Perform a LiveData Migrator Java connection debug capture

Use the following steps to investigate authentication/authorization problems when using an HDFS source or target:

  1. Open /etc/wandisco/hivemigrator/vars.sh.
  2. Add the following JVM argument:
    HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"
  3. Add the following log location parameter:
    LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out
  4. Restart Hive Migrator.

Read more about adding additional JVM arguments in the knowledge base.

Directory structure

The following directories are used for Hive Migrator:

Location | Content
/var/log/wandisco/hivemigrator | Logs
/etc/wandisco/hivemigrator | Configuration files
/opt/wandisco/hivemigrator | Java archive files
/var/run/hivemigrator | Runtime files

Remote servers

The following directories are used for Hive Migrator remote servers (remote agents):

Location | Content
/var/log/wandisco/hivemigrator-remote-server | Logs
/etc/wandisco/hivemigrator-remote-server | Configuration files
/opt/wandisco/hivemigrator-remote-server | Java archive files
/var/run/hivemigrator-remote-server | Runtime files