Version: 1.22.0

Configure Hive Migrator

The Hive Migrator service is responsible for migrating metadata and for communication between agents. Use the following steps to configure the Hive Migrator service.

note

Data Migrator 1.19 Hive Migrator configuration changes

Hive Migrator configuration is now stored in /etc/wandisco/hivemigrator/application.properties.

The following configuration files have been removed:

  • /etc/wandisco/hivemigrator/application.yaml
  • /etc/wandisco/hivemigrator/hive-migrator.yaml

This update is automatically applied when upgrading from earlier product versions. There's no need to make any manual changes.

Security

Basic auth

Basic auth enables password-based authentication for Data Migrator and Hive Migrator.

Connect to Data Migrator and Hive Migrator with basic authentication

To use basic auth for both Data Migrator and Hive Migrator, follow the steps in Configure basic auth.

Connect to Hive Migrator with basic authentication

Follow these steps if you used different credentials for Data Migrator and Hive Migrator, or if basic authentication isn't enabled on Data Migrator.

This step isn't required if you use the same credentials for both services.

When basic authentication is enabled, enter the username and password when prompted to connect to Hive Migrator with the CLI:

Example
connect hivemigrator localhost: trying to connect...
Username: admin
Password: ***********
Connected to hivemigrator v1.<VERSION-NUMBER> on http://localhost:6780.

The username and password are required for direct access to the Hive Migrator REST API.
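The following is a minimal Python sketch of direct REST API access using only the standard library. It builds an HTTP Basic Authorization header from the username and password and attaches it to a request, but it does not send the request. The base URL http://localhost:6780 comes from the CLI example above. The credentials admin/secret are hypothetical. The /config/certificates/generate path is used only as an illustration, and actually sending that request would regenerate certificates.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return an HTTP Basic 'Authorization' header value for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_hivemigrator_request(base_url, path, username, password, method="GET"):
    """Build (but do not send) an authenticated request to the Hive Migrator REST API."""
    request = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Hypothetical credentials; replace with the ones configured for Hive Migrator.
    req = build_hivemigrator_request(
        "http://localhost:6780", "/config/certificates/generate",
        "admin", "secret", method="POST",
    )
    # Only print the request line: sending this request would regenerate certificates.
    print(req.get_method(), req.full_url)
```

To send a request built this way, pass it to urllib.request.urlopen.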

info

If you enable basic authentication on Hive Migrator, ensure you update the UI with the credentials to maintain functionality.

Transport Layer Security certificates

When you deploy a remote agent (for example, Azure SQL or AWS Glue), Hive Migrator establishes a Transport Layer Security (TLS) connection to the agent.

Certificates (and keys) are automatically generated for this connection for both Hive Migrator and the remote agent. These are placed in the following directories:

Hive Migrator - Client and Root CA certificates
/etc/wandisco/hivemigrator/client-key.pem
/etc/wandisco/hivemigrator/client-cert.pem
/etc/wandisco/hivemigrator/ca-cert.pem
/etc/wandisco/hivemigrator/ca-key.pem
/etc/wandisco/hivemigrator/ca-cert.srl
Remote agent - Server and Root CA certificates
/etc/wandisco/hivemigrator-remote-server/server-key.pem
/etc/wandisco/hivemigrator-remote-server/server-cert.pem
/etc/wandisco/hivemigrator-remote-server/ca-cert.pem

You can generate new certificates at any time or upload your own.

To update the WANdisco UI with TLS details, remove the product instance and then add it again with the updated TLS connection details.

Generate new certificates

info

Generate new certificates for Hive Migrator and all remote agents that are connected.

Generating certificates for just one of these components breaks existing connections.

Generate new certificates and keys by using the following Hive Migrator REST API endpoints:

Hive Migrator
POST /config/certificates/generate
Remote agent
POST /agents/{name}/certificates/generate

The remote agent service automatically restarts when you generate new certificates this way. You don't need to restart the Hive Migrator service to use the new certificates.
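Certificates must be regenerated for Hive Migrator and every connected remote agent together, so it helps to list every call in one place. The following minimal Python sketch returns the full set of POST URLs for the two endpoints above. The agent names are hypothetical; use the names of your own remote agents. If basic authentication is enabled, each request also needs the Authorization header.

```python
from urllib.parse import quote


def certificate_generation_endpoints(base_url, agent_names):
    """Return every POST URL needed to regenerate certificates for Hive Migrator
    and all of its connected remote agents."""
    base = base_url.rstrip("/")
    urls = [f"{base}/config/certificates/generate"]
    for name in agent_names:
        # Percent-encode the agent name so it is safe to use as one path segment.
        urls.append(f"{base}/agents/{quote(name, safe='')}/certificates/generate")
    return urls


if __name__ == "__main__":
    # Hypothetical agent names.
    for url in certificate_generation_endpoints("http://localhost:6780", ["glue-agent", "azure-sql-agent"]):
        print("POST", url)
```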

SSL authentication and encryption

info

Due to a limitation of the Micronaut framework, the key you specify for the Hive Migrator API must be the first or only key in the keystore. Provide a keystore in which that key is either the first or the only entry.

Secure Sockets Layer (SSL) for remote agents is enabled by default. To enable SSL/TLS encryption for the Hive Migrator API, add the following properties to /etc/wandisco/hivemigrator/application.properties:

TLS configuration in /etc/wandisco/hivemigrator/application.properties
hivemigrator.integration.liveDataMigrator.useSsl=true
hivemigrator.integration.liveDataMigrator.trust-store.path=/etc/wandisco/hivemigrator/tls/keystore.p12
hivemigrator.integration.liveDataMigrator.trust-store.password=password
hivemigrator.integration.liveDataMigrator.trust-store.type=PKCS12
micronaut.ssl.enabled=true
micronaut.ssl.buildSelfSigned=false
micronaut.ssl.port=6781
micronaut.ssl.key-store.path=file:/etc/wandisco/hivemigrator/tls/keystore.p12
micronaut.ssl.key-store.password=password
micronaut.ssl.key-store.type=PKCS12
micronaut.ssl.key.alias=hivemigrator
micronaut.server.port=6780
micronaut.server.dualProtocol=false

Each configuration property is described in the following table:

Property | Description
hivemigrator.integration.liveDataMigrator.useSsl=true | Whether Hive Migrator uses TLS to communicate with Data Migrator.
hivemigrator.integration.liveDataMigrator.trust-store.path=/etc/wandisco/hivemigrator/tls/keystore.p12 | The truststore Hive Migrator uses to decide whether to trust the certificate presented by Data Migrator.
hivemigrator.integration.liveDataMigrator.trust-store.password=examplepassword | The password for the truststore set in trust-store.path.
hivemigrator.integration.liveDataMigrator.trust-store.type=PKCS12 | The file type of the truststore set in trust-store.path.
micronaut.ssl.enabled=true | Whether Hive Migrator uses TLS on its own API.
micronaut.ssl.buildSelfSigned=false | Whether Hive Migrator builds self-signed certificates. We recommend setting this to false when using a custom truststore and keystore.
micronaut.ssl.port=6781 | The port for TLS access to the Hive Migrator API.
micronaut.ssl.key-store.path=file:/etc/wandisco/hivemigrator/tls/keystore.p12 | The path to the keystore that Hive Migrator uses.
micronaut.ssl.key-store.password=examplepassword | The password for the keystore.
micronaut.ssl.key-store.type=PKCS12 | The file type of the keystore.
micronaut.ssl.key.alias=hivemigrator | The alias of the certificate in the keystore that Hive Migrator presents to clients.
micronaut.server.port=6780 | The non-TLS port for access to Hive Migrator.
micronaut.server.dualProtocol=false | Whether Hive Migrator accepts both non-TLS connections on the non-TLS port and TLS connections on the TLS port.
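Based on the descriptions of micronaut.ssl.enabled and micronaut.server.dualProtocol above, you can predict which ports Hive Migrator should accept connections on. The following minimal Python sketch does this. It assumes a plain key=value properties file without line continuations or escapes. The fallback port values are copied from the example configuration and may not match the product's real defaults.

```python
def parse_properties(text):
    """Minimal key=value parser: skips blank and comment lines and splits on the
    first '='. It does not handle ':' separators, continuations or escapes."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_true(props, key):
    return props.get(key, "false").strip().lower() == "true"


def expected_listeners(props):
    """Protocol/port pairs implied by the SSL and dual-protocol settings."""
    # Fallbacks copied from the example configuration; real defaults may differ.
    http_port = int(props.get("micronaut.server.port", "6780"))
    https_port = int(props.get("micronaut.ssl.port", "6781"))
    if not is_true(props, "micronaut.ssl.enabled"):
        return {"http": http_port}
    if is_true(props, "micronaut.server.dualProtocol"):
        return {"http": http_port, "https": https_port}
    return {"https": https_port}


if __name__ == "__main__":
    from pathlib import Path
    config = Path("/etc/wandisco/hivemigrator/application.properties").read_text()
    print(expected_listeners(parse_properties(config)))
```

With the example configuration above (SSL enabled, dualProtocol=false), only the TLS port 6781 is expected.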

Using default truststore/keystore with generated certificates

To use SSL for authentication and encryption, complete the following steps:

  1. Generate self-signed certificates, assigning Hive Migrator as the client and the Hive Migrator remote agent as the server.
    Use the following file names:

    • ca-cert.pem
    • client-cert.pem
    • client-key.pem
    • server-cert.pem
    • server-key.pem
  2. Copy the following files to the Hive Migrator directory /etc/wandisco/hivemigrator/:

    • ca-cert.pem
    • client-cert.pem
    • client-key.pem
  3. Copy the following files to the Hive Migrator remote server directory /etc/wandisco/hivemigrator-remote-server:

    • ca-cert.pem
    • server-cert.pem
    • server-key.pem
  4. Restart the service for the Hive Migrator remote server by running the command:

    service hivemigrator-remote-server restart
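Before restarting the remote server, you can confirm the file placement from steps 2 and 3. The following minimal Python sketch reports which expected files are missing from each directory. It also checks whether two directories hold identical copies of ca-cert.pem, assuming both sides trust the same CA. Run it on the host that holds each directory, because the remote server often runs on a different machine.

```python
from pathlib import Path

HIVEMIGRATOR_DIR = "/etc/wandisco/hivemigrator"
REMOTE_SERVER_DIR = "/etc/wandisco/hivemigrator-remote-server"
# File names from steps 2 and 3 above.
HIVEMIGRATOR_FILES = ("ca-cert.pem", "client-cert.pem", "client-key.pem")
REMOTE_SERVER_FILES = ("ca-cert.pem", "server-cert.pem", "server-key.pem")


def missing_files(directory, names):
    """Return the names that are not regular files in the given directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


def same_ca_cert(dir_a, dir_b):
    """True if both directories hold byte-identical copies of ca-cert.pem."""
    a = Path(dir_a) / "ca-cert.pem"
    b = Path(dir_b) / "ca-cert.pem"
    return a.is_file() and b.is_file() and a.read_bytes() == b.read_bytes()


if __name__ == "__main__":
    for directory, names in ((HIVEMIGRATOR_DIR, HIVEMIGRATOR_FILES),
                             (REMOTE_SERVER_DIR, REMOTE_SERVER_FILES)):
        # Only check directories that exist on this host.
        if Path(directory).is_dir():
            print(directory, "missing:", missing_files(directory, names) or "none")
```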

Upload your own certificates

info

Ensure the correct certificates and keys are uploaded for Hive Migrator and all remote agents that are connected.

Existing connections will break if the trust relationship isn't established between Hive Migrator and remote agents.

Upload certificates and keys by using the following Hive Migrator REST API endpoints:

Hive Migrator
POST /config/certificates/upload
Remote agent
POST /agents/{name}/certificates/upload

The remote agent service restarts automatically when new certificates are uploaded this way. The Hive Migrator service doesn't require a restart to start using new certificates.
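Before uploading, a quick structural check can catch a swapped or mis-encoded certificate and key. The following minimal Python sketch looks only for PEM "BEGIN" markers. It assumes PEM-encoded files and does not validate certificate chains or the trust relationship described in the note above. It also does not cover the request format that the upload endpoints expect.

```python
import re
from pathlib import Path

# Matches PEM header lines such as "-----BEGIN CERTIFICATE-----".
_PEM_BEGIN = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", re.MULTILINE)


def pem_labels(text):
    """Return the labels of all PEM blocks in the text, in order."""
    return _PEM_BEGIN.findall(text)


def looks_like_certificate(text):
    """True if the text holds one or more PEM certificates and nothing else."""
    labels = pem_labels(text)
    return bool(labels) and all(label == "CERTIFICATE" for label in labels)


def looks_like_private_key(text):
    """True if the text holds a PEM private key (PKCS#8, RSA, EC and similar)."""
    return any(label.endswith("PRIVATE KEY") for label in pem_labels(text))


def check_pair(cert_path, key_path):
    """Return human-readable problems found for one certificate/key pair."""
    problems = []
    if not looks_like_certificate(Path(cert_path).read_text()):
        problems.append(f"{cert_path}: expected only PEM CERTIFICATE blocks")
    if not looks_like_private_key(Path(key_path).read_text()):
        problems.append(f"{key_path}: no PEM PRIVATE KEY block found")
    return problems
```

For example, check_pair("client-cert.pem", "client-key.pem") returns an empty list when both files look structurally correct.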

Migration batch size

Use the following parameter to limit the number of objects that are sent in a single request to a target metastore:

Batch size parameter:
hivemigrator.migrationBatchSize=1000

The default is 1000. To change the parameter:

  1. Open /etc/wandisco/hivemigrator/application.properties in a text editor.
  2. Add the line hivemigrator.migrationBatchSize=<integer>, where <integer> is the maximum number of objects per request.
  3. Save the change.
  4. Restart the Hive Migrator service. See System service commands - Hive Migrator.

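Steps 1 to 3 can also be scripted. The following minimal Python sketch sets hivemigrator.migrationBatchSize in the text of application.properties. It replaces an existing definition in place, drops later duplicates, and appends the line if the key is missing. It assumes a plain key=value file, and the helper names are hypothetical.

```python
BATCH_SIZE_KEY = "hivemigrator.migrationBatchSize"


def set_property(text, key, value):
    """Return properties text with exactly one active key=value line for key.
    Comment lines are preserved."""
    new_line = f"{key}={value}"
    out, replaced = [], False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if not is_comment and stripped.partition("=")[0].strip() == key:
            if not replaced:
                out.append(new_line)  # replace the first definition in place
                replaced = True
            continue  # drop any later duplicate definitions
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


def set_batch_size(text, size):
    if not isinstance(size, int) or size < 1:
        raise ValueError("batch size must be a positive integer")
    return set_property(text, BATCH_SIZE_KEY, size)
```

To use it, read /etc/wandisco/hivemigrator/application.properties, pass its contents to set_batch_size, write the result back, and then restart the Hive Migrator service as in step 4.
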
info

AWS Glue Data Catalog allows a maximum of 100 objects per request.
When it's used as a target, set hivemigrator.migrationBatchSize to 100.
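The arithmetic behind this limit is simple ceiling division. The following minimal Python sketch caps the batch size at 100 for an AWS Glue target and estimates how many requests a migration needs. It assumes objects are sent in full batches, which is a simplification; Hive Migrator's actual batching may differ.

```python
# From the note above: AWS Glue Data Catalog accepts at most 100 objects per request.
GLUE_MAX_OBJECTS_PER_REQUEST = 100
DEFAULT_BATCH_SIZE = 1000  # documented default of hivemigrator.migrationBatchSize


def effective_batch_size(configured=DEFAULT_BATCH_SIZE, target_is_glue=False):
    """Batch size to configure, capped for an AWS Glue target."""
    if configured < 1:
        raise ValueError("batch size must be a positive integer")
    if target_is_glue:
        return min(configured, GLUE_MAX_OBJECTS_PER_REQUEST)
    return configured


def requests_needed(object_count, batch_size):
    """Number of requests if objects are sent in full batches (ceiling division)."""
    if object_count < 0 or batch_size < 1:
        raise ValueError("object_count must be >= 0 and batch_size >= 1")
    return -(-object_count // batch_size)
```

For example, 2,500 objects take 3 requests at the default batch size of 1000, but 25 requests at the Glue limit of 100.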

Hive Migrator logging

Configuration for the Hive Migrator log file is stored in /etc/wandisco/hivemigrator/log4j2.yaml.

Default log4j2.yaml:
Configuration:
  packages: "com.wandisco.hivemigrator.configuration.log"
  monitorInterval: 300
  properties:
    property:
      - name: log.dir
        value: "/var/log/wandisco/hivemigrator"
      - name: log.level
        value: "info"
      - name: httpserver.level
        value: "info"
      - name: hivemetastore.level
        value: "warn"
      - name: metrics.level
        value: "off"
      - name: log.max.size
        value: 200
      - name: log.max.files
        value: 100
  appenders:
    Routing:
      name: routingAppender
      Routes:
        pattern: "${ctx:migrationId}"
        Route:
          -
            RollingFile:
              name: hiveMigratorLog
              fileName: "${log.dir}/hivemigrator.log"
              filePattern: "${log.dir}/hivemigrator.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
              PatternLayout:
                pattern: "%d{DEFAULT} %5p [%-15.15t] %-45c{1.} %15X{agentName}: %maskpass%n"
                charset: "UTF-8"
              Policies:
                OnStartupTriggeringPolicy: {}
                SizeBasedTriggeringPolicy:
                  size: "${log.max.size} MB"
              DefaultRolloverStrategy:
                Delete:
                  basePath: "${log.dir}"
                  IfFileName:
                    glob: "hivemigrator*.log.gz"
                    IfAccumulatedFileCount:
                      exceeds: ${log.max.files}
            key: "${ctx:migrationId}"
          -
            RollingFile:
              name: "migration-${ctx:migrationId}"
              fileName: "${log.dir}/migration-audit-${ctx:migrationId}.log"
              filePattern: "${log.dir}/migration-audit-${ctx:migrationId}.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
              PatternLayout:
                pattern: "%d{DEFAULT} %5p [%-15.15t] %-45c{1.} %15X{agentName}: %maskpass%n"
                charset: "UTF-8"
              Policies:
                OnStartupTriggeringPolicy: {}
                SizeBasedTriggeringPolicy:
                  size: "${log.max.size} MB"
              DefaultRolloverStrategy:
                Delete:
                  basePath: "${log.dir}"
                  IfFileName:
                    glob: "migration-audit-${ctx:migrationId}*.log.gz"
                    IfAccumulatedFileCount:
                      exceeds: ${log.max.files}
    RollingFile:
      - name: metricsLog
        fileName: "${log.dir}/metrics.log"
        filePattern: "${log.dir}/metrics.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
        createOnDemand: "true"
        PatternLayout:
          pattern: "%d{DEFAULT} - %maskpass%n"
          charset: "UTF-8"
        Policies:
          OnStartupTriggeringPolicy: {}
          SizeBasedTriggeringPolicy:
            size: "${log.max.size} MB"
        DefaultRolloverStrategy:
          Delete:
            basePath: "${log.dir}"
            IfFileName:
              glob: "metrics*.log.gz"
              IfAccumulatedFileCount:
                exceeds: ${log.max.files}
      - name: httpAuditLog
        fileName: "${log.dir}/http-audit.log"
        filePattern: "${log.dir}/http-audit.%d{yyyy-MM-dd'T'HH-mm-ss}.log.gz"
        PatternLayout:
          pattern: "%d{DEFAULT} %5p [%-15.15t] %-45c{1.}: %maskpass%n"
          charset: "UTF-8"
        Policies:
          OnStartupTriggeringPolicy: {}
          SizeBasedTriggeringPolicy:
            size: "${log.max.size} MB"
        DefaultRolloverStrategy:
          Delete:
            basePath: "${log.dir}"
            IfFileName:
              glob: "http-audit*.log.gz"
              IfAccumulatedFileCount:
                exceeds: ${log.max.files}

  Loggers:
    logger:
      - name: io.micrometer
        level: "${metrics.level}"
        additivity: false
        AppenderRef:
          ref: metricsLog
      - name: com.wandisco.hivemigrator.rest.filters.HttpLoggingFilter
        level: "${httpserver.level}"
        additivity: false
        AppenderRef:
          ref: httpAuditLog
      # - name: "migration-audit"
      #   level: "${log.level}"
      #   additivity: false
      #   AppenderRef:
      #     ref: migrationAudit
      - name: org.apache.hadoop.hive.metastore
        level: "${hivemetastore.level}"
    Root:
      level: "${log.level}"
      AppenderRef:
        ref: routingAppender
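In the default configuration above, each rolling log rolls over at log.max.size (200 MB). Its Delete action keeps at most log.max.files (100) matching .log.gz archives. Because the migration audit glob includes the migration ID, each migration keeps its own set of archives. The following minimal Python sketch gives a rough upper bound on disk use. It counts one active file plus the retained archives per file set at full size and ignores gzip compression, so real usage should be lower.

```python
DEFAULT_LOG_MAX_SIZE_MB = 200  # log.max.size in the default log4j2.yaml
DEFAULT_LOG_MAX_FILES = 100    # log.max.files in the default log4j2.yaml


def worst_case_log_mb(max_size_mb=DEFAULT_LOG_MAX_SIZE_MB,
                      max_files=DEFAULT_LOG_MAX_FILES, file_sets=1):
    """Rough upper bound in MB: per file set, one active log plus up to max_files
    rolled-over archives, each counted at max_size_mb (compression ignored)."""
    if max_size_mb < 0 or max_files < 0 or file_sets < 0:
        raise ValueError("arguments must be non-negative")
    return max_size_mb * (max_files + 1) * file_sets


if __name__ == "__main__":
    # Hypothetical example: hivemigrator.log, http-audit.log and 5 migration audit logs.
    print(worst_case_log_mb(file_sets=2 + 5), "MB")
```

With the default values, a single file set is bounded by about 20,200 MB. Lowering log.max.size or log.max.files reduces this bound proportionally.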

Enable Hive Migrator debugging mode

Use the following steps to investigate authentication/authorization problems when using an HDFS source or target:

  1. Open /etc/wandisco/hivemigrator/vars.sh.

  2. Add the following JVM arguments:

    HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"
  3. Add the following log location parameter:

    LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out
  4. Save the changes.

  5. Restart the Hive Migrator service. See System service commands - Hive Migrator.

  6. IMPORTANT: Once you've completed your investigation, reverse these changes. Debug mode generates very large log files that can exhaust your available storage.
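Step 6 is easier when the debug settings can be removed reliably. The following minimal Python sketch appends the two lines from steps 2 and 3 to the text of vars.sh under a marker comment, and can later remove exactly that block. The marker comment and helper names are hypothetical. If vars.sh already sets HVM_EXTRA_JVM_ARGS, appending a second assignment overrides the earlier one, so merge the flags by hand in that case.

```python
# Lines from steps 2 and 3 above, added verbatim to vars.sh.
DEBUG_LINES = (
    'HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true '
    '-Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"',
    "LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out",
)
# Hypothetical marker comment used to find the block again when reverting.
MARKER = "# BEGIN temporary Hive Migrator Kerberos debug settings"


def enable_debug(text):
    """Append the debug settings after a marker line (no-op if already present)."""
    if MARKER in text.splitlines():
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + MARKER + "\n" + "\n".join(DEBUG_LINES) + "\n"


def disable_debug(text):
    """Remove the marker line and the debug lines that follow it."""
    out, skip = [], 0
    for line in text.splitlines(keepends=True):
        if skip:
            skip -= 1
            continue
        if line.rstrip("\r\n") == MARKER:
            skip = len(DEBUG_LINES)
            continue
        out.append(line)
    return "".join(out)
```

To use it, read /etc/wandisco/hivemigrator/vars.sh and write back the result of enable_debug. Restart Hive Migrator. Once the investigation is finished, apply disable_debug and restart again.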

Learn more about adding JVM arguments in the knowledge base.

Perform a Data Migrator Java connection debug capture

Use the following steps to investigate authentication/authorization problems when using an HDFS source or target:

  1. Open /etc/wandisco/hivemigrator/vars.sh.
  2. Add the following JVM arguments:
    HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"
  3. Add the following log location parameter:
    LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out
  4. Restart Hive Migrator.

Learn more about adding JVM arguments in the knowledge base.

Directory structure

The following directories are used for Hive Migrator:

Location | Content
/var/log/wandisco/hivemigrator | Logs
/etc/wandisco/hivemigrator | Configuration files
/opt/wandisco/hivemigrator | Java archive files
/var/run/hivemigrator | Runtime files

Remote servers

The following directories are used for Hive Migrator remote servers (remote agents):

Location | Content
/var/log/wandisco/hivemigrator-remote-server | Logs
/etc/wandisco/hivemigrator-remote-server | Configuration files
/opt/wandisco/hivemigrator-remote-server | Java archive files
/var/run/hivemigrator-remote-server | Runtime files