Version: 1.18.1 (latest)

Troubleshooting

This article details issues that you may face when installing and using LiveData Migrator. Follow the provided steps if you encounter these issues.

Please ensure you have read the Prerequisites as you may experience problems if you miss any of these requirements.

We recommend making use of logs when troubleshooting LiveData Migrator. See Log Commands for information on how to enable logging across various levels. Logs for each component of LiveData Migrator are stored in the /var/log/wandisco/ directory within the LiveData Migrator installation directory, with a directory for each component, such as /var/log/wandisco/ui for the LiveData UI.
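As a starting point, the component logs can be scanned for recent errors in one pass. The sketch below is illustrative only: the function name `recent_errors` is hypothetical, and it assumes the logs are plain-text `*.log` files whose error lines contain the string `ERROR`.

```shell
# List the most recent ERROR lines from every component log under a log root.
# Assumptions: logs are plain-text *.log files and error lines contain "ERROR".
recent_errors() {
  local log_dir="${1:-/var/log/wandisco}"   # log root named in this article
  local max_lines="${2:-20}"                # hypothetical default line count
  # -r: recurse into component directories; -H: prefix each match with its file name
  grep -rH --include='*.log' 'ERROR' "$log_dir" 2>/dev/null | tail -n "$max_lines"
}
```

For example, `recent_errors /var/log/wandisco 50` would print up to 50 matching lines, each prefixed with the log file it came from.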

General#

Rule names parameter does not autocomplete in the CLI#

When adding the --rule-names parameter to the end of a hive migration add command, auto-completion will not suggest the parameter name. For example:

Example
WANdisco LiveData Migrator >> hive migration add --name test --source sourceAgent --target testGlue --rule-names

To work around this, either:

  • Use the --rule-names parameter earlier in the command. For example: WANdisco LiveData Migrator >> hive migration add --name test --rule-names
  • Use the Tab key twice in the CLI when attempting to autocomplete the parameter, and select --rule-names with the left and right arrow keys.

HiveMigrator configuration files missing when reinstalling LiveData Migrator on Ubuntu/Debian#

This issue will occur when you have removed the HiveMigrator package with apt-get remove instead of apt-get purge during the uninstall steps.

The /etc/wandisco/hivemigrator directory will be missing files as a result. The cause is that the Ubuntu package management tool (dpkg) stores service configuration information in its internal database and assumes this directory already has the needed files (even if they were manually removed).

To resolve this:

  1. Clean up the dpkg database for the HiveMigrator service:

    rm -f /var/lib/dpkg/info/hivemigrator*
  2. Fully remove the HiveMigrator package again using dpkg and the --purge option:

    dpkg --purge hivemigrator
  3. Carry out the install steps for the new version of LiveData Migrator.

  4. If needed, install the HiveMigrator package using dpkg and the --force-confmiss option:

    Example
    dpkg -i --force-confmiss hivemigrator_1.3.1-518_all.deb
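Before working through the steps above, it may help to confirm that the package really is in the "removed but not purged" state. The following is a minimal sketch: the helper name `describe_dpkg_status` is hypothetical, and it assumes your dpkg version supports the `db:Status-Abbrev` field of `dpkg-query`, which prints `rc` for a package that was removed while its configuration records remain.

```shell
# Interpret the three-character status abbreviation printed by dpkg-query.
# "rc" = removed, configuration records remain (apt-get remove without purge).
describe_dpkg_status() {
  local status="$1"
  case "$status" in
    rc*) echo "removed-not-purged" ;;
    ii*) echo "installed" ;;
    "")  echo "not-known-to-dpkg" ;;
    *)   echo "other:${status}" ;;
  esac
}

# Usage on a Debian/Ubuntu host (package name taken from the steps above):
#   describe_dpkg_status "$(dpkg-query -W -f='${db:Status-Abbrev}' hivemigrator 2>/dev/null)"
```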

Manual JDBC driver configuration#

If using MariaDB or MSSQL, the JDBC driver must be manually added to the classpath, or the metadata migration will stall.

hivemigrator.log
2021-09-09 16:44:49,033 INFO com.wandisco.hivemigrator.agent.utils.JdbcUtil - [default-nioEventLoopGroup-3-4]: Loaded jdbc drivers: [class org.apache.derby.jdbc.EmbeddedDriver, *null*, class org.postgresql.Driver, *null*]

If the migration stalls, manually move the driver into place. Note that the driver version may vary.

  • mv mysql-connector-java-8.0.20 /opt/wandisco/hivemigrator/agent/hive/
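To spot this condition without reading the log by hand, a small check can look at the most recent "Loaded jdbc drivers" line. This is a sketch under assumptions: the function name `jdbc_driver_missing` is hypothetical, the log path must be supplied because it varies by installation, and a `null` entry is taken to indicate a driver that could not be loaded, as the log excerpt above suggests.

```shell
# Return 0 if the most recent "Loaded jdbc drivers" line lists a null entry.
jdbc_driver_missing() {
  local log_file="$1"   # path to hivemigrator.log; location varies by install
  grep 'Loaded jdbc drivers' "$log_file" | tail -n 1 | grep -q 'null'
}
```

A call such as `jdbc_driver_missing /path/to/hivemigrator.log && echo "driver missing"` would flag the problem before a migration stalls.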

LiveData Migrator account#

Reset admin user password#

If you have lost or otherwise need to change the admin user password without using the associated email address, refer to these instructions.

Microsoft Azure resources#

Insufficient container permissions with an Azure Data Lake Storage (ADLS) Gen2 target filesystem when using OAuth2 authentication#

When creating or updating an ADLS Gen2 target filesystem using the OAuth2 authentication protocol, you may have insufficient permission to guarantee a successful migration. This is usually because the Role Based Access Control on the service principal does not guarantee root access. In this case, the migration will fail to start (or resume) and issue a warning.

To force the migration to start (or resume) despite the warning, update the ADLS Gen2 filesystem with the following property and restart LiveData Migrator afterwards:

Property
fs.ignore-authentication-privileges=true
Example Usage
filesystem update adls2 oauth --file-system-id target --properties fs.ignore-authentication-privileges=true

Amazon Web Services (AWS) resources#

Failed to connect to LiveData Migrator when adding an S3 filesystem through the UI using access/secret keys#

When adding an S3 bucket as a filesystem through the LiveData UI, the following error may display when attempting to save the configuration:

Failed to connect to LiveData Migrator

This can be due to an incorrectly entered access or secret key. Double check that you provided the correct keys with no extra characters (including spaces), and try again.
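Stray whitespace from copy and paste is easy to miss by eye. As a rough illustration, a key value can be checked in a shell before it is entered in the UI. The helper name `no_stray_whitespace` is hypothetical, and the check only looks for whitespace; it cannot tell whether the key itself is correct.

```shell
# Return 0 when the value contains no whitespace characters (spaces, tabs, newlines).
no_stray_whitespace() {
  case "$1" in
    *[[:space:]]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `no_stray_whitespace "$ACCESS_KEY" || echo "key contains whitespace"`, where `ACCESS_KEY` is a variable you set yourself.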

Error Code: AccessDenied. Error Message: Access to the resource https://sqs.eu-west-1.amazonaws.com/ is denied.#

This problem arises if your account does not have sufficient SQS permissions to access the bucket resource. To fix this, ask your organization administrator to assign the necessary privileges in the SQS policy manager.

For example, configuring an allow rule for sqs:* will allow all organization users configured with SQS to perform the necessary actions with LiveData Migrator.
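An allow rule of this kind might look like the policy document printed by the sketch below. It assumes an IAM identity-based policy, which may not match how your organization manages SQS access. The function name `print_sqs_allow_policy` is hypothetical, and `"Resource": "*"` is a placeholder that your administrator would normally narrow to specific queue ARNs.

```shell
# Print a broad example policy document that allows all SQS actions.
# Assumption: "Resource": "*" is a placeholder; scope it to your queue ARNs in practice.
print_sqs_allow_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sqs:*",
      "Resource": "*"
    }
  ]
}
EOF
}
```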

Notifications#

Below are some of the most common notifications that you may encounter during the deployment or use of LiveData Migrator.

LiveMigratorPanicNotification#

When LiveData Migrator encounters an unexpected run-time exception, it will emit a log message with the notification LiveMigratorPanicNotification. The message provided, and therefore the resolution, will vary based on the cause of the exception. For example:

Example
2020-11-12 16:26:37.441 ERROR - [engine-pool-1 ] c.w.l.e.LM2UncaughtExceptionHandler : Uncaught exception in thread Thread[engine-pool-1,5,main], exception: java.lang.IllegalArgumentException: Wrong FS: hdfs://.livemigrator_55f9bf54-77fc-4bc1-95e9-0a378d938609, expected: hdfs://nmcnu01-vm0.bdfrem.wandisco.com
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:233)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1573)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1588)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)
    at com.wandisco.livemigrator2.fs.hdfs.HdfsFileSystemWrapper.exists(HdfsFileSystemWrapper.java:154)
    at com.wandisco.livemigrator2.fs.hdfs.HdfsFileSystemWrapper$$FastClassBySpringCGLIB$$c15450b.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:88)
    at com.wandisco.livemigrator2.fs.FileSystemExceptionHandlerAspect.handleException(FileSystemExceptionHandlerAspect.java:19)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:644)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:633)
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
    at com.wandisco.livemigrator2.fs.hdfs.HdfsFileSystemWrapper$$EnhancerBySpringCGLIB$$57c6ec3a.exists(<generated>)
    at com.wandisco.livemigrator2.migration.MigratorEngine.createMarkerIfNecesssary(MigratorEngine.java:959)
    at com.wandisco.livemigrator2.migration.MigratorEngine.init(MigratorEngine.java:211)
    at com.wandisco.livemigrator2.migration.MigratorEngine.run(MigratorEngine.java:304)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-11-12 16:26:37.442 INFO - [engine-pool-1 ] c.w.l.n.NotificationManagerImpl : Notification: Notification{level=ERROR, type='LiveMigratorPanicNotification', message='Wrong FS: hdfs://.livemigrator_55f9bf54-77fc-4bc1-95e9-0a378d938609, expected: hdfs://nmcnu01-vm0.bdfrem.wandisco.com', id='urn:uuid:8bf396b3-2b58-473c-9e77-8cab70e88c04', timeStamp=1605198397441, code=40003, resolved=false, updatedTimeStamp=1605198397441, payload={}}

Any issue triggering this notification will cause the application to shut down with a return code of -1, indicating an abnormal termination.
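Because the resolution depends on the message, it can be useful to pull just the message text out of a log after an unexpected shutdown. The sketch below assumes the notification is logged in the `Notification{... type='...', message='...' ...}` form shown in the example and that the message contains no single quotes. The function name `panic_messages` is hypothetical, and the log file path is left for you to supply.

```shell
# Print the message of each LiveMigratorPanicNotification found in a log file.
panic_messages() {
  local log_file="$1"
  # Keep only lines carrying the panic notification, then extract message='...'.
  grep "type='LiveMigratorPanicNotification'" "$log_file" \
    | sed -n "s/.*message='\([^']*\)'.*/\1/p"
}
```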

HighPendingRegionNotification#

When directories are moved or modified during a migration, they are logged as pending regions. If the number of pending regions exceeds the configured maximum during a migration, the migration aborts.

This issue can be resolved by raising the maximum number of pending regions in the migration.

This notification displays when the number of pending regions exceeds the "high watermark" percentage of maximum pending regions, and is resolved when the number falls below the "low watermark" percentage.

Both watermarks may be configured by adding settings to application.properties. The following setting configures the high watermark percentage of pending regions:

Example
notifications.pending.region.warn.percent=60

And the following setting determines the low watermark percentage:

Example
notifications.pending.region.clear.percent=50
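To see what the two percentages mean in absolute terms, the thresholds can be worked out for a given maximum. This is a small arithmetic sketch: the function name `watermark_thresholds` and the maximum of 10000 used in the usage note are hypothetical, and the 60 and 50 defaults simply mirror the example values above.

```shell
# Compute the pending-region counts at which the notification is raised and cleared.
# max_regions is whatever maximum is configured for your migration.
watermark_thresholds() {
  local max_regions="$1" warn_percent="${2:-60}" clear_percent="${3:-50}"
  echo "raise_above=$(( max_regions * warn_percent / 100 ))"
  echo "clear_below=$(( max_regions * clear_percent / 100 ))"
}
```

With a hypothetical maximum of 10000, `watermark_thresholds 10000 60 50` prints `raise_above=6000` and `clear_below=5000`: the notification appears once more than 6000 regions are pending and resolves when the count drops below 5000.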

Error message "Can't access source events stream from the Kafka service."#

Migrations from IBM Cloud Object Storage use Kafka to pull out the source cluster's events. These events include filesystem changes that apply to the target cluster during the migration. If LiveData Migrator cannot communicate with the Kafka service, the migration will stall until communication with the service resumes.

note

The notification message is sent 10 minutes after contact with the Kafka service is lost.

Recommended steps

  1. Check the availability of the Kafka service.
  2. If the Kafka service is unavailable, restart the Kafka service.
  3. If a very large number of changes are queued, we recommend that you reset the migration. Rescanning the source is faster and more reliable than attempting to continue the stalled migration.
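For the first step, a basic reachability probe from the LiveData Migrator host can rule out network problems. The sketch below only tests whether a TCP connection opens; it does not confirm that Kafka itself is healthy. The function name `tcp_reachable` is hypothetical, the host and port are values you supply, and the approach relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```shell
# Return 0 if a TCP connection to host:port opens within the timeout.
# Uses bash's built-in /dev/tcp redirection, so it needs bash rather than plain sh.
tcp_reachable() {
  local host="$1" port="$2" timeout_secs="${3:-3}"
  timeout "$timeout_secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `tcp_reachable kafka.example.com 9092 || echo "broker port unreachable"`, where the host name and port are placeholders for your own Kafka endpoint.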

Hive Migrator connection to the source or target timed out#

Hive metadata migrations are set to fail if connections to either the target or source agent are lost for more than 20 minutes. Fix any failures by restarting the affected migrations.

If migrations continue to fail due to this timeout, consider increasing the connectionRetryTimeout parameter:

Changing connectionRetryTimeout parameter#

  1. Open /etc/wandisco/hivemigrator/hive-migrator.yaml.
  2. Uncomment the connectionRetryTimeout parameter and raise it above the default of 20 minutes. Make incremental increases and retest rather than immediately setting a very high value.
  3. Save the change.
  4. Restart the HiveMigrator service to enable the new configuration:
    service hivemigrator restart
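After editing, it can be worth confirming that the parameter is actually uncommented before restarting the service. The following is a minimal sketch that assumes the file uses ordinary YAML `key: value` lines with `#` comments. The function name `param_state` is hypothetical, and the same check works for any parameter in this file.

```shell
# Print whether a parameter line in a YAML file is active, commented out, or absent.
param_state() {
  local file="$1" param="$2"
  if grep -Eq "^[[:space:]]*${param}[[:space:]]*:" "$file"; then
    echo "active"
  elif grep -Eq "^[[:space:]]*#[[:space:]]*${param}[[:space:]]*:" "$file"; then
    echo "commented"
  else
    echo "absent"
  fi
}
```

For example, `param_state /etc/wandisco/hivemigrator/hive-migrator.yaml connectionRetryTimeout` should print `active` once the change has been saved.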

Change metastore rescan rate#

Hive Migrator rescans the Hive metastore as soon as the previous scan finishes. If appropriate you can reduce the scan rate by updating the delayBetweenScanRounds parameter:

Changing delayBetweenScanRounds parameter#

  1. Open /etc/wandisco/hivemigrator/hive-migrator.yaml.
  2. Uncomment the delayBetweenScanRounds parameter and raise it above the default of 1 second. If you introduce a large delay, test that migration performance is not significantly impacted.
  3. Save the change.
  4. Restart the HiveMigrator service to enable the new configuration:
    service hivemigrator restart

Kerberos#

Kerberos configuration#

If you're having issues configuring Kerberos for a filesystem, try the following:

Check the provided keytab is readable by the user operating LiveData Migrator.#

To test this, run the following commands (where ldmuser should be your user):

Example of authenticating 'ldmuser'
su ldmuser
ls -al /etc/security/keytabs/ldmuser.keytab

If the command fails, modify permissions on the directory to allow access for ldmuser.
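The same check can be scripted so that it distinguishes a missing keytab from an unreadable one. This is a sketch only: the function name `keytab_readable` is hypothetical, and it must be run as the user that operates LiveData Migrator to be meaningful, since root can typically read any file.

```shell
# Return 0 if the keytab exists and is readable by the user running this check.
keytab_readable() {
  local keytab="$1"
  if [ ! -e "$keytab" ]; then
    echo "missing: $keytab" >&2; return 1
  elif [ ! -r "$keytab" ]; then
    echo "not readable by $(id -un): $keytab" >&2; return 1
  fi
  echo "readable: $keytab"
}
```

For example, after `su ldmuser`, run `keytab_readable /etc/security/keytabs/ldmuser.keytab`.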

Check the Kerberos principal is included within the keytab file.#

Inspect the keytab file's contents:

Example of listing ldmuser's keytab file contents
su ldmuser
klist -kt /etc/security/keytabs/ldmuser.keytab

If ldmuser/hostname@REALM.COM is not in the keytab, create a keytab containing ldmuser/hostname@REALM.COM and copy it to the /etc/security/keytabs directory on the edge node running LiveData Migrator.
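If the keytab holds many entries, the lookup can be automated by filtering the `klist -kt` listing. The sketch below assumes the MIT Kerberos output layout, in which the principal is the last field of each entry line when only the `-k` and `-t` options are used. The function name `keytab_has_principal` is hypothetical.

```shell
# Read `klist -kt` output on stdin; return 0 if the principal appears as an entry.
# The principal is compared against the last whitespace-separated field of each line.
keytab_has_principal() {
  local principal="$1"
  awk -v p="$principal" '$NF == p { found = 1 } END { exit !found }'
}
```

For example: `klist -kt /etc/security/keytabs/ldmuser.keytab | keytab_has_principal 'ldmuser/hostname@REALM.COM' || echo "principal not in keytab"`.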

Check the Kerberos principal is valid.#

For example: a principal of ldmuser/hostname@REALM.COM and a keytab file ldmuser.keytab are valid.

To ensure principal validity, you can destroy all currently active authentication tickets in the cache and try initiating a new one:

Test validity of Kerberos principal
su ldmuser
kdestroy
kinit -kt /etc/security/keytabs/ldmuser.keytab ldmuser/hostname@REALM.COM
klist

If kinit fails and there is no principal in the cache, check the principal to ensure there are no password mismatches or other inconsistencies. In this case, the ldmuser principal and keytab file might need to be recreated.

Ensure the Kerberos principal is linked to a superuser: global access to filesystem operations is required.#

To test access, run the following commands to read the file tree, replacing the user details with your own:

Test superuser access by reading the file tree
su ldmuser
kinit -kt /etc/security/keytabs/ldmuser.keytab ldmuser/hostname@REALM.COM
hdfs dfs -ls /

If successful, the operation will return the HDFS file tree. Optionally, try creating a directory as well:

Test superuser access by creating a directory
hdfs dfs -mkdir /ldm_test

This creates an ldm_test directory if successful.

If either command fails, check auth_to_local rules are correctly configured, and that your user (in this case, ldmuser) is in the superuser group.

note

Additionally, if you're configuring Kerberos for a Hive metastore, the principal must be associated with the hive user or another superuser. For example: hive/hostname@REALM.COM

note

If Kerberos is disabled, and Hadoop configuration is on the host, LiveData Migrator detects the source filesystem automatically on startup.

Hadoop should be installed globally on the filesystem to allow LiveData Migrator to access Hadoop configuration during automatic detection. Alternatively, if you're running LiveData Migrator for a single user's environment, Hadoop should be made available to the agent running the service on the PATH environment variable:

Systemctl
sudo systemctl set-environment PATH=$PATH

Message stream modified (41)#

If you encounter the error "Message stream modified (41) To try to automatically discover the source, please run 'filesystem auto-discover-source' for the type of filesystem you want to discover" and it is not resolved by performing the suggested action, fix the issue by modifying the user principal in the key distribution center:

Example with principal ldmuser/hostname@REALM.COM
modprinc -maxrenewlife 90day +allow_renewable ldmuser/hostname@REALM.COM

Troubleshooting techniques#

Use these LiveData Migrator features to identify problems with migrations or filesystems.

Check path status#

You can check the status of a file path in either the UI or the CLI to determine whether any work is scheduled on the file.