Troubleshooting
This article covers issues you may encounter when installing and using LiveData Migrator, along with steps to resolve them.
Ensure you have read the Prerequisites, as missing any of those requirements can cause problems.
We recommend making use of logs when troubleshooting LiveData Migrator. See Log Commands for information on how to enable logging at various levels. Logs are stored in the /var/log/wandisco/ directory, with a subdirectory for each component of LiveData Migrator, such as /var/log/wandisco/ui for the LiveData UI.
General
Rule names parameter does not autocomplete in the CLI
When you add the --rule-names parameter at the end of a hive migration add command, auto-completion does not suggest the parameter name. For example:
WANdisco LiveData Migrator >> hive migration add --name test --source sourceAgent --target testGlue --rule-names
To work around this, either:
- Use the --rule-names parameter earlier in the command. For example:
  WANdisco LiveData Migrator >> hive migration add --name test --rule-names
- Press the Tab key twice in the CLI when attempting to autocomplete the parameter, then select --rule-names with the left and right arrow keys.
HiveMigrator configuration files missing when reinstalling LiveData Migrator on Ubuntu/Debian
This issue occurs when you removed the HiveMigrator package with apt-get remove instead of apt-get purge during the uninstall steps.
As a result, the /etc/wandisco/hivemigrator directory will be missing files. The cause is that the Ubuntu package management tool (dpkg) stores service configuration information in its internal database and assumes this directory already contains the needed files (even if they were manually removed).
To resolve this:
1. Clean up the dpkg database entries for the HiveMigrator service:
   rm -f /var/lib/dpkg/info/hivemigrator*
2. Fully remove the HiveMigrator package again using dpkg with the --purge option:
   dpkg --purge hivemigrator
3. Carry out the install steps for the new version of LiveData Migrator.
4. If needed, install the HiveMigrator package using dpkg with the --force-confmiss option. For example:
   dpkg -i --force-confmiss hivemigrator_1.3.1-518_all.deb
Manual JDBC driver configuration
If you use MariaDB or MSSQL, the JDBC driver must be added to the classpath manually, or the metadata migration will stall:
2021-09-09 16:44:49,033 INFO com.wandisco.hivemigrator.agent.utils.JdbcUtil - [default-nioEventLoopGroup-3-4]: Loaded jdbc drivers: [class org.apache.derby.jdbc.EmbeddedDriver, *null*, class org.postgresql.Driver, *null*]
If the migration stalls, move the driver into place manually. Note that the driver version may vary:
mv mysql-connector-java-8.0.20 /opt/wandisco/hivemigrator/agent/hive/
LiveData Migrator account
Reset admin user password
If you have lost the admin user password, or otherwise need to change it without using the associated email address, refer to these instructions.
Microsoft Azure resources
Insufficient container permissions with an Azure Data Lake Storage (ADLS) Gen2 target filesystem when using OAuth2 authentication
When creating or updating an ADLS Gen2 target filesystem that uses the OAuth2 authentication protocol, you may have insufficient permissions to guarantee a successful migration. This is usually because the role-based access control on the service principal does not guarantee root access. In this case, the migration will fail to start (or resume) and issue a warning.
To force the migration to start (or resume) despite the warning, update the ADLS Gen2 filesystem with the following property and restart LiveData Migrator afterwards:
fs.ignore-authentication-privileges=true
filesystem update adls2 oauth --file-system-id target --properties fs.ignore-authentication-privileges=true
Amazon Web Services (AWS) resources
Failed to connect to LiveData Migrator
This error appears when you try to add an S3 bucket in the UI with any of the following problems:
- You've made a mistake or typo while entering an access or secret key.
- Your bucket contains a dot (.) in the name.
Check that you've entered your access and secret keys correctly with no extra characters, and follow the recommendations in the bucket naming rules guide when you create an S3 bucket.
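As an illustrative pre-check (this is not an official AWS validator, and the function name is hypothetical), you can screen a bucket name against the rules that matter here before entering it in the UI:

```python
import re

def s3_bucket_name_ok(name: str) -> bool:
    """Screen an S3 bucket name before adding it in the UI.

    Reflects the constraints discussed above: 3-63 characters,
    lowercase letters, digits, and hyphens only (no dots), starting
    and ending with a letter or digit.
    """
    return re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name) is not None

print(s3_bucket_name_ok("my-data-bucket"))  # True
print(s3_bucket_name_ok("my.data.bucket"))  # False: a dot in the name causes the UI error
```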
Error Code: AccessDenied. Error Message: Access to the resource https://sqs.eu-west-1.amazonaws.com/ is denied.
This problem arises when your account does not have sufficient SQS permissions to access the bucket resource. To fix this, ask your organization administrator to assign the necessary privileges in the SQS policy manager.
For example, configuring an allow rule for sqs:* will allow all organization users configured with SQS to perform the necessary actions with LiveData Migrator.
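As a sketch only (the account ID and queue name below are placeholders, not values from this document), an identity-based IAM policy granting all SQS actions on a queue might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sqs:*",
      "Resource": "arn:aws:sqs:eu-west-1:123456789012:example-queue"
    }
  ]
}
```

Scoping Resource to the specific queue is tighter than a blanket sqs:* on all resources; your administrator may prefer to narrow Action further.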
Notifications
Below are some of the most common notifications that you may encounter while deploying or using LiveData Migrator.
LiveMigratorPanicNotification
When LiveData Migrator encounters an unexpected run-time exception, it emits a log message with the notification LiveMigratorPanicNotification. The message provided, and therefore the resolution, will vary based on the cause of the exception. For example:
2020-11-12 16:26:37.441 ERROR - [engine-pool-1 ] c.w.l.e.LM2UncaughtExceptionHandler : Uncaught exception in thread Thread[engine-pool-1,5,main], exception: java.lang.IllegalArgumentException: Wrong FS: hdfs://.livemigrator_55f9bf54-77fc-4bc1-95e9-0a378d938609, expected: hdfs://nmcnu01-vm0.bdfrem.wandisco.com
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:233)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1573)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1588)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)
    at com.wandisco.livemigrator2.fs.hdfs.HdfsFileSystemWrapper.exists(HdfsFileSystemWrapper.java:154)
    at com.wandisco.livemigrator2.fs.hdfs.HdfsFileSystemWrapper$$FastClassBySpringCGLIB$$c15450b.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:88)
    at com.wandisco.livemigrator2.fs.FileSystemExceptionHandlerAspect.handleException(FileSystemExceptionHandlerAspect.java:19)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:644)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:633)
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
    at com.wandisco.livemigrator2.fs.hdfs.HdfsFileSystemWrapper$$EnhancerBySpringCGLIB$$57c6ec3a.exists(<generated>)
    at com.wandisco.livemigrator2.migration.MigratorEngine.createMarkerIfNecesssary(MigratorEngine.java:959)
    at com.wandisco.livemigrator2.migration.MigratorEngine.init(MigratorEngine.java:211)
    at com.wandisco.livemigrator2.migration.MigratorEngine.run(MigratorEngine.java:304)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-11-12 16:26:37.442 INFO - [engine-pool-1 ] c.w.l.n.NotificationManagerImpl : Notification: Notification{level=ERROR, type='LiveMigratorPanicNotification', message='Wrong FS: hdfs://.livemigrator_55f9bf54-77fc-4bc1-95e9-0a378d938609, expected: hdfs://nmcnu01-vm0.bdfrem.wandisco.com', id='urn:uuid:8bf396b3-2b58-473c-9e77-8cab70e88c04', timeStamp=1605198397441, code=40003, resolved=false, updatedTimeStamp=1605198397441, payload={}}
Any issue triggering this notification will cause the application to shut down with a return code of -1, indicating an abnormal termination.
HighPendingRegionNotification
When directories are moved or modified during a migration, they are logged as pending regions. If the number of pending regions exceeds the configured maximum during a migration, the migration aborts. You can resolve this by raising the maximum number of pending regions for the migration.
This notification displays when the number of pending regions exceeds the "high watermark" percentage of the maximum, and is resolved when the number falls below the "low watermark" percentage.
Both watermarks may be configured by adding settings to application.properties. The following setting configures the high watermark percentage of pending regions:
notifications.pending.region.warn.percent=60
And the following setting determines the low watermark percentage:
notifications.pending.region.clear.percent=50
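To make the watermark behaviour concrete, here is a small sketch (the maximum of 1000 pending regions is a hypothetical figure, and the function is illustrative, not LiveData Migrator code). With warn at 60% and clear at 50%, the notification raises above 600 pending regions and clears below 500, with hysteresis in between:

```python
MAX_PENDING_REGIONS = 1000  # hypothetical migration maximum
WARN_PERCENT = 60   # notifications.pending.region.warn.percent
CLEAR_PERCENT = 50  # notifications.pending.region.clear.percent

WARN_THRESHOLD = MAX_PENDING_REGIONS * WARN_PERCENT // 100    # 600
CLEAR_THRESHOLD = MAX_PENDING_REGIONS * CLEAR_PERCENT // 100  # 500

def notification_active(pending: int, currently_active: bool) -> bool:
    """Raise above the high watermark, clear below the low one."""
    if pending > WARN_THRESHOLD:
        return True
    if pending < CLEAR_THRESHOLD:
        return False
    # Between the watermarks the notification keeps its previous state.
    return currently_active

print(notification_active(650, False))  # True: above the high watermark
print(notification_active(550, True))   # True: not yet below the low watermark
print(notification_active(450, True))   # False: cleared
```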
Error message "Can't access source events stream from the Kafka service."
Migrations from IBM Cloud Object Storage use Kafka to pull the source cluster's events. These events include the filesystem changes to apply to the target cluster during the migration. If LiveData Migrator cannot communicate with the Kafka service, the migration will stall until communication with the service resumes.
note
The notification message is sent 10 minutes after contact with the Kafka service is lost.
Recommended steps
- Check the availability of the Kafka service.
- If the Kafka service is unavailable, restart the Kafka service.
- If there are very large numbers of queued changes, we recommend that you reset the migration. Rescanning the source will be faster and more reliable than attempting to continue the stalled migration.
Hive Migrator connection to the source or target timed out
Hive metadata migrations fail if connections to either the source or target agent are lost for more than 20 minutes. Fix any failures by restarting the affected migrations.
If migrations continue to fail due to this timeout, consider increasing the connectionRetryTimeout parameter:
Changing the connectionRetryTimeout parameter
1. Open /etc/wandisco/hivemigrator/hive-migrator.yaml.
2. Uncomment the connectionRetryTimeout parameter and raise the default of 20 minutes. Make incremental increases and retest rather than immediately setting a very high value.
3. Save the change.
4. Restart the HiveMigrator service to apply the new configuration:
   service hivemigrator restart
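Once uncommented, the entry in hive-migrator.yaml would look something like the following (the value 30 is only an example, and the exact value format should match the commented-out default in your file):

```yaml
# /etc/wandisco/hivemigrator/hive-migrator.yaml
# Example only: raise the connection retry timeout from the default 20 minutes.
connectionRetryTimeout: 30
```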
Change the metastore rescan rate
Hive Migrator rescans the Hive metastore as soon as the previous scan finishes. If appropriate, you can reduce the scan rate by updating the delayBetweenScanRounds parameter:
Changing the delayBetweenScanRounds parameter
1. Open /etc/wandisco/hivemigrator/hive-migrator.yaml.
2. Uncomment the delayBetweenScanRounds parameter and raise the default of 1 second. If you introduce a large delay, test that migration performance is not significantly affected.
3. Save the change.
4. Restart the HiveMigrator service to apply the new configuration:
   service hivemigrator restart
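Once uncommented, the entry would look something like the following (the value 60 is only an example, and the exact value format should match the commented-out default in your file):

```yaml
# /etc/wandisco/hivemigrator/hive-migrator.yaml
# Example only: wait longer between metastore scan rounds than the default 1 second.
delayBetweenScanRounds: 60
```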
Kerberos
Kerberos configuration
If you're having issues configuring Kerberos for a filesystem, try the following:
Check the provided keytab is readable by the user operating LiveData Migrator
To test this, run the following commands (where ldmuser is your user):
su ldmuser
ls -al /etc/security/keytabs/ldmuser.keytab
If the command fails, modify permissions on the directory to allow access for ldmuser.
Check the Kerberos principal is included within the keytab file
Inspect the keytab file's contents:
su ldmuser
klist -kt /etc/security/keytabs/ldmuser.keytab
If ldmuser/hostname@REALM.COM is not in the keytab, create a keytab containing ldmuser/hostname@REALM.COM and copy it to the /etc/security/keytabs directory on the edge node running LiveData Migrator.
Check the Kerberos principal is valid
For example, a principal of ldmuser/hostname@REALM.COM with a keytab file ldmuser.keytab is valid.
To ensure the principal is valid, destroy all currently active authentication tickets in the cache and initiate a new one:
su ldmuser
kdestroy
kinit -kt /etc/security/keytabs/ldmuser.keytab ldmuser/hostname@REALM.COM
klist
If kinit fails and there is no principal in the cache, check the principal to ensure there are no password mismatches or other inconsistencies. In this case, the ldmuser principal and keytab file might need to be recreated.
Ensure the Kerberos principal is linked to a superuser, as global access to filesystem operations is required
To test access, run the following commands to read the file tree, replacing the user details with your own:
su ldmuser
kinit -kt /etc/security/keytabs/ldmuser.keytab ldmuser/hostname@REALM.COM
hdfs dfs -ls /
If successful, the operation returns the HDFS file tree. Optionally, try creating a directory as well:
hdfs dfs -mkdir /ldm_test
This creates an ldm_test directory if successful.
If either command fails, check that the auth_to_local rules are correctly configured and that your user (in this case, ldmuser) is in the superuser group.
note
Additionally, if you're configuring Kerberos for a Hive metastore, the principal must be associated with the hive user or another superuser. For example: hive/hostname@REALM.COM
note
If Kerberos is disabled and the Hadoop configuration is on the host, LiveData Migrator detects the source filesystem automatically on startup.
Hadoop should be installed globally on the filesystem so that LiveData Migrator can access the Hadoop configuration during automatic detection. Alternatively, if you're running LiveData Migrator in a single user's environment, make Hadoop available on the PATH environment variable for the agent running the service:
sudo systemctl set-environment PATH=$PATH
Message stream modified (41)
If you encounter the error "Message stream modified (41) To try to automatically discover the source, please run 'filesystem auto-discover-source' for the type of filesystem you want to discover" and performing the suggested action does not resolve it, fix the issue by modifying the user principal in the key distribution center:
modprinc -maxrenewlife 90day +allow_renewable ldmuser/hostname@REALM.COM
Troubleshooting techniques
Use these LiveData Migrator features to identify problems with migrations or filesystems.
Check path status
You can check the status of a file path in either the UI or the CLI to determine whether any work is scheduled on the file.