Transient failures are errors that occur when migrating data. Transient failures can have a variety of causes:
- Network interruptions
- System maintenance
- Expired access keys
- Directory quotas filled on the target filesystem
You can manage and configure the retry duration so you don't get unnecessary error notifications when you have system maintenance or planned downtime.
If Data Migrator doesn't succeed within the configured retry duration, it marks the path as failed.
Failed paths are files and directories that Data Migrator tried to migrate from the source filesystem to the target filesystem unsuccessfully, and which Data Migrator won't retry.
To retry a failed path, you can do one or both of the following:
Retries are files and directories that Data Migrator tried to migrate unsuccessfully, and which Data Migrator continues to retry for a specified time period.
You can configure the time period Data Migrator retries failed paths so you don't need to add paths to be rescanned or reset migrations during your configured time period. This gives you time to renew expired access keys, increase directory quotas, and so forth.
When the configured retry duration ends for an event, the event becomes a failed path and Data Migrator doesn't retry.
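The retry lifecycle described above can be sketched as a small model. This is an illustration only, not Data Migrator's implementation: the `RetryEvent` class, the `classify` function, and the choice to treat an event as failed at exactly the end of the window are assumptions made for the example. The 43,200-second default is the value this guide gives for the retry duration.

```python
from dataclasses import dataclass

# Default retry duration stated in this guide: 43,200 seconds (12 hours).
RETRY_LIMIT_SECONDS = 43_200


@dataclass
class RetryEvent:
    """Hypothetical record of an event that failed and is being retried."""
    path: str
    first_failed_at: float  # seconds, measured on any consistent clock


def classify(event: RetryEvent, now: float,
             limit_seconds: int = RETRY_LIMIT_SECONDS) -> str:
    """Return 'retrying' while the event is inside the retry window,
    and 'failed' once the configured retry duration has ended."""
    elapsed = now - event.first_failed_at
    return "retrying" if elapsed < limit_seconds else "failed"


event = RetryEvent(path="/data/example.csv", first_failed_at=0.0)
status_after_one_hour = classify(event, now=3_600.0)
status_after_window = classify(event, now=43_200.0)
```

In this model, resolving the underlying problem (for example, renewing an expired access key) before the window ends means the event never reaches the failed state.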
To ensure events can be queued for an extended period without causing out-of-memory (OOM) situations, plan sufficient memory for the number of events in your migrations.
Your memory usage equals the number of events in a specified period multiplied by the number of bytes for each event.
For example, if the events generated in one hour require 2 GB of memory, the total memory you require for your configured retry period of 12 hours is 12 x 2 GB = 24 GB.
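The memory estimate can be worked through as a short calculation. This is a minimal sketch: the function name and the sample figures for events per hour and bytes per event are hypothetical, chosen so that one hour of events comes to 2 GB (decimal gigabytes), which matches the example in this section.

```python
GB = 10**9  # decimal gigabyte, assumed for this example


def estimate_retry_memory_bytes(events_per_hour: int,
                                bytes_per_event: int,
                                retry_hours: int) -> int:
    """Memory needed to hold every event queued during the retry period."""
    return events_per_hour * bytes_per_event * retry_hours


# Hypothetical workload: 2,000,000 events per hour at 1,000 bytes each,
# which is 2 GB of events per hour.
per_hour = estimate_retry_memory_bytes(2_000_000, 1_000, retry_hours=1)
total = estimate_retry_memory_bytes(2_000_000, 1_000, retry_hours=12)

print(per_hour // GB, "GB per hour")
print(total // GB, "GB for a 12-hour retry period")
```

Replace the sample figures with numbers measured from your own migrations before you size memory.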
- To set up email notifications for failed file transfers, go to Configuration and select Email Notifications.
- Select the checkbox Retry occurred.
For more information, see Configure email notifications.
Data Migrator provides details on failed files and retries to help you identify the causes for failures. These causes can include missing permissions on a specific target file path or file sizes that exceed a threshold value.
View the following details on the Retries page:
| Detail | Description |
|---|---|
| Total Active Retries | The total number of events or operations that Data Migrator tried more than once. |
| Total paths being retried | The total number of events that this Data Migrator instance is retrying in the configured retry period for one migration. |
| Path | The path of the directory or file in the source filesystem on which Data Migrator tried an operation more than once. |
| Reason | The last reason why the operation on the path didn’t succeed. |
| Times retried | The total number of times that Data Migrator tried to migrate this file. |
| Timestamp | The Coordinated Universal Time when Data Migrator last tried to migrate this file. |
To view the number of retries:
- Select a migration from the Migrations panel and go to the Migration Status page. You can see the number of active retry paths on this page.
- Select View details to view more information on the Retries page.
- Go to Notifications to view whether a Data Migrator instance has tried to transfer a failed file more than once.
- Select View details to see which migrations and files have failed and retried.
You can configure the maximum time Data Migrator tries to migrate failed files. This configuration gives you more time to identify patterns of failures and the root cause, and resolve issues such as network problems, permissions on file paths, and file size thresholds.
On the Retries page, enter the number of hours you want Data Migrator to keep trying to migrate failed files. The maximum configurable duration for retries is 12 hours.
To view the current number of actively requeuing operations across all migrations, run the diagnostics summary.
The value of the totalActionsRequeuing property is the total number of actions being retried.
To run the diagnostics summary, see the Diagnostics section of this guide.
To view the total number of actions being retried for one migration, run the following command:
The value of the numberPathRequeued property is the total number of paths being retried.
The default value for the retry duration is 43,200 seconds (12 hours). To view or configure the duration in the CLI, run the following commands:
configuration get --key requeue.time.limit.seconds
or
configuration get requeue.time.limit.seconds
configuration set --key requeue.time.limit.seconds --number of seconds
OK
or
configuration set requeue.time.limit.seconds --number of seconds
OK
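Because requeue.time.limit.seconds is expressed in seconds while the Retries page takes hours, a small conversion helper can prevent arithmetic slips. This is a sketch under stated assumptions: the helper name is hypothetical, and it assumes that the 12-hour maximum stated for the Retries page also applies to the CLI value and that the duration must be greater than zero.

```python
MAX_RETRY_HOURS = 12  # maximum configurable duration stated for the Retries page
SECONDS_PER_HOUR = 3_600


def retry_hours_to_seconds(hours: float) -> int:
    """Convert a retry duration in hours to whole seconds.

    Rejects values outside the assumed valid range (0, 12] hours.
    """
    if not 0 < hours <= MAX_RETRY_HOURS:
        raise ValueError(
            f"retry duration must be greater than 0 and at most "
            f"{MAX_RETRY_HOURS} hours, got {hours}"
        )
    return round(hours * SECONDS_PER_HOUR)


default_seconds = retry_hours_to_seconds(12)
half_hour_seconds = retry_hours_to_seconds(0.5)
```

The result for 12 hours matches the documented default of 43,200 seconds.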