Version: 2.0

Transient failure handling

Transient failures are errors that occur when migrating data. There can be a variety of reasons for transient failures:

Network interruptions
System maintenance
Expired access keys
Directory quotas filled on the target filesystem

You can manage and configure the retry duration so you don't get unnecessary error notifications when you have system maintenance or planned downtime.

Failed paths and retries

Failed paths are files and directories that Data Migrator tried but failed to migrate from the source filesystem to the target filesystem, and which Data Migrator won't retry.

To retry a failed path, you can add the path to be rescanned, reset the migration, or both.

Retries are files and directories that Data Migrator tried but failed to migrate, and which Data Migrator continues to retry for a specified time period.

You can configure the time period Data Migrator retries failed paths so you don't need to add paths to be rescanned or reset migrations during your configured time period. This gives you time to renew expired access keys, increase directory quotas, and so forth.

When the configured retry duration ends for an event, the event becomes a failed path and Data Migrator doesn't retry.
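The difference between a retry and a failed path can be pictured as a simple time-window check. The following Python sketch is illustrative only and is not Data Migrator's implementation. The function name classify_event and its parameters are hypothetical, and it assumes the retry window is measured from the first failure of the event.

```python
from datetime import datetime, timedelta, timezone


def classify_event(first_failure: datetime, now: datetime, retry_limit_seconds: int) -> str:
    """Return "retry" while the event is inside the retry window, else "failed path".

    Assumption: the window starts at the first failure of the event.
    """
    deadline = first_failure + timedelta(seconds=retry_limit_seconds)
    return "retry" if now < deadline else "failed path"


# Example with the 12-hour (43,200 second) retry duration.
first_failure = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(classify_event(first_failure, first_failure + timedelta(hours=1), 43_200))
print(classify_event(first_failure, first_failure + timedelta(hours=13), 43_200))
```

In this sketch, an event one hour after its first failure is still a retry, while the same event 13 hours later is treated as a failed path.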

Prerequisites

Plan your memory usage

Note: To keep events queued for an extended period and to avoid out-of-memory (OOM) situations, plan for enough memory to hold the number of events your migrations generate.

Data Migrator events consume roughly 0.77 KB each. As more events are consumed and retried, memory usage grows accordingly. If you have very busy migrations, make sure there is a buffer of memory available for retries.
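As a rough planning aid, the per-event figure can be turned into a back-of-the-envelope estimate. This is a minimal sketch, not a sizing tool provided by Data Migrator: the function name, the 25% headroom default, and the use of 1 MB = 1024 KB are assumptions made for illustration.

```python
KB_PER_EVENT = 0.77  # approximate per-event memory footprint


def estimate_retry_memory_mb(queued_events: int, headroom: float = 0.25) -> float:
    """Estimate memory (in MB) needed to hold queued events.

    headroom is a hypothetical safety buffer expressed as a fraction (0.25 = 25%).
    """
    if queued_events < 0 or headroom < 0:
        raise ValueError("queued_events and headroom must be non-negative")
    base_kb = queued_events * KB_PER_EVENT
    return base_kb * (1 + headroom) / 1024


# One million queued events with the default buffer.
print(f"{estimate_retry_memory_mb(1_000_000):.1f} MB")
```

With these assumptions, one million queued events come to roughly 752 MB before the buffer is applied, so very busy migrations can require a noticeable amount of extra memory.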

Set up notifications

To set up email notifications for failed file transfers, go to Configuration and select Email Notifications.
Select the checkbox Retry occurred.

For more information, see Configure email notifications.

View details of failed files and retries

Data Migrator provides details on failed files and retries to help you identify the causes for failures. These causes can include missing permissions on a specific target file path or file sizes that exceed a threshold value.

View the following details on the Retries page:

Metric	Description
Total Active Retries	The total number of events or operations that Data Migrator tried more than once.
Total paths being retried	The total number of events that this Data Migrator instance is retrying in the configured retry period for one migration.
Path	The path of the directory or file in the source filesystem on which Data Migrator tried an operation more than once.
Reason	The last reason why the operation on the path didn’t succeed.
Times retried	The total number of times that Data Migrator tried to migrate this file.
Timestamp	The Coordinated Universal Time when Data Migrator last tried to migrate this file.

To view the number of retries:

Select a migration from the Migrations panel and go to the Migration Status page. You can see the number of active retry paths on this page.
Select View details to view more information on the Retries page.

Go to Notifications to see whether a Data Migrator instance has tried to transfer a failed file more than once.
Select View details to see which migrations and files have failed and retried.

Configure the retry duration

You can configure the maximum time Data Migrator tries to migrate failed files. This configuration gives you more time to identify patterns of failures and the root cause, and resolve issues such as network problems, permissions on file paths, and file size thresholds.

On the Retries page, enter the number of hours you want Data Migrator to keep trying to migrate failed files. The maximum configurable duration for retries is 12 hours.

View failed files and retries in the CLI

To view the current number of actively requeuing operations across all migrations, run the diagnostics summary.
The value of property totalActionsRequeuing is the total number of actions being retried.
To run the diagnostics summary, see the Diagnostics section of this guide.
To view the total number of actions being retried for one migration, run the following command:
migration show
The value of property numberPathRequeued is the total number of paths being retried.

Configure the retry duration for failed file retries in the CLI

The default value for the retry duration is 43,200 seconds (12 hours). To configure the duration in the CLI, run the following commands:

Retrieve the current configuration

Example: retrieve the current value of requeue.time.limit.seconds using the --key option

configuration get --key requeue.time.limit.seconds

Example: retrieve the current value of requeue.time.limit.seconds, passing the key directly

configuration get requeue.time.limit.seconds

Apply new configuration

Example: set requeue.time.limit.seconds using the --key option

configuration set --key requeue.time.limit.seconds --value <number_of_seconds>

Example: set requeue.time.limit.seconds, passing the key directly

configuration set requeue.time.limit.seconds --value <number_of_seconds>
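Because the CLI takes the duration in seconds while the Retries page works in hours, a small conversion helper can reduce mistakes. The sketch below only builds the command text in the --key form; it doesn't run the CLI. The function name is hypothetical, and the sketch assumes that the 12-hour maximum applies to values set through the CLI and that the value must be positive.

```python
MAX_RETRY_SECONDS = 43_200  # 12 hours; assumed to be the upper limit in the CLI as well


def retry_limit_command(hours: float) -> str:
    """Build the CLI command that sets the retry duration for the given number of hours."""
    seconds = round(hours * 3600)
    if not 0 < seconds <= MAX_RETRY_SECONDS:
        raise ValueError("retry duration must be more than 0 and at most 12 hours")
    return f"configuration set --key requeue.time.limit.seconds --value {seconds}"


# Six hours expressed in seconds.
print(retry_limit_command(6))
```

For example, a 6-hour retry duration corresponds to a value of 21600 seconds.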

Learn more

To understand why paths and files fail to migrate, see Failed paths.
You can also monitor and troubleshoot Data Migrator and monitor failed operations to identify problems that cause failures.
Add a path to be rescanned for migration.
