Version: 2.0

Transient failure handling

Transient failures are errors that occur while migrating data. They can have a variety of causes:

  • Network interruptions
  • System maintenance
  • Expired access keys
  • Directory quotas filled on the target filesystem

You can manage and configure the retry duration so you don't get unnecessary error notifications when you have system maintenance or planned downtime.

Failed paths and retries

Failed paths are files and directories that Data Migrator tried to migrate from the source filesystem to the target filesystem unsuccessfully, and which Data Migrator won't retry.

To retry a failed path, you can add the path to be rescanned, reset the migration, or both.

Retries are files and directories that Data Migrator tried to migrate unsuccessfully, and which Data Migrator continues to retry for a specified time period.

You can configure the time period Data Migrator retries failed paths so you don't need to add paths to be rescanned or reset migrations during your configured time period. This gives you time to renew expired access keys, increase directory quotas, and so forth.

When the configured retry duration ends for an event, the event becomes a failed path and Data Migrator doesn't retry.
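
The difference between a retry and a failed path can be sketched as a simple time check. This is only an illustration, not Data Migrator's implementation: the function name `classify_event` is hypothetical, and measuring the retry window from the first failure is an assumption.

```python
from datetime import datetime, timedelta, timezone


def classify_event(first_failure: datetime, now: datetime, retry_hours: float) -> str:
    """Return "retry" while the event is inside the retry window, else "failed path".

    Assumption: the window starts at the first failed attempt.
    """
    deadline = first_failure + timedelta(hours=retry_hours)
    return "retry" if now < deadline else "failed path"


# Hypothetical event that first failed at midnight UTC, with a 6-hour retry duration.
first_failure = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(classify_event(first_failure, first_failure + timedelta(hours=3), retry_hours=6))
print(classify_event(first_failure, first_failure + timedelta(hours=7), retry_hours=6))
```

In this sketch, the event is still a retry three hours after the first failure and becomes a failed path once the six-hour window has passed.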

Prerequisites

Plan your memory usage

Note: To keep events queued for an extended period and avoid out-of-memory (OOM) situations, plan for enough memory to hold the number of events your migrations generate.

Data Migrator events consume roughly 0.77 KB each. As more events are queued and retried, memory usage grows accordingly. If you have very busy migrations, make sure there is a buffer of memory available for retries.
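
As a rough sizing aid, the per-event figure can be turned into a back-of-the-envelope estimate. The sketch below is an approximation only: the helper name `estimate_retry_memory_mb` is hypothetical, it assumes 1 MB = 1024 KB, and it ignores any other memory the product uses.

```python
KB_PER_EVENT = 0.77  # approximate per-event figure quoted on this page


def estimate_retry_memory_mb(event_count: int, kb_per_event: float = KB_PER_EVENT) -> float:
    """Estimate the memory (in MB) needed to hold `event_count` queued events."""
    return event_count * kb_per_event / 1024


# Hypothetical workload: one million events queued for retry.
print(f"{estimate_retry_memory_mb(1_000_000):.0f} MB")
```

For one million queued events this works out to roughly 752 MB, so a busy migration with a long retry duration may need a noticeable memory buffer.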

Set up notifications

  1. To set up email notifications for failed file transfers, go to Configuration and select Email Notifications.
  2. Select the checkbox Retry occurred.

For more information, see Configure email notifications.

View details of failed files and retries

Data Migrator provides details on failed files and retries to help you identify the causes for failures. These causes can include missing permissions on a specific target file path or file sizes that exceed a threshold value.

View the following details on the Retries page:

Metric                     Description
Total Active Retries       The total number of events or operations that Data Migrator tried more than once.
Total paths being retried  The total number of events that this Data Migrator instance is retrying in the configured retry period for one migration.
Path                       The path of the directory or file in the source filesystem on which Data Migrator tried an operation more than once.
Reason                     The last reason why the operation on the path didn’t succeed.
Times retried              The total number of times that Data Migrator tried to migrate this file.
Timestamp                  The Coordinated Universal Time when Data Migrator last tried to migrate this file.

To view the number of retries:

  1. Select a migration from the Migrations panel and go to the Migration Status page. You can see the number of active retry paths on this page.
  2. Select View details to view more information on the Retries page.

Or

  1. Go to Notifications to view whether a Data Migrator instance has tried to transfer a failed file more than once.
  2. Select View details to see which migrations and files have failed and been retried.

Configure the retry duration

You can configure the maximum time Data Migrator tries to migrate failed files. This configuration gives you more time to identify patterns of failures and the root cause, and resolve issues such as network problems, permissions on file paths, and file size thresholds.

On the Retries page, enter the number of hours you want Data Migrator to keep trying to migrate failed files. The maximum configurable duration for retries is 12 hours.
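
If you script or document the value before entering it on the Retries page, a small check against the 12-hour maximum can catch mistakes early. This is a hypothetical helper, not part of Data Migrator; the requirement that the value be greater than zero is an assumption.

```python
MAX_RETRY_HOURS = 12  # maximum configurable retry duration stated on this page


def validate_retry_hours(hours: int) -> int:
    """Return `hours` if it is within the allowed range, otherwise raise ValueError.

    Assumption: the value must be greater than zero.
    """
    if not 0 < hours <= MAX_RETRY_HOURS:
        raise ValueError(
            f"Retry duration must be between 1 and {MAX_RETRY_HOURS} hours, got {hours}"
        )
    return hours


print(validate_retry_hours(8))
```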
