Version: 2.4.3 (latest)

Transient failure handling

Transient failures are temporary errors that occur while migrating data. Transient failures can have a variety of causes:

  • Network interruptions
  • System maintenance
  • Expired access keys
  • Directory quotas filled on the target filesystem

You can manage and configure the retry duration so you don't get unnecessary error notifications when you have system maintenance or planned downtime.

Failed paths and retries

Failed paths are files and directories that Data Migrator tried to migrate from the source filesystem to the target filesystem unsuccessfully, and which Data Migrator won't retry.

To retry a failed path, you can add the path to be rescanned, reset the migration, or both.

Retries are files and directories that Data Migrator tried to migrate unsuccessfully, and which Data Migrator will retry after a specified time period.

You can configure the time period during which Data Migrator requeues and retries failed paths. You don't need to add paths to be rescanned or reset migrations during your configured requeue time period. This gives you time to renew expired access keys, increase directory quotas, and resolve similar issues.

When the configured retry duration ends for an event, the event becomes a failed path and Data Migrator doesn't retry.
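
The lifecycle described above (an event is retried while the requeue window is open, then becomes a failed path) can be sketched as a small model. This is an illustration only, not Data Migrator's implementation: the function `classify_event`, its parameters, the state names, and the assumption that the window is measured from the first failure are all hypothetical. The 43,200-second value is the documented default requeue time limit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical model of the requeue window; not Data Migrator's actual code.
DEFAULT_REQUEUE_LIMIT = timedelta(seconds=43_200)  # documented default: 12 hours


def classify_event(first_failure, now, limit=DEFAULT_REQUEUE_LIMIT):
    """Return "retry" while the requeue window is open, else "failed_path".

    Assumption: the window starts at the first failure of the event.
    """
    if now - first_failure < limit:
        return "retry"
    return "failed_path"


first_failure = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
# Three hours after the first failure the event is still inside the window.
print(classify_event(first_failure, first_failure + timedelta(hours=3)))
# Thirteen hours after the first failure the 12-hour window has closed.
print(classify_event(first_failure, first_failure + timedelta(hours=13)))
```

Under these assumptions, an outage shorter than the configured duration never produces a failed path, which is why planned maintenance windows are a good guide when choosing the value.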

Set up notifications

  1. Select your Data Migrator product from the Instances in the dashboard.
  2. Go to Configuration and select Email Notifications.
  3. Enter the email address to send notifications to in the Email Address field.
  4. Select the checkbox Retry occurred.
  5. Select Save.

For more information, see Configure email notifications.

View details of failed files and retries

Data Migrator provides details on failed files and retries to help you identify the causes for failures. These causes can include missing permissions on a specific target file path or file sizes that exceed a threshold value.

  1. Select your Data Migrator product from Instances on the dashboard.
  2. Go to Migrations and select Retries.

View the following details on the Retries page:

Metric | Description
Total Active Retries | The total number of events or operations that Data Migrator tried more than once.
Total paths being retried | The total number of events that this Data Migrator instance is retrying in the configured retry period for one migration.
Path | The path of the directory or file in the source filesystem on which Data Migrator tried an operation more than once.
Reason | The last reason why the operation on the path didn’t succeed.
Times retried | The total number of times that Data Migrator tried to migrate this file.
Timestamp | The time, in Coordinated Universal Time (UTC), when Data Migrator last tried to migrate this file.

To view the number of retries for a specific migration:

  1. Select your Data Migrator product from Instances on the dashboard.
  2. Under Migrations, select Data Migrations.
  3. Select a migration from the Data Migrations panel. You can see the number of Active Retries on this page.
  4. Select View details to view more information on the Retries page.

Or

  1. Go to Notifications to view whether a Data Migrator instance has tried to transfer a failed file more than once.
  2. Select View details to see which migrations and files have failed and retried.

View failed files and retries in the CLI

To view the current number of actively requeuing operations across all migrations, run the Diagnostics summary.

Diagnostics Summary
status --diagnostics

The value of the "Total number of currently requeuing actions" property is the total number of actions being retried.

To view the total number of actions being retried for one migration, run the migration show command:

Migration details
migration show --migration-id <migration_name>

The value of the numberPathsRequeued property is the total number of paths being retried for that migration.

Configure the requeue time limit

Note

To ensure events are queued for an extended period, and to avoid out-of-memory (OOM) situations, plan sufficient memory usage for the number of events you have for your migrations.

Data Migrator events consume roughly 0.77 KB of memory each. As more events are queued and retried, memory usage grows accordingly. If you have very busy migrations, make sure there is a buffer of memory available for retries.
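
As a rough sizing aid, the per-event figure above can be turned into a back-of-the-envelope estimate. This is a sketch, not an official sizing formula: the function name `estimate_retry_memory_mb` is hypothetical, the event counts are illustrative, and it assumes 1 MB = 1024 KB and that 0.77 KB is an average across all queued events.

```python
# Rough sizing sketch based on the ~0.77 KB-per-event figure above.
KB_PER_EVENT = 0.77


def estimate_retry_memory_mb(queued_events, kb_per_event=KB_PER_EVENT):
    """Estimate memory (in MB, assuming 1 MB = 1024 KB) held by queued events."""
    return queued_events * kb_per_event / 1024


# Illustrative event counts only; substitute the volumes from your own migrations.
for events in (100_000, 1_000_000, 10_000_000):
    print(f"{events:>10,} events ~ {estimate_retry_memory_mb(events):,.0f} MB")
```

By this estimate, one million queued events works out to roughly 750 MB, so a long retry duration on a busy migration can need a noticeable share of memory.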

You can configure the time period Data Migrator will requeue and retry paths. The Retry Duration in the UI sets the duration that a path should be queued and eventually retried. The default value for the requeue time limit is 43,200 seconds (12 hours). This configuration gives you more time to identify patterns of failures and the root cause, and resolve issues such as network problems, permissions on file paths, and file size thresholds.

  1. Select your Data Migrator product from Instances in the dashboard.
  2. Go to Migrations and select Retries.
  3. Under Retry Duration, enter the number of hours Data Migrator will requeue then retry an unsuccessful path.

Configure the requeue time limit in the CLI

The requeue.time.limit.seconds property sets the duration, in seconds, for which an event is queued and retried. To view or change the duration in the CLI, run the following commands:

Retrieve the current configuration
Example, retrieve the current requeue.time.limit.seconds
configuration get --key requeue.time.limit.seconds
Example, requeue.time.limit.seconds
configuration get requeue.time.limit.seconds
Apply new configuration
Example, set the requeue.time.limit.seconds
configuration set --key requeue.time.limit.seconds --value <number_of_seconds>
Example, set the requeue.time.limit.seconds
configuration set requeue.time.limit.seconds --value <number_of_seconds>
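
Because the UI takes the retry duration in hours while the CLI property takes seconds, a small helper can avoid conversion mistakes. This is a minimal sketch: `requeue_limit_seconds` and `build_set_command` are hypothetical helper names, the positive-value check is an assumption rather than a documented rule, and the command string follows the first `configuration set` form shown above.

```python
def requeue_limit_seconds(hours):
    """Convert a retry duration in hours to whole seconds."""
    # Assumption: a zero or negative requeue limit is not meaningful.
    if hours <= 0:
        raise ValueError("retry duration must be positive")
    return int(hours * 3600)


def build_set_command(hours):
    """Format the CLI command for a retry duration given in hours."""
    return (
        "configuration set --key requeue.time.limit.seconds "
        f"--value {requeue_limit_seconds(hours)}"
    )


print(build_set_command(12))  # the default duration, 43,200 seconds
print(build_set_command(24))
```

Paste the printed line into the Data Migrator CLI, or adapt the string to the alternative command form if that is the one your version accepts.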
