Version: 2.0

Auto source cleanup

note

Auto source cleanup is a feature that removes data from the source filesystem after the data is migrated successfully to a target.

To use this feature in production, please contact WANdisco Support.

Prerequisites

  • You have the Admin or Migration Manager role assigned.

    Learn more about user roles.

  • You have an ingest process that copies the folders you want to migrate and clean up into a temporary location on your source filesystem.

    Add the temporary paths to your migration when you're creating it. In this version of the feature, we recommend using a script to copy your data into the folder for cleanup. In the next iteration of this feature, this won't be a prerequisite.

    note

    If you don't change your ingest process and attempt to use auto source cleanup, Data Migrator won't migrate the data in the paths added to migrations. Notifications appear in Data Migrator; however, there's no effect on your source or target filesystems or data.
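A minimal ingest sketch, assuming an HDFS source and the standard `hdfs dfs` shell on the PATH. The paths `/data/raw`, `/data/staging`, and `/data/cleanup` are hypothetical; `/data/cleanup` stands for the temporary path added to the migration. Each folder is copied to a staging area outside the migration and then moved into the cleanup path with `-mv`, so the migration only ever sees a complete folder arrive in one rename. The script defaults to a dry run that prints the commands.

```python
import posixpath
import subprocess
from typing import List


def build_ingest_commands(source_dir: str, staging_root: str,
                          cleanup_root: str) -> List[List[str]]:
    """Build the hdfs CLI commands that stage one folder for cleanup.

    The folder is copied to a staging area outside the migration first, then
    moved into the cleanup path, so the migration sees it appear in a single
    rename. Assumes the staged and final paths don't exist yet.
    """
    name = posixpath.basename(source_dir.rstrip("/"))
    staged = posixpath.join(staging_root, name)
    final = posixpath.join(cleanup_root, name)
    return [
        ["hdfs", "dfs", "-mkdir", "-p", staging_root, cleanup_root],
        ["hdfs", "dfs", "-cp", source_dir, staged],  # slow copy, outside the migration
        ["hdfs", "dfs", "-mv", staged, final],       # rename into the migration path
    ]


def run_ingest(source_dir: str, staging_root: str, cleanup_root: str,
               dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in build_ingest_commands(source_dir, staging_root, cleanup_root):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; /data/cleanup is the temporary path added to the migration.
    run_ingest("/data/raw/2024-01-01", "/data/staging", "/data/cleanup")
```

`hdfs dfs -mv` doesn't move files across filesystems, so keep the staging area on the same HDFS as the cleanup path.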

Use cases

If you ingest large volumes of data, auto source cleanup migrates the data to your cloud target and then cleans it from your source Hadoop Distributed File System (HDFS), freeing up space and keeping your buffer size and costs to a minimum.

Data Migrator does the following:

  • Checks that source files exist on the target before removing them from the source.
  • Ignores files you've excluded from the migration to your target, so they aren't removed from your source either. See Configure exclusions.
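The two checks above can be sketched as a filter over file listings. This is an illustration, not Data Migrator's implementation: listings are plain sets of relative paths, and Unix-style glob patterns stand in for exclusions, which may be defined differently in Data Migrator. The function name `cleanup_candidates` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set


def cleanup_candidates(source_files: Iterable[str], target_files: Set[str],
                       exclusion_patterns: Iterable[str]) -> List[str]:
    """Return the source files that may be removed from the source.

    A file qualifies only if it isn't excluded from the migration and a copy
    already exists on the target. Glob patterns stand in for exclusions here.
    """
    patterns = list(exclusion_patterns)
    candidates = []
    for path in source_files:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            continue  # excluded: never migrated, so never cleaned up
        if path not in target_files:
            continue  # not on the target (yet): keep the source copy
        candidates.append(path)
    return candidates


source = ["logs/a.csv", "logs/b.tmp", "logs/c.csv"]
target = {"logs/a.csv", "logs/b.tmp"}
print(cleanup_candidates(source, target, ["*.tmp"]))  # ['logs/a.csv']
```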

We support this feature for the following use cases:

Enable auto source cleanup

info

Don't enable auto source cleanup for migrations with migration verification set up.

Using auto source cleanup and migration verification on the same migration will cause the verification report to list files intentionally deleted during auto source cleanup as discrepancies.

You can enable auto source cleanup when you create a migration, or at any time afterward.

  1. In the WANdisco® UI, create a migration with an HDFS source and a cloud target.
  2. Under Migration Type, select Live, Recurring, or One-time.
  3. Under Advanced Options, select Enable source cleanup.
  4. Select a deletion mode:
    • Immediately - Delete files from the source after verifying they’re on the target.
    • After a file has not changed for - Delete files from the source after a selected period of no activity.
      1. Enter the minimum number of hours or days you want files to have existed on the target before being deleted from the source. Select hour(s) or day(s) from the dropdown.
        info

        Review the ingest contract for your selected deletion mode for guidance on interacting with your migration paths while auto source cleanup is enabled.

  5. Select the acknowledgement checkbox(es) to enable the auto source cleanup feature.
  6. Continue creating your migration.
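The two deletion modes can be modelled as a small settings object. The class and field names are hypothetical, not Data Migrator's API; the sketch only shows that Immediately implies no delay, while the other mode turns the amount and the hour(s)/day(s) selection into a single delay. Rejecting a delay below 1 is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class DeletionMode(Enum):
    IMMEDIATELY = "immediately"
    AFTER_UNCHANGED = "after_unchanged"  # "After a file has not changed for"


@dataclass(frozen=True)
class SourceCleanupSettings:
    mode: DeletionMode
    delay_amount: int = 0
    delay_unit: str = "hours"  # "hours" or "days", as in the UI dropdown

    def delay(self) -> timedelta:
        """Minimum time a file must stay unchanged before it can be deleted."""
        if self.mode is DeletionMode.IMMEDIATELY:
            return timedelta(0)
        if self.delay_amount < 1:
            raise ValueError("delay must be at least 1 hour or day")  # assumption
        if self.delay_unit == "hours":
            return timedelta(hours=self.delay_amount)
        if self.delay_unit == "days":
            return timedelta(days=self.delay_amount)
        raise ValueError(f"unknown unit: {self.delay_unit!r}")


settings = SourceCleanupSettings(DeletionMode.AFTER_UNCHANGED, 2, "days")
print(settings.delay())  # 2 days, 0:00:00
```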
note

Source files that are being updated after you’ve enabled auto source cleanup and started migrating data from the source won't be migrated or removed. You will receive notifications to this effect in Data Migrator.

info

Reenabling auto source cleanup will require a rescan of source data.

Non-recurring migrations

Changing (enabling, disabling, or reenabling) auto source cleanup settings will reset a non-recurring migration. If the migration was in a Running, Live, Scheduled, or Completed state before the change, it will restart.
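The reset rules for non-recurring migrations, together with the rescan note above, can be summarized as a small function. The `change` labels and the `ChangeEffect` type are hypothetical; only the state names come from the text above.

```python
from dataclasses import dataclass

# States from which a non-recurring migration restarts after the change.
RESTART_STATES = frozenset({"Running", "Live", "Scheduled", "Completed"})


@dataclass(frozen=True)
class ChangeEffect:
    reset: bool
    restart: bool
    rescan_required: bool


def cleanup_change_effect(current_state: str, change: str) -> ChangeEffect:
    """Effect of changing auto source cleanup on a non-recurring migration.

    change is "enable", "disable", or "reenable" (hypothetical labels).
    """
    if change not in {"enable", "disable", "reenable"}:
        raise ValueError(f"unknown change: {change!r}")
    return ChangeEffect(
        reset=True,  # any change resets a non-recurring migration
        restart=current_state in RESTART_STATES,
        rescan_required=(change == "reenable"),
    )


print(cleanup_change_effect("Completed", "reenable"))
# ChangeEffect(reset=True, restart=True, rescan_required=True)
```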

Disable auto source cleanup

You can disable auto source cleanup at any time after enabling it.

  1. In the WANdisco® UI, go to the migration for which you want to disable auto source cleanup.
  2. Select the Auto Source Cleanup tab.
  3. Uncheck the Enable source cleanup checkbox.
  4. Select Save.

This will return the migration to the state it was in before auto source cleanup was enabled. When auto source cleanup is disabled, data will not be removed from the source filesystem after the data is migrated successfully to a target.

Check if auto source cleanup is enabled

Select the migration from your dashboard and select the Auto Source Cleanup tab. If the Enable source cleanup checkbox is selected, auto source cleanup is enabled.

Monitor the cleanup

On the Notifications page, you can view notifications for “unsupported events” on the source.

Unsupported events include changes made to files and directories in a migration that has cleanup enabled, for example, file or path renames, or new files added to paths. Because we can't remove source files or directories while they're changing, we notify you of these events.

Ingest contracts

Immediately

  • Data Migrator can delete a file after it is made available for migration and successfully migrated to the target.

  • You can't interact with or modify paths within a migration with immediate deletion configured.

  • The only supported source filesystem operation for a migration with immediate deletion configured is moving content into the migration path atomically (using the mv command) from outside the migration.

  • If you replace existing content on the source for a migration with immediate deletion configured, there is no guarantee that the new content will be migrated. The old version of the file may be migrated and the new version deleted.

  • Depending on the Skip or Overwrite Settings, you can replace content on the target for a migration with immediate deletion configured if you verify that the path on the source is empty before writing to it.

    note

    If Data Migrator has deleted a path after successfully migrating it to the target, you can write new content to that source path and expect the changes to be replicated to the target.

    Confirm that Data Migrator deleted a path by checking it doesn't exist on the source or checking the audit log to see if it's registered as a deleted path.

    You can then safely replicate new content written to the source path by adding a rescan directory for the path. For recurring migrations, future scan iterations pick up the change automatically.

    info

    If the target action policy for the migration is SKIP_IF_SIZE_MATCH, the new changes will only be replicated if the file size has changed.

  • For migrations that use an event stream and have immediate deletion configured, Data Migrator ignores all events except moving data into the migration from outside it.
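The event-stream rule in the last item can be sketched as a predicate: only a rename whose source is outside the migration path and whose destination is inside it is processed. The `FsEvent` type and its event kinds are hypothetical stand-ins for whatever the source event stream delivers; the prefix check compares whole path components so that `/data/cleanup2` isn't treated as part of `/data/cleanup`.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FsEvent:
    kind: str                   # hypothetical kinds: "create", "append", "rename", "delete"
    path: str                   # affected path (the rename source for "rename")
    dest: Optional[str] = None  # rename destination, if any


def is_within(path: str, root: str) -> bool:
    """True if path is root itself or lies underneath it."""
    path, root = posixpath.normpath(path), posixpath.normpath(root)
    return path == root or path.startswith(root.rstrip("/") + "/")


def acted_on_in_immediate_mode(event: FsEvent, migration_root: str) -> bool:
    """Only a move from outside the migration into it is processed; all else is ignored."""
    return (
        event.kind == "rename"
        and event.dest is not None
        and not is_within(event.path, migration_root)
        and is_within(event.dest, migration_root)
    )


root = "/data/cleanup"
print(acted_on_in_immediate_mode(FsEvent("rename", "/data/staging/d1", "/data/cleanup/d1"), root))  # True
print(acted_on_in_immediate_mode(FsEvent("append", "/data/cleanup/d1/part-0"), root))  # False
```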

After a file has not changed for x days/hours

  • Data Migrator can delete an individual file after it meets all of the following criteria:
    • The age of the file on the source is at least equal to the delay period.
    • It's a file, not a directory.
    • The file on the source is older than the file on the target.
    • The file exists on both the source and the target, and the two copies are consistent.
    • The file on the target is older than the delay period.
  • A file that meets these criteria isn't guaranteed to be deleted immediately.
  • Interacting with a file that's ready for deletion (reading, appending, or replacing it) isn't safe or recommended.
  • Delete operations are not supported while auto source cleanup is enabled. This is to prevent deletions made by Data Migrator being replicated to the target.
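The deletion criteria can be expressed as one predicate. In this sketch, a file's age is measured from its last modification time, and matching sizes stand in for the consistency check; both are assumptions, and Data Migrator may compare files differently. The `FileInfo` type is hypothetical. A `True` result only means the file may be deleted, not that it's deleted right away.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    is_file: bool
    size: int
    mtime: datetime  # last modification time


def eligible_for_delayed_deletion(source: FileInfo, target: Optional[FileInfo],
                                  delay: timedelta, now: datetime) -> bool:
    if target is None:  # must exist on the target
        return False
    if not (source.is_file and target.is_file):  # files only, never directories
        return False
    return (
        now - source.mtime >= delay      # unchanged on the source for the delay period
        and source.mtime < target.mtime  # source copy older than target copy
        and source.size == target.size   # stand-in for "consistent"
        and now - target.mtime > delay   # target copy older than the delay period
    )


now = datetime(2024, 1, 10)
src = FileInfo(True, 1024, datetime(2024, 1, 1))
tgt = FileInfo(True, 1024, datetime(2024, 1, 2))
print(eligible_for_delayed_deletion(src, tgt, timedelta(days=2), now))  # True
```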