Auto source cleanup is a feature that removes data from the source filesystem after the data is migrated successfully to a target.
To use this feature in production, please contact WANdisco Support.
You have the Admin or Migration Manager role assigned.
Auto source cleanup is available as a setting when you create migrations. However, this is not visible to users with the Migration Manager role.
Read more about user roles.
You have an ingest process set up to copy folders you want to migrate and clean up to a temporary space on your source filesystem.
Add the temporary paths to your migration when you're creating it. In this version of the feature, we recommend using a script to copy your data into the folder for cleanup. In the next iteration of this feature, this won't be a prerequisite.
If you don’t change your ingest process and attempt to use auto source cleanup, Data Migrator simply won't migrate the data in the paths added to migrations. Notifications appear in Data Migrator, but there is no effect on your source or target filesystems or data.
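The copy script recommended above could look like the following minimal sketch. It assumes the `hdfs` command-line client is available on the host. The function names and the example paths are hypothetical and are not part of Data Migrator. By default the script only prints the commands, so you can review them before running anything.

```python
import subprocess
from typing import List


def build_staging_commands(source_dir: str, staging_root: str) -> List[List[str]]:
    """Return the HDFS shell commands that copy source_dir under staging_root.

    Both paths are placeholders chosen by the caller (an assumption of this
    sketch); Data Migrator does not prescribe a particular layout.
    """
    return [
        # Create the temporary space if it doesn't exist yet.
        ["hdfs", "dfs", "-mkdir", "-p", staging_root],
        # Copy the folder into the temporary space.
        ["hdfs", "dfs", "-cp", source_dir, staging_root],
    ]


def stage_for_cleanup(source_dir: str, staging_root: str, dry_run: bool = True) -> List[List[str]]:
    """Print the staging commands (dry run) or execute them in order."""
    commands = build_staging_commands(source_dir, staging_root)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical example paths.
    stage_for_cleanup("/data/landing/events", "/tmp/migrate_and_clean")
```

You would then add the temporary path (here, the hypothetical `/tmp/migrate_and_clean`) to the migration rather than the original folder.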
If you ingest large volumes of data, Data Migrator migrates the data to your cloud target and then removes it from your source Hadoop Distributed File System (HDFS) to free up space, keeping your buffer size and costs to a minimum.
Data Migrator does the following:
- Checks that source files exist on the target before removing them from the source.
- Ignores files you’ve specified to exclude from the migration to your target so these aren't removed from your source either. See Configure exclusions.
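The two checks above can be illustrated with a short sketch. This is not Data Migrator's implementation. It only shows the decision logic under simplifying assumptions: paths are compared as plain strings, and exclusions are shell-style glob patterns. The function name and sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set


def files_safe_to_remove(
    source_files: Iterable[str],
    target_files: Set[str],
    exclusion_patterns: Iterable[str],
) -> List[str]:
    """Return the source files that could be removed after migration.

    A file qualifies only if it is not excluded from the migration and
    it already exists on the target.
    """
    patterns = list(exclusion_patterns)
    removable = []
    for path in source_files:
        # Excluded files are never migrated, so they are never removed.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            continue
        # Remove only files confirmed to exist on the target.
        if path in target_files:
            removable.append(path)
    return removable


if __name__ == "__main__":
    source = ["/data/a.csv", "/data/b.csv", "/data/c.tmp"]
    target = {"/data/a.csv", "/data/c.tmp"}
    print(files_safe_to_remove(source, target, ["*.tmp"]))
```

In this example only `/data/a.csv` qualifies: `/data/b.csv` is not yet on the target, and `/data/c.tmp` matches an exclusion.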
We support this feature for the following use cases:
- HDFS to Amazon Simple Storage Service (Amazon S3)
- HDFS to Google Cloud Platform (GCP)
- HDFS to Azure Data Lake Storage Gen2 (ADLS Gen2)
Currently, we remove files as soon as they’re successfully migrated. There will be an option to defer removing source files until a specified period of time has passed. You can check how long it takes for your largest files to be written to your source and use this length of time as your deferral period for auto source cleanup with Data Migrator.
This will remove the need to alter your ingest process, because files can be written and changed on the source filesystem and the cleanup will happen only once that work is completed.
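As a rough illustration of how a deferral period might be chosen, the sketch below takes the longest observed write duration as the deferral period and treats a file as due for cleanup only once it has been unmodified for at least that long. The deferral option is not available yet, so the function names, units (seconds), and comparison rule are all assumptions made for this example.

```python
from typing import Iterable


def deferral_period_seconds(observed_write_durations: Iterable[float]) -> float:
    """Use the longest observed write time as the deferral period."""
    return max(observed_write_durations)


def is_due_for_cleanup(last_modified: float, now: float, deferral_seconds: float) -> bool:
    """Return True once the file has been unchanged for the deferral period.

    last_modified and now are timestamps in seconds (an assumption of
    this sketch).
    """
    return now - last_modified >= deferral_seconds


if __name__ == "__main__":
    # Hypothetical measurements: seconds taken to write three large files.
    period = deferral_period_seconds([120, 900, 45])
    print(period)
    print(is_due_for_cleanup(last_modified=1000, now=1500, deferral_seconds=period))
    print(is_due_for_cleanup(last_modified=1000, now=1900, deferral_seconds=period))
```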
You can enable auto source cleanup when you create a migration, or at any time afterward.
- In the WANdisco® UI, create a migration with an HDFS source and a cloud target.
- Select Live migration or One-Time Migration.
- Under Advanced Options, select Enable source cleanup.
- Select the acknowledgement checkbox to enable the auto source cleanup feature.
Source files that are updated after you’ve enabled auto source cleanup and started migrating data from the source won't be migrated or removed. Data Migrator notifies you when this happens.
Reenabling auto source cleanup will require a rescan of source data.
Changing (enabling, disabling, or reenabling) auto source cleanup settings will reset a non-recurring migration. If the migration was in a Running, Live, Scheduled, or Completed state before the change, it will restart.
You can disable auto source cleanup at any time after enabling it.
- In the WANdisco® UI, go to the migration for which you want to disable auto source cleanup.
- Select the Auto Source Cleanup tab.
- Uncheck the Enable source cleanup checkbox.
- Select Save.
This will return the migration to the state it was in before auto source cleanup was enabled. When auto source cleanup is disabled, data will not be removed from the source filesystem after the data is migrated successfully to a target.
You can’t enable the feature after a migration is created, but you can check whether a migration has the setting enabled.
Select the migration from your dashboard and go to the Auto Source Cleanup page.
On the Notifications page, you can view notifications for “unsupported events” on the source.
Unsupported events include changes made to files and directories that you added to a migration with cleanup enabled. Because we can’t remove source files or directories that are changing, we notify you of these events. Examples include file or path renames and new files added to paths.