Version: 2.1.1

Auto source cleanup


Auto source cleanup is a feature that removes data from the source filesystem after the data is migrated successfully to a target.

To use this feature in production, please contact WANdisco Support.


  • You have the Admin or Migration Manager role assigned.

    Learn more about user roles.

  • You have an ingest process set up to copy folders you want to migrate and clean up to a temporary space on your source filesystem.


    To protect production data from being deleted, copy your data to a temporary folder.

    Add the temporary paths to your migration when you're creating it.

    Don't create a migration with auto source cleanup enabled on a production dataset path (for example, /data1). Instead, create a new folder (for example, /mig1) and make periodic copies of the data from /data1 to /mig1. When the data arrives in /mig1, Data Migrator transfers it and then removes it. This reduces the risk of inadvertently removing production data.

  • For local and NAS filesystems, Data Migrator needs write access to source files to perform auto source cleanup. Without write access, auto source cleanup won't work and files won't be removed from the source.
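
The write-access prerequisite can be checked before you enable cleanup. The following is a minimal sketch for local or NAS paths, not part of Data Migrator: the function name files_without_write_access is hypothetical, and it assumes the script runs as the same operating system user as Data Migrator, because os.access reports permissions for the current user only.

```python
import os
from pathlib import Path


def files_without_write_access(migration_dir: str) -> list[str]:
    """Return files under migration_dir that the current user can't write to.

    Hypothetical helper: run it as the user Data Migrator runs as, otherwise
    the result says nothing about what Data Migrator can do.
    """
    root = Path(migration_dir)
    blocked = []
    for path in sorted(root.rglob("*")):
        # Only regular files are checked; directories are skipped.
        if path.is_file() and not os.access(path, os.W_OK):
            blocked.append(str(path.relative_to(root)))
    return blocked


if __name__ == "__main__":
    # /mig1 is the example temporary folder used in this section.
    for name in files_without_write_access("/mig1"):
        print("no write access:", name)
```

An empty result means every file found was writable by the current user. Any file that is listed would, according to the prerequisite above, be left on the source.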

Use cases

If you ingest large volumes of data, we migrate the data to your cloud target and clean it from your source Hadoop Distributed File System (HDFS), local filesystem, or network-attached storage (NAS) to free up space, keeping your buffer size and costs to a minimum.

Data Migrator does the following:

  • Checks that source files exist on the target before removing them from the source.
  • Ignores files you’ve specified to exclude from the migration to your target so these aren't removed from your source. See Configure exclusions.
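
The two checks above can be pictured with a short sketch. This illustrates the rule only and is not Data Migrator's implementation: the function name is hypothetical, paths are plain strings, and exclusions are modelled as simple glob patterns, which is an assumption (see Configure exclusions for the exclusion types that Data Migrator supports).

```python
from fnmatch import fnmatch


def files_safe_to_remove(source_files, target_files, exclusion_patterns):
    """Return source paths that are on the target and not excluded.

    Illustrative only: exclusions are treated as glob patterns here.
    """
    on_target = set(target_files)
    safe = []
    for path in source_files:
        if any(fnmatch(path, pattern) for pattern in exclusion_patterns):
            continue  # Excluded from the migration, so never removed.
        if path not in on_target:
            continue  # Not on the target yet, so it must stay on the source.
        safe.append(path)
    return safe


if __name__ == "__main__":
    source = ["a.csv", "b.tmp", "c.csv"]
    target = ["a.csv", "b.tmp"]
    print(files_safe_to_remove(source, target, ["*.tmp"]))
```

In this example only a.csv qualifies: b.tmp matches an exclusion and c.csv isn't on the target yet.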

We support this feature for the following use cases:

Enable auto source cleanup


Don't enable auto source cleanup for migrations with migration verifications.

Using auto source cleanup and migration verification on the same migration will cause the verification report to list files intentionally deleted during auto source cleanup as discrepancies.

You can enable auto source cleanup when you create a migration, or at any time afterward.

  1. In the WANdisco® UI, create a migration with HDFS, NAS, or local filesystem as a source, and a cloud target.
  2. Under Migration Type, select Live, Recurring, or One-time.
  3. Under Advanced Options, select Enable source cleanup.
  4. Select a deletion mode:
    • Immediately - Delete files from the source after verifying they’re on the target.
    • After a file has not changed for - Delete files from the source after a selected period of no activity.
      1. Enter the minimum number of hours or days you want files to have existed on the target before being deleted. Select hour(s) or day(s) from the dropdown.

        Review the ingest contract for your selected deletion mode for guidance on interacting with your migration paths while auto source cleanup is enabled.

  5. Select the acknowledgement checkbox(es) to enable the auto source cleanup feature.
  6. Continue creating your migration.

Source files that are being updated after you’ve enabled auto source cleanup and started migrating data from the source won't be migrated or removed. You will receive notifications to this effect in Data Migrator.


Reenabling auto source cleanup will require a rescan of source data.
If you reenable auto source cleanup, it defaults to the previous deletion mode (Immediately or After a file has not changed for a specified amount of time).

Non-recurring migrations

Changing (enabling, disabling, or reenabling) auto source cleanup settings will reset a non-recurring migration. If the migration was in a Running, Live, Scheduled, or Completed state before the change, it will restart.

Disable auto source cleanup

You can disable auto source cleanup at any time after enabling it.

  1. In the WANdisco® UI, go to the migration for which you want to disable auto source cleanup.
  2. Select the Auto Source Cleanup tab.
  3. Uncheck the Enable source cleanup checkbox.
  4. Select Save.

This will return the migration to the state it was in before auto source cleanup was enabled. When auto source cleanup is disabled, data will not be removed from the source filesystem after the data is migrated successfully to a target.

Check if auto source cleanup is enabled

Select the migration from your dashboard and go to the Auto Source Cleanup page. If the Enable source cleanup checkbox is selected, auto source cleanup is enabled.

Monitor the cleanup


Unsupported events are relevant only to live migrations. Recurring and one-time migrations don't show unsupported events.

On the Notifications page, you can view notifications for “unsupported events” on the source.

Unsupported events include changes made to files and directories that you added to a migration for which cleanup is enabled. Because we can’t remove source files or directories that are changing, we notify you of these events, for example, when a file or path is renamed or new files are added to a path.


To check that the correct files have been removed from your source and to ensure you have accurate information for auditing purposes, you can access reports, which you can download and share.

Reports are:

  • Created every four hours automatically.
    The reporting period for the current date is four hours.
    The first report runs for 00:00 - 03:59, the next for 04:00 - 07:59, 08:00 - 11:59, and so on.
  • Placed in a folder whose name is derived from the migration ID. The location of the folder is /opt/wandisco/livedata-migrator/db/sourcecleanup.
  • A record of all the files that have been removed from the source during cleanup.
  • A record of what has been deleted successfully.
  • Available for immediate and delayed deletes.
  • Available for download in the following file formats:
    • .jsonl (uncompressed)
    • tar.gz (compressed)
      The four-hour reports are compressed into a daily report.
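
To find the report that covers a particular cleanup time, you can work out its four-hour window. A minimal sketch of the arithmetic, assuming windows are aligned to midnight as in the 00:00 - 03:59 example above; the function name report_window is hypothetical.

```python
from datetime import datetime, timedelta


def report_window(moment: datetime) -> tuple[datetime, datetime]:
    """Return the first and last minute of the four-hour window containing moment."""
    start_hour = (moment.hour // 4) * 4  # 0, 4, 8, 12, 16, or 20
    start = moment.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=4) - timedelta(minutes=1)
    return start, end


if __name__ == "__main__":
    start, end = report_window(datetime(2023, 2, 21, 9, 30))
    print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))  # 08:00 - 11:59
```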

You can view and download a report while a migration is still in progress.

The reporting period for archived reports is 24 hours, for example, from 00:00 to 23:59.


If a migration is reset, the reporting still captures files that were removed from the source before the migration was reset. All cleanup operations after the reset are captured in the same report. The cleanup report is simply added to a directory that contains the new name of the reset migration.

Reporting with the UI

  1. Select a data migration for which auto source cleanup is enabled.
  2. Select Auto Source Cleanup and go to the Source Cleanup History panel.
    If files were removed from the source, you can see the report files generated. Download the files to view them:
    • In the last 4 hours under Latest Reports. For example, 21.02.2023-08:00:00.jsonl, 21.02.2023-12:00:00.jsonl.
    • In the last 24 hours under Archived Reports. For example, 20.02.2023.jsonl.gz.
  3. To download reports to check which files were removed from your source filesystem and compare the results with your target filesystem, select the download icon for the report that matches your needs.

You can delete archived reports only.

View reports for deleted migrations

You can view reports for deleted migrations. After a migration is deleted in the UI, you can view the report in the directory /opt/wandisco/livedata-migrator/db/sourcecleanup. The sub-directory names for the cleanup reports are derived from the migration IDs.
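
If you have shell access to the Data Migrator host, a short script can list what is in that directory. This is a sketch under assumptions: it expects report files to sit directly inside each sub-directory, and it treats a .jsonl report as one JSON document per line without relying on any particular field names, because the record schema isn't described here. Both function names are hypothetical.

```python
import json
from pathlib import Path

# Report location stated in this section.
REPORT_ROOT = "/opt/wandisco/livedata-migrator/db/sourcecleanup"


def list_cleanup_reports(root: str = REPORT_ROOT) -> dict[str, list[str]]:
    """Map each sub-directory name to the sorted names of the files inside it."""
    reports = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            reports[sub.name] = sorted(p.name for p in sub.iterdir() if p.is_file())
    return reports


def count_records(jsonl_path: str) -> int:
    """Count the JSON records in an uncompressed .jsonl report.

    Raises ValueError if a non-empty line is not valid JSON.
    """
    count = 0
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)
                count += 1
    return count


if __name__ == "__main__":
    for directory, files in list_cleanup_reports().items():
        print(directory, files)
```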

Download reports for deleted migrations

You can download reports for deleted migrations using the CLI command migration deletion-report download. For more information, see the Command reference.

Ingest contracts

Immediately


  • Data Migrator can delete a file after it is made available for migration and successfully migrated to the target.

  • You can't interact with or modify paths within a migration with immediate deletion configured.

  • The only supported source filesystem operation for a migration with immediate deletion configured is moving content into the migration path atomically (using the mv command) from outside the migration.

  • If you replace existing content on the source for a migration with immediate deletion configured, there is no guarantee that the new content will be migrated. The old version of the file may be migrated and the new version deleted.

  • Depending on the Skip or Overwrite Settings, you can replace content on the target for a migration with immediate deletion configured if you verify that the path on the source is empty before writing to it.


    If Data Migrator has deleted a path after successfully migrating it to the target, it is possible to rewrite the source content and expect that the new changes will be replicated to the target.

    Confirm that Data Migrator deleted a path by checking it doesn't exist on the source or checking the audit log to see if it's registered as a deleted path.

    New content written to the source path can be replicated safely by then adding a rescan directory to the path. For recurring migrations, the change will be picked up automatically in the future scan iterations.


    If the target action policy for the migration is SKIP_IF_SIZE_MATCH, the new changes will only be replicated if the file size has changed.

  • In migrations with an event stream that have immediate deletion configured, Data Migrator ignores all events except for moving data into the migration from outside the migration.
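
The only supported ingest operation for immediate deletion, an atomic move into the migration path, might look like the following on a local or NAS source. This is a sketch, not a WANdisco tool: the function name and the staging folder are hypothetical, and it assumes the staging folder sits outside the migration path but on the same filesystem, because os.rename is atomic only within one filesystem on POSIX systems. For an HDFS source, you would use the HDFS move operation instead.

```python
import os
import shutil
from pathlib import Path


def stage_and_move(production_file: str, staging_dir: str, migration_dir: str) -> str:
    """Copy a production file to staging, then move it atomically into the migration path.

    The production file is only read. The copy appears in migration_dir in a
    single rename, so a partially written file is never visible there.
    """
    source = Path(production_file)
    staged = Path(staging_dir) / source.name
    final = Path(migration_dir) / source.name
    if final.exists():
        # Replacing existing content is not covered by the ingest contract.
        raise FileExistsError(f"{final} already exists in the migration path")
    shutil.copy2(source, staged)  # Slow, non-atomic copy happens outside the migration.
    os.rename(staged, final)      # Atomic when both paths are on the same filesystem.
    return str(final)


if __name__ == "__main__":
    # /data1 and /mig1 are the example paths used in this section;
    # /mig-staging and part-0001 are hypothetical.
    print(stage_and_move("/data1/part-0001", "/mig-staging", "/mig1"))
```

The check for an existing file reflects the contract above: replacing content that Data Migrator hasn't deleted yet carries no guarantee that the new version is migrated.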

After a file has not changed for x days/hours

  • Data Migrator can delete each individual file after it meets the following criteria:
    • The age of the file on the source is at least equal to the delay period
    • The path is a file, not a directory
    • The file on the source is older than the file on the target
    • The file exists on the target and the source, and is consistent
    • The file on the target is older than the delay period
  • A file that can be deleted by Data Migrator is not guaranteed to be deleted immediately.
  • Interacting with a file that is ready for deletion (reading, appending, or replacing) is neither safe nor recommended.
  • Delete operations are not supported while auto source cleanup is enabled. This is to prevent deletions made by Data Migrator being replicated to the target.
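
The deletion criteria for this mode can be read as a single predicate. The sketch below is an interpretation, not Data Migrator's code: it assumes "age" means time since last modification, represents "consistent" as a precomputed flag, and uses hypothetical names throughout (FileState, eligible_for_delayed_delete).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FileState:
    is_file: bool                        # False for directories
    source_modified: datetime            # Last modification time on the source
    target_modified: Optional[datetime]  # None if the file isn't on the target
    consistent: bool                     # Source and target copies match


def eligible_for_delayed_delete(state: FileState, delay: timedelta, now: datetime) -> bool:
    """Return True only if every criterion in the list above holds."""
    if not state.is_file:
        return False
    if state.target_modified is None or not state.consistent:
        return False
    if now - state.source_modified < delay:
        return False  # Source file is younger than the delay period.
    if state.source_modified >= state.target_modified:
        return False  # Source file must be older than the target file.
    if now - state.target_modified <= delay:
        return False  # Target file must be older than the delay period.
    return True


if __name__ == "__main__":
    now = datetime(2023, 2, 21, 12, 0)
    state = FileState(True, datetime(2023, 2, 19, 12, 0), datetime(2023, 2, 20, 6, 0), True)
    print(eligible_for_delayed_delete(state, timedelta(hours=24), now))  # True
```

Passing this check makes a file a candidate only. As stated above, a file that can be deleted is not guaranteed to be deleted immediately.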