Version: 1.19.1

Configure exclusions

Use exclusions to prevent specific directories and files from being migrated. You can use the following exclusion types:

• Regex - Enter regular expression (regex) patterns for file and directory names (Java PCRE, Automata, or GLOB type). File paths that match the regex are excluded.
• File size - Enter a number and select the file size unit (bytes, GiB, and so on). Files larger than the value are excluded.
• Date - Select a date and time in Coordinated Universal Time (UTC) from which file changes are excluded from migrations. Files modified on or after that date and time are excluded.
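To make the three rules concrete, here is a minimal Python sketch of how a single file could be tested against each exclusion type. It is a hypothetical model, not Data Migrator's implementation: Python's `re` stands in for the Java PCRE, Automata, and GLOB engines (so pattern syntax can differ), and the function names, the `.*/tmp/.*` pattern, and the cutoff date are illustrative only. The date check follows the Date description above (modified on or after the cutoff is excluded).

```python
import re
from datetime import datetime, timezone

# Hypothetical helpers, one per exclusion type; not a Data Migrator API.

def excluded_by_regex(path: str, pattern: str) -> bool:
    # Assumption: a path is excluded when the whole path matches the pattern.
    return re.fullmatch(pattern, path) is not None

def excluded_by_size(size_bytes: int, limit_bytes: int) -> bool:
    # Files larger than the limit are excluded; a file exactly at the limit is kept.
    return size_bytes > limit_bytes

def excluded_by_date(modified: datetime, cutoff: datetime) -> bool:
    # Files modified on or after the cutoff (UTC) are excluded.
    return modified >= cutoff

GiB = 1024 ** 3
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)

print(excluded_by_regex("/data/tmp/test.csv", r".*/tmp/.*"))                  # True
print(excluded_by_size(150 * GiB, 100 * GiB))                                 # True
print(excluded_by_date(datetime(2023, 12, 31, tzinfo=timezone.utc), cutoff))  # False
```

Keeping each rule as a separate predicate mirrors the way each exclusion type is configured independently and assigned to migrations on its own.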

Default exclusions

Data Migrator automatically applies default exclusions to specific filesystems, depending on the platform. For example, Azure Data Lake Storage (ADLS) Gen2 filesystem types have a maximum individual file size of 4.55 TiB, so any larger files are automatically excluded.
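The 4.55 TiB figure is the 5 TB file size exclusion listed for ADLS Gen2 expressed in binary units. A short Python check of the conversion, assuming TB means 10^12 bytes and TiB means 2^40 bytes:

```python
limit_bytes = 5 * 10**12   # 5 TB, decimal (SI) units
tib = 2**40                # 1 TiB = 1,099,511,627,776 bytes

print(round(limit_bytes / tib, 2))  # 4.55
```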

You can remove default exclusions from a migration, but not from the system or the exclusion templates list.

Hadoop Distributed File System (HDFS)

The default exclusions are:

• (/|/.*/)..COPYING$.* - Regex (Automata) - HDFS copying files
• (/|/.*/)\\.hive-staging.* - Regex (Automata) - Hive staging directories
• (/|/.*/)\\.spark-staging-.* - Regex (Automata) - Spark staging directories
• (/|/.*/)_temporary.* - Regex (Automata) - Spark temporary directories
• (/|/.*/)\\.Trash(/.*)? - Regex (Automata) - HDFS trash directories
• (/|/.*/)\\.snapshot(/.*)? - Regex (Automata) - HDFS snapshot directories
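A minimal Python sketch of how five of these default patterns classify paths. It assumes the patterns are shown as escaped string literals (so `\\.` in the list is `\.` in a Python raw string) and that an Automata pattern must match the entire path, which `re.fullmatch` models. The labels and example paths are illustrative, the COPYING pattern is left out, and this is not Data Migrator's code.

```python
import re
from typing import Optional

# Five HDFS default patterns rewritten as Python raw strings.
HDFS_DEFAULTS = {
    "Hive staging directory": r"(/|/.*/)\.hive-staging.*",
    "Spark staging directory": r"(/|/.*/)\.spark-staging-.*",
    "Spark temporary directory": r"(/|/.*/)_temporary.*",
    "HDFS trash directory": r"(/|/.*/)\.Trash(/.*)?",
    "HDFS snapshot directory": r"(/|/.*/)\.snapshot(/.*)?",
}

def matching_default(path: str) -> Optional[str]:
    """Return the label of the first default pattern the whole path matches, or None."""
    for label, pattern in HDFS_DEFAULTS.items():
        if re.fullmatch(pattern, path):
            return label
    return None

print(matching_default("/warehouse/sales/.hive-staging_hive_2024/-ext-10000"))  # Hive staging directory
print(matching_default("/user/alice/.Trash/Current/old.csv"))                   # HDFS trash directory
print(matching_default("/data/sales/part-00000.parquet"))                       # None
```

The `(/|/.*/)` prefix requires each name to start a path segment, so a path such as `/data/my.Trash` is not matched by the trash pattern.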

The Hive and Spark staging and temporary directories hold temporary files during Hive or Spark jobs. Hive and Spark delete these files automatically after use, so they're excluded by default to avoid migrating unnecessary data.

HDFS snapshot and trash directories are generally only relevant to the local cluster, so they're also excluded to avoid migrating unnecessary data.

Azure Data Lake Storage (ADLS) Gen2

The default exclusions are:

• [.|\\/]$ - Regex (JAVA_PCRE) - File names cannot end with . or ' '
• .*([^\\/]*\\/){255,}.* - Regex (Automata) - Blob names cannot exceed 254 path segments
• .{1025,} - Regex (JAVA_PCRE) - File name length cannot exceed 1024 characters
• 5 TB - File size - File size cannot exceed 5 TB

These exclusions cover many of the limitations set by ADLS Gen2 directory and file naming rules.
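A minimal Python sketch of the length, path-segment, and size rules in this list, for checking a candidate path before migration. It assumes Python's `re` behaves like the Data Migrator engines for these simple patterns and that the rules apply to the full path; the function name is hypothetical and this is not Data Migrator's code.

```python
import re

LONG_NAME = re.compile(r".{1025,}")            # more than 1024 characters
MANY_SEGMENTS = re.compile(r"([^/]*/){255,}")  # 255 or more "/"-terminated segments
MAX_SIZE_BYTES = 5 * 10**12                    # 5 TB

def adls_default_excluded(path: str, size_bytes: int) -> bool:
    """Return True if the path or size breaks one of the sketched ADLS Gen2 rules."""
    return (
        LONG_NAME.search(path) is not None
        or MANY_SEGMENTS.search(path) is not None
        or size_bytes > MAX_SIZE_BYTES  # exactly 5 TB is still allowed
    )

print(adls_default_excluded("/data/report.csv", 1024))        # False
print(adls_default_excluded("/big.bin", 5 * 10**12 + 1))      # True
```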

Google Cloud Storage

The default exclusions are:

• .*[\r\n].* - Regex (Automata) - File names cannot contain carriage returns or line feeds
• .{1025,} - Regex (JAVA_PCRE) - File name length cannot exceed 1024 characters
• \\.\\.? - Regex (Automata) - File names cannot be . or ..
• 16 TB - File size - File size cannot exceed 16 TB

These exclusions cover the limitations set by Google Cloud object naming guidelines.
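The Google Cloud Storage rules can be sketched the same way in Python. This is a hypothetical helper, not Data Migrator's code; it assumes the `.` / `..` rule applies to the last path component and counts the 1024 limit in characters, as the `.{1025,}` pattern does.

```python
import posixpath
import re

LINE_BREAK = re.compile(r"[\r\n]")  # carriage return or line feed anywhere in the path
DOT_NAME = re.compile(r"\.\.?")     # a name that is exactly "." or ".."
MAX_NAME_CHARS = 1024
MAX_SIZE_BYTES = 16 * 10**12        # 16 TB

def gcs_default_excluded(path: str, size_bytes: int) -> bool:
    """Return True if the path or size breaks one of the sketched GCS rules."""
    name = posixpath.basename(path)
    return (
        LINE_BREAK.search(path) is not None
        or len(path) > MAX_NAME_CHARS
        or DOT_NAME.fullmatch(name) is not None
        or size_bytes > MAX_SIZE_BYTES
    )

print(gcs_default_excluded("/logs/app.log", 10))  # False
print(gcs_default_excluded("/logs/..", 0))        # True
```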

Configure exclusions

You can define additional exclusions and apply them to specific migrations to ignore any matching content.

You can configure exclusions with the UI or the CLI.

Configure exclusions with the UI

Define and assign exclusions to new or existing migrations.

Add new exclusions

Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To add new exclusions:

  1. Under Migrations, select Exclusion Templates.

  2. Select Add New Exclusion Template to associate the exclusion with the selected filesystem and enter the parameters for the exclusion:

    • Exclusion type - Regex, File Size, or Date.
    • Name - The name of the exclusion template. For example, 100gibfilelimit.
    • Description - A brief description of what the exclusion does. For example, "Files larger than 100GiB are excluded".
    • Value / Unit (File Size exclusions) - The value and unit for the file size limit. For example, 100 GiB.
    • Regex (Regex exclusions) - The regex pattern to use for the file name exclusion. For example, ^test\.*.
    • Select Date (Date exclusions) - Any files that have been modified before the specified date are excluded during migrations.
  3. Go to Migrations and select Add exclusions from the Bulk Action dropdown list.

  4. Select the migrations in the list to which you want to apply the exclusion.

    Note

    You can add multiple exclusions to multiple migrations at once with the UI. For more information, see Bulk add exclusions to migrations with the UI.

    You can add and remove exclusions when you are creating migrations. For more information, see Assign exclusions to a new migration.
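The template fields and the bulk action can be modeled in a few lines of Python to show how a Value / Unit pair becomes a byte limit and how one template is attached to several migrations. Everything here is hypothetical (the class, the unit table, and the migration names `mig-a` to `mig-c`); it is not Data Migrator's data model or API.

```python
from dataclasses import dataclass

# Hypothetical unit table for the Value / Unit field.
UNIT_BYTES = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

@dataclass(frozen=True)
class FileSizeExclusionTemplate:
    name: str
    description: str
    value: int
    unit: str

    def limit_bytes(self) -> int:
        # Convert the Value / Unit pair into a byte count.
        return self.value * UNIT_BYTES[self.unit]

template = FileSizeExclusionTemplate(
    name="100gibfilelimit",
    description="Files larger than 100GiB are excluded",
    value=100,
    unit="GiB",
)

# Bulk action: attach the template to the selected migrations only.
# Per the text above, this affects future actions, not previously migrated content.
migrations = {"mig-a": [], "mig-b": [], "mig-c": []}
for selected in ("mig-a", "mig-c"):
    migrations[selected].append(template)

print(template.limit_bytes())  # 107374182400
```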

Remove exclusions from the templates list

Removing an exclusion from the templates list doesn't remove it from existing migrations and doesn't affect previously migrated content. To remove an exclusion from the templates list:

  1. Go to the Exclusion Templates page and search for the exclusion you want to remove.

  2. Select the trash can icon.

    Note

    This doesn't remove the exclusion from an existing migration. For more information, see Remove exclusions from an existing migration.

    You can't remove default exclusions from the templates list, but you can remove them from an existing migration.
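The relationship described in these notes can be sketched as a small Python model. It is hypothetical: it assumes each migration keeps its own record of assigned exclusions, so deleting a template leaves migrations unchanged, and the names `100gibfilelimit`, `hdfs-trash`, and `mig-a` are illustrative.

```python
templates = {
    "100gibfilelimit": {"default": False},
    "hdfs-trash": {"default": True},
}
migration_exclusions = {"mig-a": {"100gibfilelimit", "hdfs-trash"}}

def remove_template(name: str) -> None:
    # Default exclusions can't be removed from the templates list.
    if templates[name]["default"]:
        raise ValueError(f"{name} is a default exclusion and can't be removed from the templates list")
    del templates[name]

def remove_from_migration(migration: str, name: str) -> None:
    # Both default and custom exclusions can be removed from a migration.
    migration_exclusions[migration].discard(name)

remove_template("100gibfilelimit")
print("100gibfilelimit" in migration_exclusions["mig-a"])  # True
```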