Version: 1.18.1

Configure exclusions

Define exclusions to prevent specific files and directories from being migrated. There are three types of exclusion, which exclude content based on:

  • File size
  • File and directory names (matched using regular expression patterns of Java PCRE, Automata, or GLOB type)
  • Last modification date of directories and files
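The three exclusion types can be pictured as predicates over a file's metadata. The Python sketch below is illustrative only: `FileInfo`, the rule classes, `is_excluded`, and the example pattern are hypothetical names, not LiveData Migrator APIs. It assumes that a file is skipped if any one rule excludes it, and that a regex rule has to match the whole path.

```python
import re
from dataclasses import dataclass
from datetime import datetime

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class FileInfo:
    """Hypothetical view of a source file: path, size, last-modified time."""
    path: str
    size_bytes: int
    modified: datetime


@dataclass
class FileSizeExclusion:
    max_bytes: int

    def excludes(self, f: FileInfo) -> bool:
        # Files larger than the limit are excluded.
        return f.size_bytes > self.max_bytes


@dataclass
class RegexExclusion:
    pattern: str

    def excludes(self, f: FileInfo) -> bool:
        # Assumption: the pattern must match the whole path.
        return re.fullmatch(self.pattern, f.path) is not None


@dataclass
class DateExclusion:
    cutoff: datetime

    def excludes(self, f: FileInfo) -> bool:
        # Files last modified before the cutoff are excluded.
        return f.modified < self.cutoff


def is_excluded(f: FileInfo, rules: list) -> bool:
    # Assumption: a file is skipped if any rule excludes it.
    return any(rule.excludes(f) for rule in rules)


# Hypothetical rule set: a 100 GiB size limit, names starting with "test",
# and anything last modified before 2020.
rules = [
    FileSizeExclusion(max_bytes=100 * GIB),
    RegexExclusion(pattern=r"/.*/test[^/]*"),
    DateExclusion(cutoff=datetime(2020, 1, 1)),
]

recent = datetime(2023, 5, 1)
print(is_excluded(FileInfo("/data/big.bin", 200 * GIB, recent), rules))         # True (size)
print(is_excluded(FileInfo("/data/test_run.csv", 10, recent), rules))           # True (regex)
print(is_excluded(FileInfo("/data/old.csv", 10, datetime(2019, 6, 1)), rules))  # True (date)
print(is_excluded(FileInfo("/data/report.csv", 10, recent), rules))             # False
```

Modelling each type as a separate class with the same `excludes` method keeps the combination rule (`any`) independent of how many exclusions a filesystem carries.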

Exclusion templates are associated with a filesystem, allowing you to selectively ignore content during migration when that filesystem is used as the source.

Use the UI or the CLI to configure exclusions.

Configure exclusions with the UI

Assign exclusions to new or existing migrations.

Note

Default exclusions apply automatically to certain filesystems, depending on the platform. For example, Azure Data Lake Storage (ADLS) Gen2 filesystems have a maximum individual file size of 4.55 TiB, so any larger files are excluded automatically.
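The 4.55 TiB limit in this note and the 5 TB file size default exclusion listed for ADLS Gen2 further down this page appear to be the same value in different units. The conversion below assumes TB denotes decimal terabytes (10^12 bytes) and TiB denotes binary tebibytes (2^40 bytes):

```latex
\begin{aligned}
5\ \text{TB} &= 5 \times 10^{12}\ \text{bytes} \\
1\ \text{TiB} &= 2^{40}\ \text{bytes} = 1\,099\,511\,627\,776\ \text{bytes} \\
\frac{5 \times 10^{12}}{2^{40}} &\approx 4.547 \;\Rightarrow\; 5\ \text{TB} \approx 4.55\ \text{TiB}
\end{aligned}
```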

Add new exclusions

  1. In the Filesystems list on the dashboard, click the settings for the appropriate filesystem.
  2. Select LiveData Migrator under the Processes list to display the exclusion templates.
  3. Click Add Exclusion Template to associate the exclusion with the selected filesystem and enter the parameters for the exclusion:
    • Exclusion type - Regex, File Size, or Date.
    • Name - The name of the exclusion template (for example: 100gibfilelimit).
    • Description - A brief description of what the exclusion does (for example: "Files larger than 100GiB are excluded").
    • Value / Unit (File Size type) - The value and unit of the file size limit (for example: 100 GiB).
    • Regex (Regex type) - The regex pattern to use for the file name exclusion (for example: ^test\.*).
    • Select Date (Date type) - Files last modified before the specified date are excluded from migrations.

Once the exclusion has been added and has passed validation, it appears in the exclusion list.
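Whether the Regex example `^test\.*` excludes a name such as `testdata.csv` depends on whether the pattern must match the whole name (as with Java's `String.matches`) or only part of it (as with `Matcher.find`). This page does not say which semantics LiveData Migrator applies, so the Python sketch below shows both, with Python's `re` module standing in for the Java engine; for a pattern this simple the two behave the same. Because `\.*` matches zero or more literal dots, partial matching makes the pattern exclude every name that starts with "test".

```python
import re

# The Regex example from the template parameters.
PATTERN = r"^test\.*"

names = ["test", "test.", "testdata.csv", "mytest.txt"]

# Partial matching (like Java Matcher.find): ^ anchors the match to the
# start of the name, but anything may follow "test".
search_hits = [n for n in names if re.search(PATTERN, n)]

# Whole-string matching (like Java String.matches): only "test" followed
# by zero or more dots qualifies.
full_hits = [n for n in names if re.fullmatch(PATTERN, n)]

print(search_hits)  # ['test', 'test.', 'testdata.csv']
print(full_hits)    # ['test', 'test.']
```

If the intent is "names starting with test", `^test.*` states that explicitly and gives the same result under either semantics.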

Note

You can add exclusions to multiple migrations at once with the UI. For more information, see Bulk add exclusions to migrations with the UI.

Remove exclusions from the templates list

  1. In the Filesystems list on the dashboard, click the settings for the appropriate filesystem.
  2. Select LiveData Migrator under the Processes list to display the exclusion templates.
  3. Click the trash icon for the exclusion template you want to remove.

Note

This does not remove the exclusion from existing migrations. To do that, see the Remove exclusions from an existing migration section.

You cannot remove default exclusions from the templates list, but you may remove them from an existing migration.

Configure exclusions with the CLI

Exclusions constrain the content migrated from a source filesystem. Adding exclusions to an existing migration changes the actions that migration performs from then on, but does not affect content that has already been migrated.

Define exclusions

Define exclusions so you can apply them to migrations.

Command                  Action
exclusion add date       Create a new date-based rule
exclusion add file-size  Create a new file size rule
exclusion add regex      Create a new regex exclusion rule

Manage exclusions

Command           Action
exclusion delete  Delete an exclusion rule
exclusion list    List all exclusion rules
exclusion show    Get details for a particular exclusion rule

Default exclusions

When you create a new migration, default exclusions are added to it. Which default exclusions are added depends on the filesystem types used in the migration.

Default exclusions can be removed from the migration, but not from the system or the templates list.

Hadoop Distributed File System (HDFS)

The default exclusions are as follows:

Exclusion                    Exclusion type    Description
(/|/.*/)\\.hive-staging.*    Regex (Automata)  Hive staging directories
(/|/.*/)\\.spark-staging-.*  Regex (Automata)  Spark staging directories
(/|/.*/)_temporary.*         Regex (Automata)  Spark temporary directories
(/|/.*/)\\.Trash(/.*)?       Regex (Automata)  HDFS trash directories
(/|/.*/)\\.snapshot(/.*)?    Regex (Automata)  HDFS snapshot directories

Hive and Spark use these staging and temporary directories to hold temporary files while jobs run, and delete them automatically afterwards. They are excluded by default to avoid migrating unnecessary data.

HDFS snapshot and trash directories are generally relevant only to the local cluster, so they are excluded for the same reason: to avoid migrating unnecessary data.
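A minimal Python sketch of how the HDFS default patterns classify paths. It rests on two assumptions: the doubled backslashes in the table are string escaping (so `\\.` is the regex `\.`, a literal dot), and Automata patterns must match the entire path, which `re.fullmatch` approximates. Python's `re` is not the Automata engine, but these patterns appear to use only syntax the two share. `matching_rule` is a hypothetical helper.

```python
import re

# HDFS default exclusions, with \\. from the table written as \. here.
HDFS_DEFAULTS = {
    "Hive staging": r"(/|/.*/)\.hive-staging.*",
    "Spark staging": r"(/|/.*/)\.spark-staging-.*",
    "Spark temporary": r"(/|/.*/)_temporary.*",
    "HDFS trash": r"(/|/.*/)\.Trash(/.*)?",
    "HDFS snapshot": r"(/|/.*/)\.snapshot(/.*)?",
}

# DOTALL lets "." match any character, as in an automaton.
COMPILED = {desc: re.compile(rx, re.DOTALL) for desc, rx in HDFS_DEFAULTS.items()}


def matching_rule(path: str):
    """Return the first default exclusion matching the whole path, or None."""
    for desc, rx in COMPILED.items():
        if rx.fullmatch(path):
            return desc
    return None


for p in [
    "/warehouse/sales/.hive-staging_hive_2021-01-01",
    "/out/_temporary/0/part-00000",
    "/user/alice/.Trash/Current/old.csv",
    "/user/alice/.Trashy/notes.txt",
    "/data/events/part-00001.parquet",
]:
    print(p, "->", matching_rule(p))
```

The `(/|/.*/)` prefix ties each name to the start of a path component, and `(/.*)?` ties `.Trash` and `.snapshot` to the end of one, so `/user/alice/.Trashy/notes.txt` is not excluded.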

Azure Data Lake Storage (ADLS) Gen2

The default exclusions are as follows:

Exclusion               Exclusion type     Description
[.|\\/]$                Regex (JAVA_PCRE)  File names cannot end with . or ' '
.*([^\\/]*\\/){255,}.*  Regex (Automata)   Blob names cannot exceed 254 path segments
.{1025,}                Regex (JAVA_PCRE)  File name length cannot exceed 1024
5 TB                    File size          File size cannot exceed 5 TB

These exclusions cover many of the limitations set by ADLS Gen2 directory and file naming rules.
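The Python sketch below applies the ADLS Gen2 defaults to a path and size. Its assumptions: doubled backslashes in the table are string escaping; JAVA_PCRE rules may match anywhere in the path (`re.search`), while Automata rules must match the whole path (`re.fullmatch`); and 5 TB means 5 × 10^12 bytes. As written, the `|` inside `[.|\/]` is a literal character, so that pattern matches paths ending in `.`, `|`, or `/`. `adls_excluded` is a hypothetical helper.

```python
import re

# ADLS Gen2 default exclusions (table backslashes unescaped once).
ENDS_WITH = re.compile(r"[.|\/]$")                                 # JAVA_PCRE
TOO_MANY_SEGMENTS = re.compile(r".*([^\/]*\/){255,}.*", re.DOTALL)  # Automata
TOO_LONG = re.compile(r".{1025,}")                                 # JAVA_PCRE
MAX_BYTES = 5 * 10**12                                             # 5 TB, assumed decimal


def adls_excluded(path: str, size_bytes: int) -> bool:
    """Return True if any ADLS Gen2 default exclusion applies (sketch)."""
    return bool(
        ENDS_WITH.search(path)              # ends with ".", "|" or "/"
        or TOO_MANY_SEGMENTS.fullmatch(path)  # 255 or more "/" separators
        or TOO_LONG.search(path)            # 1025 or more characters
        or size_bytes > MAX_BYTES           # larger than 5 TB
    )


print(adls_excluded("/data/report.csv", 1024))  # False
print(adls_excluded("/data/report.", 1024))     # True: ends with "."
print(adls_excluded("/d" * 255, 1024))          # True: 255 path segments
print(adls_excluded("/d" * 254, 1024))          # False: 254 path segments
```

For an absolute path, each `/` starts one segment, which is why `"/d" * 255` is the first case the segment rule catches.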

Google Cloud Storage

The default exclusions are as follows:

Exclusion   Exclusion type     Description
.*[\r\n].*  Regex (Automata)   File name cannot contain carriage returns or line feeds
.{1025,}    Regex (JAVA_PCRE)  File name length cannot exceed 1024
\\.\\.?     Regex (Automata)   File name cannot be . or ..
16 TB       File size          File size cannot exceed 16 TB

These exclusions cover the limitations set by the Google Cloud Storage object naming guidelines.
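A Python sketch of the Google Cloud Storage defaults, assuming each pattern is checked against the whole object name string (otherwise `\.\.?` could never match a full path). As with the other filesystems, it assumes doubled backslashes are string escaping, Automata rules match the whole name (`re.fullmatch`), JAVA_PCRE rules may match anywhere (`re.search`), and 16 TB means 16 × 10^12 bytes. `gcs_excluded` is a hypothetical helper.

```python
import re

# GCS default exclusions; [\r\n] is a class of carriage return and line feed.
CRLF = re.compile(r".*[\r\n].*", re.DOTALL)  # Automata
TOO_LONG = re.compile(r".{1025,}")           # JAVA_PCRE
DOT_NAMES = re.compile(r"\.\.?")             # Automata
MAX_BYTES = 16 * 10**12                      # 16 TB, assumed decimal


def gcs_excluded(name: str, size_bytes: int) -> bool:
    """Return True if any GCS default exclusion applies (sketch)."""
    return bool(
        CRLF.fullmatch(name)         # contains CR or LF
        or TOO_LONG.search(name)     # 1025 or more characters
        or DOT_NAMES.fullmatch(name)  # exactly "." or ".."
        or size_bytes > MAX_BYTES    # larger than 16 TB
    )


print(gcs_excluded("reports/2023.csv", 1024))  # False
print(gcs_excluded("..", 0))                   # True
print(gcs_excluded("line\nbreak", 0))          # True
```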