Configure an S3 source

Removed files

If a file is removed from the source after a one-time migration has completed its initial scan, subsequent rescans won't remove that file from the target. Rescans only update existing files or add new ones; they never remove anything from the target.

The one exception is the removal of a partition. Removing a partition is an action taken in Hive, and Hive Migrator replicates the metadata change live. The data under that partition remains on the target regardless of whether it has been deleted on the source. However, because the partition was removed in the metadata, that data won't be visible to queries on the target.
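
For example, a partition drop issued on the source might look like the following; the Hive server address, table name, and partition are hypothetical. Hive Migrator replicates the metadata change, while the files under the partition remain on the target.

Example

# Hypothetical example: dropping a partition in Hive on the source via beeline.
# Hive Migrator replicates this metadata change live; the partition's data files
# remain on the target but are no longer visible to queries there.
beeline -u jdbc:hive2://hive-server:10000 \
  -e "ALTER TABLE sales DROP PARTITION (dt='2024-01-01');"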

Configure an S3 bucket as a source

  1. From the Dashboard, select your Data Migrator product from the Instances panel.
  2. Under Filesystems & Agents, select Filesystems, then select Add Source Filesystem.
  3. Select S3 from the Filesystem Type dropdown list.
  4. Enter the following details:
    • Display Name - The name you want to give your source filesystem.
    • Bucket Name - The reference name of the S3 bucket you are using.
    • Access Key - Enter the access key. For example, RANDOMSTRINGACCESSKEY. If you have configured a Vault for secrets storage, use a reference to the value stored in your secrets store.
    • Secret Key - Enter the secret key that corresponds with your access key. For example, RANDOMSTRINGPASSWORD. If you have configured a Vault for secrets storage, use a reference to the value stored in your secrets store.
    • S3A Properties - Data Migrator uses Hadoop's S3A library to connect to S3 filesystems. Enter key/value pairs to apply additional properties (see the example after these steps).
      Info: You need to define an S3 endpoint using the fs.s3a.endpoint property so that Data Migrator can connect to your source. For example, fs.s3a.endpoint=https://example-s3-endpoint:80.

  5. Select Save.
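
For example, the following key/value pairs are a minimal sketch for connecting to an S3-compatible store. The endpoint value is a placeholder, and fs.s3a.path.style.access=true is an assumption that applies only to stores that require path-style rather than virtual-host-style addressing.

Example

fs.s3a.endpoint=https://example-s3-endpoint:80
fs.s3a.path.style.access=true
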
Permissions and ownership of migrated items

When migrating files from an S3 source to an HDFS target, the owner of each migrated file is the user that writes it. In Data Migrator, this is the user mapped to the principal used to authenticate with the target.

Additionally, S3 object stores don't retain POSIX RWX permissions, so anything migrated from an S3 object store to an HDFS target will have 'rwxrwxrwx' permissions.
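
As a quick check after a migration, you can list a migrated path on the HDFS target; the path below is hypothetical.

Example

# Hypothetical path; the owner shown is the user mapped to the principal used to
# authenticate with the target, and items migrated from S3 show rwxrwxrwx.
hdfs dfs -ls /migrated/data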

Configure an S3 bucket as a source with the CLI

To create a source S3 filesystem, run the filesystem add s3a command in the CLI:

Example

filesystem add s3a --file-system-id mysource --bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key pkExampleAccessKeyiz --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D --source

For information on the properties added by default to new S3A filesystems, see s3a default properties in the command reference.

For information on the properties you can customize for new S3A filesystems, see s3a custom properties in the command reference.