Configure an Amazon S3 target
You can migrate data to an Amazon Simple Storage Service (S3) bucket by configuring one as a target filesystem.
Follow these steps to create an Amazon S3 target:
Prerequisites
You need the following:
- An S3 bucket hosted on Amazon Web Services.
- Authentication details for your bucket, depending on your chosen credentials provider. See below for more information.
As of Data Migrator 2.1.1, hcfs.ssl.channel.mode replaces fs.s3a.ssl.channel.mode and fs.azure.ssl.channel.mode, which are no longer valid.
See SSL implementation for information on the property and values used.
Configure an Amazon S3 target filesystem in the UI
- From the Dashboard, select an instance under Instances. 
- In the Filesystems & Agents menu, select Filesystems. 
- Select Add target filesystem. 
- Enter the following details:
  - Filesystem Type - The type of filesystem target. Select Amazon S3.
  - Display Name - Enter a name for your target filesystem.
  - Bucket Name - The reference name of your Amazon S3 bucket.
  - Authentication Method - The Java class name of a credentials provider for authenticating with the S3 endpoint. The Authentication Method options available include:
    - Access Key and Secret (org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider) - Use this provider to enter credentials as an access key and secret access key with the following entries:
      - Access Key - Enter the AWS access key. For example, RANDOMSTRINGACCESSKEY. If you have configured a Vault for secrets storage, use a reference to the value stored in your secrets store.
      - Secret Key - Enter the secret key that corresponds with your Access Key. For example, RANDOMSTRINGPASSWORD. If you have configured a Vault for secrets storage, use a reference to the value stored in your secrets store.
    - AWS Identity and Access Management (com.amazonaws.auth.InstanceProfileCredentialsProvider) - Use this provider if you're running Data Migrator on an EC2 instance that has been assigned an IAM role with policies that allow it to access the S3 bucket.
    - AWS Hierarchical Credential Chain (com.amazonaws.auth.DefaultAWSCredentialsProviderChain) - A commonly used credentials provider chain that looks for credentials in this order:
      - Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
      - Java system properties: aws.accessKeyId and aws.secretKey.
      - Web Identity Token credentials from the environment or container.
      - Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
      - Credentials delivered through the Amazon EC2 container service, if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
      - Instance profile credentials delivered through the Amazon EC2 metadata service.
    - Environment Variables (com.amazonaws.auth.EnvironmentVariableCredentialsProvider) - Use this provider to enter an access key and a secret access key as either AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
    - EC2 Instance Metadata Credentials (com.amazonaws.auth.InstanceProfileCredentialsProvider) - Use this provider if you need instance profile credentials delivered through the Amazon EC2 metadata service.
    - Profile Credentials Provider (com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider) - Use this provider to enter a custom profile configured to access Amazon S3 storage. You can find AWS credential information in a local file named credentials in a folder named .aws in your home directory. Enter an AWS Named Profile and a Credentials File Path. For example, ~/.aws/credentials.
    - Custom Provider Class - Use this if you want to enter a class for the credentials provider.
    - JCEKS Keystore (hadoop.security.credential.provider.path) - This authentication method uses an access key and a secret key for Amazon S3 contained in a Java Cryptography Extension KeyStore (JCEKS). The keystore needs to contain values for the access key and the secret key. Because the keys are already in the keystore, you don't need to enter them once you've saved the path. For one way to create such a keystore, see the sketch after these steps.
      Info: You must add the fs.s3a.security.credential.provider.filesystem.id and fs.s3a.security.credential.provider.path additional properties when you're using a JCEKS keystore.
      Info: You can't select JCEKS Keystore if you don't have an HDFS target configured. The HDFS resource must exist on the same Data Migrator instance as the Amazon S3 filesystem you're adding.
      - JCEKS HDFS - Select the HDFS filesystem where your JCEKS file is located.
      - JCEKS Keystore Path - Enter the path containing the JCEKS keystore. For example, jceks://hdfs@active-namenode-host:8020/credentials/aws/aws.jceks.
      Info: You must provide an endpoint when using JCEKS for an s3a-vpc type of S3 bucket.
      JCEKS on HDFS with Kerberos - You must add the dfs.namenode.kerberos.principal.pattern configuration property. Include the following steps when you add an HDFS source or target with Kerberos:
      - Expand S3A Properties and select + Add Key/Value Pair.
      - Add the key dfs.namenode.kerberos.principal.pattern and the value *.
      - Select Save, then restart Data Migrator.
      The dfs.namenode.kerberos.principal.pattern property provides a regular expression wildcard that allows realm authentication. You need it if the realms on your source or target filesystems don't have matching truststores or principal patterns.
      Note: When deleting filesystems with JCEKS authentication configured, delete the Amazon S3 filesystem before the HDFS filesystem.
  - S3 Service Endpoint - The Amazon S3 endpoint for your S3 bucket.
  - S3 Properties - Add optional properties to your S3 target as key-value pairs.
- Select Save. You can now use your Amazon S3 target in data migrations.
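If you need to create the JCEKS keystore itself, one option (a minimal sketch, not a required step) is the Hadoop credential CLI, which can write the keystore directly to HDFS. The alias names below are the standard Hadoop S3A credential aliases, and the keystore path reuses the example path above; substitute your own namenode host, path, and key values.
# Store the S3 access key and secret key under the standard S3A aliases (example values only)
hadoop credential create fs.s3a.access.key -value RANDOMSTRINGACCESSKEY -provider jceks://hdfs@active-namenode-host:8020/credentials/aws/aws.jceks
hadoop credential create fs.s3a.secret.key -value RANDOMSTRINGPASSWORD -provider jceks://hdfs@active-namenode-host:8020/credentials/aws/aws.jceks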
Configure an Amazon S3 target filesystem in the CLI
To create an Amazon S3 target in the Data Migrator CLI, run the filesystem add s3a command:
All of the command's options are listed below; the options that don't apply to a target are listed under Other parameters.
filesystem add s3a            [--file-system-id] string  
                              [--bucket-name] string  
                              [--endpoint] string  
                              [--access-key] string  
                              [--secret-key] string  
                              [--sqs-queue] string  
                              [--sqs-endpoint] string  
                              [--credentials-provider] string  
                              [--source]  
                              [--scan-only]  
                              [--properties-files] list  
                              [--properties] string  
                              [--s3type] string  
                              [--bootstrap.servers] string  
                              [--topic] string
For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.
Amazon S3 mandatory parameters
- --s3type - Enter the value aws.
- --file-system-id - The ID for the new filesystem resource.
- --bucket-name - The name of your Amazon S3 bucket.
- --credentials-provider - The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint. The provider options available include:
  - org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider - Use this provider to offer credentials as an access key and secret access key with the --access-key and --secret-key parameters.
  - com.amazonaws.auth.InstanceProfileCredentialsProvider - Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.
  - com.amazonaws.auth.DefaultAWSCredentialsProviderChain - A commonly used AWS credentials provider chain that looks for credentials in this order:
    - Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
    - Java system properties: aws.accessKeyId and aws.secretKey.
    - Web Identity Token credentials from the environment or container.
    - Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
    - Credentials delivered through the Amazon EC2 container service, if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
    - Instance profile credentials delivered through the Amazon EC2 metadata service.
  - com.amazonaws.auth.EnvironmentVariableCredentialsProvider - Use this provider to supply an access key and a secret access key as either AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
  - com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider - This provider supports the use of multiple AWS credentials, which are stored in a credentials file. When adding the filesystem, use the following properties:
    - awsProfile - Name for the AWS profile.
    - awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.
    For example:
    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>
    In the CLI, you can also use --aws-profile and --aws-config-file. For example:
    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --aws-profile <profile-name> --aws-config-file </path/to/the/aws/credentials/file>
    Learn more about using AWS profiles: Configuration and credential file settings.
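For reference, the credentials file that these profile options point to uses the standard AWS shared-credentials layout. A minimal sketch, using a hypothetical profile name and the placeholder key values from earlier on this page:
# ~/.aws/credentials (default location)
[my-migration-profile]
aws_access_key_id = RANDOMSTRINGACCESSKEY
aws_secret_access_key = RANDOMSTRINGPASSWORD
Pass the profile name (my-migration-profile in this sketch) as the awsProfile property or the --aws-profile parameter.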
 
 
Amazon S3 optional parameters
- --access-key - When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter.
- --secret-key - When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key with this parameter.
- --properties-files - Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
- --properties - Enter properties to use in a comma-separated key/value list.
- --endpoint (UI & IBM Cloud Object Storage only) - This is required when adding an IBM Cloud Object Storage bucket.
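As an illustration of these optional parameters, the following sketch adds a target that reads Hadoop configuration from an existing properties file and sets one S3A property inline. The core-site.xml path and the fs.s3a.block.size value are examples only, and a single file path is assumed to be a valid value for the list parameter:
filesystem add s3a --file-system-id mytarget
--s3type aws
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
--properties-files /etc/hadoop/conf/core-site.xml
--properties fs.s3a.block.size=64M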
Other parameters
These parameters are for S3 sources or other types of S3 targets. Exclude them when you create an Amazon S3 target.
- --sqs-queue - Enter an SQS queue name.
- --sqs-endpoint - Enter an SQS endpoint.
- --source - This parameter creates the filesystem as a source.
- --scan-only - This parameter creates a static source filesystem for one-time migrations. It requires the --source parameter.
- --success-file - This parameter uses a file name or glob pattern for files that Data Migrator will migrate last in their directory. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the source directory they're in has finished migrating. This parameter only applies to source filesystems.
Amazon S3 Examples
Add an Amazon S3 target filesystem with an access key and secret key.
filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
Add an Amazon S3 target filesystem with a credential provider path.
filesystem add s3a --file-system-id mytarget 
--s3type aws 
--endpoint s3.eu-west-1.amazonaws.com 
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider 
--bucket-name mybucket 
--properties fs.s3a.security.credential.provider.path=jceks://hdfs@nameservice01:8020/credentials/aws/aws.jceks,fs.s3a.security.credential.provider.filesystem.id=my-HDFS-Source-filesystem-id,dfs.namenode.kerberos.principal.pattern=* 
S3a properties
Enter additional properties for Amazon S3 filesystems by adding them as key-value pairs in the UI or as a comma-separated key-value pair list with the --properties parameter in the CLI. You can overwrite default property values or add new properties.
Default properties
These properties are defined by default when you add an Amazon S3 filesystem. Overwrite them by specifying their keys with new values in key-value pairs.
- fs.s3a.impl (default: org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3A filesystem.
- fs.AbstractFileSystem.s3a.impl (default: org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3A AbstractFileSystem.
- fs.s3a.user.agent.prefix (default: APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that is prepended to the User-Agent header sent in HTTP requests to the S3 back-end by the S3A filesystem.
- fs.s3a.impl.disable.cache (default: true): Disables the S3 filesystem cache when set to true.
- hadoop.tmp.dir (default: tmp): The parent directory for other temporary directories.
- fs.s3a.connection.maximum (default: 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
- fs.s3a.threads.max (default: 150): Defines the total number of threads available in the filesystem for data uploads or any other queued filesystem operation.
- fs.s3a.max.total.tasks (default: 60): Defines the number of operations that can be queued for execution at a time.
- fs.s3a.healthcheck (default: true): Allows the S3A filesystem health check to be turned off by changing true to false. This option is useful for setting up Data Migrator while cloud services are offline. However, when the health check is disabled, errors in the S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.
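For example, to raise the connection and thread limits above their defaults, you could pass new values for those keys when adding the target; the numbers below are illustrative only, not recommendations, and the same keys can also be added as key-value pairs under S3A Properties in the UI:
filesystem add s3a --file-system-id mytarget
--s3type aws
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
--properties fs.s3a.connection.maximum=200,fs.s3a.threads.max=200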
Additional properties
These additional properties are not defined by default. Add them by specifying their keys with values in key-value pairs. Find an additional list of S3a properties in the S3a documentation.
- fs.s3a.fast.upload.buffer (default: disk): Defines how the filesystem buffers the upload.
- fs.s3a.fast.upload.active.blocks (default: 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
- fs.s3a.block.size (default: 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T, or P to scale the value in kilobytes, megabytes, gigabytes, terabytes, or petabytes, respectively.
- fs.s3a.buffer.dir (default: tmp): Defines the directory used by disk buffering.
- fs.s3a.security.credential.provider.path: Defines the path to the JCEKS keystore. You must add this property if you're using a JCEKS keystore as your credential provider.
- fs.s3a.security.credential.provider.filesystem.id: The ID of a configured HDFS filesystem containing the JCEKS keystore file. You must add this property if you're using a JCEKS keystore as your credential provider.
Additional properties example
Add an Amazon S3 target filesystem using --properties and a credential provider path.
filesystem add s3a --file-system-id mytarget 
--s3type aws 
--endpoint s3.eu-west-1.amazonaws.com 
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider 
--bucket-name mybucket 
--properties fs.s3a.security.credential.provider.path=jceks://hdfs@nameservice01:8020/credentials/aws/aws.jceks,fs.s3a.security.credential.provider.filesystem.id=my-HDFS-Source-filesystem-id,dfs.namenode.kerberos.principal.pattern=* 
Upload buffering
Migrations that use an S3 target buffer all uploads. By default, buffering occurs on the local disk of the system Data Migrator runs on, in the /tmp directory.
Data Migrator will automatically delete the temporary buffering files once they are no longer needed.
If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. You can enter one of the following values:
| Buffering Option | Details | Property Value | 
|---|---|---|
| Array Buffer | Buffers the uploaded data in memory instead of on the disk, using the Java heap. | array | 
| Byte Buffer | Buffers the uploaded data in memory instead of on the disk, but does not use the Java heap. | bytebuffer | 
| Disk Buffering | The default option. This property buffers the upload to the disk. | disk | 
Both the array and bytebuffer options may consume large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.
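For example, to buffer uploads in off-heap memory rather than on disk, and to limit the number of active blocks per stream, you could add the target with properties like the following sketch; the block count here is illustrative only:
filesystem add s3a --file-system-id mytarget
--s3type aws
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
--properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=4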
  If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp by default) has enough remaining space to facilitate the transfer.
Next steps
If you haven't already, configure a source filesystem from which to migrate data. Then, you can create a migration to migrate data to your new S3 target.