Version: 3.2 (latest)

Configure an IBM Cloud Object Storage target

You can migrate data to an IBM Cloud Object Storage (COS) bucket by configuring one as a target filesystem.

Follow these steps to create an IBM COS target:

Prerequisites

You need the following:

An IBM COS bucket.
An access key and corresponding secret key for your IBM COS bucket.

Configure an IBM COS target filesystem in the UI

From the Dashboard, select an instance under Instances.
In the Filesystems & Agents menu, select Filesystems.
Select Add target filesystem.
Enter the following details:
- Filesystem Type - The type of filesystem target. Select IBM Cloud Object Storage.
- Display Name - Enter a name for your target filesystem.
- Access Key - Enter the IBM access key. For example, RANDOMSTRINGACCESSKEY.
- Secret Key - Enter the secret key that corresponds with your access key. For example, RANDOMSTRINGPASSWORD.
- Bucket Name - The name of the IBM COS bucket you're using.
- Endpoint - The endpoint for the region of your IBM COS bucket.
- S3 Properties - Add optional properties to your IBM COS target as key-value pairs.
Select Save. You can now use your IBM COS target in data migrations.

Configure an IBM COS filesystem in the CLI

To create an IBM COS target in the Data Migrator CLI, run the filesystem add s3a command with --s3type ibmcos:

Add an S3 filesystem
filesystem add s3a            [--file-system-id] string  
                              [--bucket-name] string  
                              [--endpoint] string  
                              [--access-key] string  
                              [--secret-key] string  
                              [--sqs-queue] string  
                              [--sqs-endpoint] string  
                              [--credentials-provider] string  
                              [--source]  
                              [--scan-only]  
                              [--properties-files] list  
                              [--properties] string  
                              [--s3type] string  
                              [--bootstrap.servers] string  
                              [--topic] string

For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.

IBM COS mandatory parameters

--file-system-id The ID for the new filesystem resource. In the UI, this is called Display Name.
--bucket-name The name of your IBM COS bucket. In the UI, this is called Bucket Name.
--access-key The IBM access key. For example, RANDOMSTRINGACCESSKEY.
--secret-key The secret key to use with your access key. For example, RANDOMSTRINGPASSWORD.
--s3type Indicates an s3a compatibility filesystem type. Set this to ibmcos.
--endpoint The endpoint for your IBM COS bucket. IBM provides a list of available endpoints in their public documentation.
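As a rough illustration of the mandatory parameters listed above, the following Python sketch assembles a `filesystem add s3a` command line and refuses to build one if a mandatory flag is missing. The helper name `build_add_s3a_command` and the `cos_target` ID are hypothetical and not part of Data Migrator. The sketch only produces text to paste into the CLI, and it assumes that no value contains spaces.

```python
# Mandatory flags for an IBM COS target, taken from the list above.
MANDATORY = (
    "file-system-id",
    "bucket-name",
    "access-key",
    "secret-key",
    "s3type",
    "endpoint",
)


def build_add_s3a_command(params):
    """Return a `filesystem add s3a` command line (hypothetical helper).

    `params` maps flag names (without the leading dashes) to values.
    Values containing spaces are not handled by this sketch.
    """
    missing = [name for name in MANDATORY if not params.get(name)]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    if params["s3type"] != "ibmcos":
        raise ValueError("--s3type must be ibmcos for an IBM COS target")

    parts = ["filesystem", "add", "s3a"]
    for name, value in params.items():
        parts += ["--" + name, str(value)]
    return " ".join(parts)


command = build_add_s3a_command({
    "file-system-id": "cos_target",  # hypothetical ID
    "bucket-name": "container2",
    "access-key": "RANDOMSTRINGACCESSKEY",
    "secret-key": "RANDOMSTRINGPASSWORD",
    "s3type": "ibmcos",
    "endpoint": "http://10.0.0.124",
})
print(command)
```

Checking for missing flags before you run the command gives a clearer error than a failed filesystem creation.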

IBM COS optional parameters

--endpoint Enter a specific endpoint to access the S3 bucket, such as an AWS PrivateLink endpoint (for example: https://bucket.vpce-0e25b8cdd720f900e-argc85vg.s3.us-east-1.vpce.amazonaws.com). When using this parameter, do not use the fs.s3a.endpoint property as an additional custom property as this supersedes it.
--sqs-queue Enter an SQS queue name.
--sqs-endpoint Enter an SQS endpoint.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list. In the UI, this is called S3A Properties. See the S3a properties section for more information.
--credentials-provider The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint. In the UI, this is called Credentials Provider. IBM COS target filesystems default to a simple credentials provider.

Other parameters

These parameters are for S3 sources or other types of S3 targets. Exclude them when you create an IBM COS target.

--source This parameter creates the filesystem as a source.
--scan-only This parameter creates a static source filesystem for one-time migrations. This parameter needs the --source parameter.
--success-file This parameter uses a file name or glob pattern for files that Data Migrator will migrate last in their directory. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the source directory they're in has finished migrating. This parameter only applies to source filesystems.

Example

filesystem add s3a --file-system-id cos_s3_source2
    --bucket-name container2
    --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
    --access-key pkExampleAccessKeyiz
    --secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
    --s3type ibmcos
    --bootstrap.servers=10.0.0.123:9092
    --topic newcos-events
    --endpoint http://10.0.0.124

S3a properties

Enter additional properties for IBM COS filesystems by adding them as key-value pairs in the UI or as a comma-separated key-value pair list with the --properties parameter in the CLI. You can overwrite default property values or add new properties.

Default properties

These properties are defined by default when you add an IBM COS filesystem. Overwrite them by specifying their keys with new values in key-value pairs.

fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a Filesystem.
fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that is prepended to the User-Agent header sent in HTTP requests to the S3 back-end by S3aFileSystem.
fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
fs.s3a.connection.maximum (default 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
fs.s3a.threads.max (default 150): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation.
fs.s3a.max.total.tasks (default 60): Defines the number of operations that can be queued for execution at a time.
fs.s3a.healthcheck (default true): Turns the S3A filesystem health check on or off. Set it to false to turn the check off. This option is useful for setting up Data Migrator while cloud services are offline. However, when the check is disabled, errors in S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.
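As an illustration of overriding default properties, the following Python sketch turns a dictionary of overrides into a single comma-separated string for the --properties parameter. It assumes each pair is written as key=value, which this page does not state explicitly, so check the CLI help before relying on it. The helper name `format_properties` and the override values are hypothetical.

```python
def format_properties(properties):
    """Join property keys and values into one comma-separated list.

    Assumption: each pair is written as key=value.
    """
    for key, value in properties.items():
        # A comma inside a key or value would be read as a pair separator.
        if "," in str(key) or "," in str(value):
            raise ValueError(f"comma not allowed in property {key!r}")
    return ",".join(f"{key}={value}" for key, value in properties.items())


# Hypothetical overrides of two default properties listed above.
overrides = {
    "fs.s3a.connection.maximum": 200,
    "fs.s3a.healthcheck": "false",
}
print("--properties " + format_properties(overrides))
```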

Additional properties

These additional properties are not defined by default. Add them by specifying their keys with values in key-value pairs.

fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
fs.s3a.fast.upload.active.blocks (default 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T, or P to scale the value in Kilobytes, Megabytes, Gigabytes, Terabytes, or Petabytes, respectively.
fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
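To make the size suffixes concrete, here is a small Python sketch that converts a value such as 32M into bytes. It assumes binary multiples (1K = 1024 bytes), which is how Hadoop conventionally reads these suffixes. The helper name `parse_size` is hypothetical and is not part of Data Migrator or Hadoop.

```python
# Assumption: binary multiples, so 1K = 1024 bytes.
SUFFIXES = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
}


def parse_size(text):
    """Convert a size such as '32M' or '512' into a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)  # no suffix: the value is already in bytes


print(parse_size("32M"))  # 33554432
```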

Find an additional list of S3a properties in the S3a documentation.

Upload buffering

Migrations using an S3 target destination will buffer all uploads. By default, the buffering will occur on the local disk of the system Data Migrator is running on, in the /tmp directory.

Data Migrator will automatically delete the temporary buffering files once they are no longer needed.

If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. You can enter one of the following values:

Buffering Option	Details	Property Value
Array Buffer	Buffers the uploaded data in memory instead of on the disk, using the Java heap.	`array`
Byte Buffer	Buffers the uploaded data in memory instead of on the disk, but does not use the Java heap.	`bytebuffer`
Disk Buffering	The default option. This property buffers the upload to the disk.	`disk`

Both the array and bytebuffer options may consume large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.

Note: If you run out of disk space while buffering the migration, the migration stalls with a series of errors. To avoid this, ensure the filesystem containing the buffering directory (/tmp by default) has enough free space for the transfer.
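One way to check the buffering directory before starting a migration is a short script like the following Python sketch. It uses the system temporary directory (typically /tmp on Linux) as a stand-in for the buffering directory. The 10 GiB threshold and the helper name `has_buffer_space` are illustrative assumptions, not recommendations from Data Migrator.

```python
import shutil
import tempfile


def has_buffer_space(buffer_dir, required_bytes):
    """Return True if the filesystem holding buffer_dir has enough free space."""
    free = shutil.disk_usage(buffer_dir).free
    return free >= required_bytes


# Stand-in for the buffering directory; typically /tmp on Linux.
buffer_dir = tempfile.gettempdir()
required = 10 * 1024 ** 3  # hypothetical threshold: 10 GiB

if has_buffer_space(buffer_dir, required):
    print(f"{buffer_dir} has enough free space for buffering")
else:
    print(f"{buffer_dir} may run out of space during the migration")
```

If you set fs.s3a.buffer.dir to another location, point the check at that directory instead.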

Next steps

If you haven't already, configure a source filesystem from which to migrate data. Then, you can create a migration to migrate data to your new IBM COS target.
