Version: 2.6

Command reference

System service commands

The service scripts are used to control operation of each individual service. In most supported Linux distributions, the following commands can be used to manage Data Migrator, Hive Migrator, and UI processes.

Data Migrator

systemd command	Use it to...
`systemctl start livedata-migrator`	Start a service that isn't currently running.
`systemctl stop livedata-migrator`	Stop a running service.
`systemctl restart livedata-migrator`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status livedata-migrator`	Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to restart Data Migrator:

service livedata-migrator restart

info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/livedata-migrator again after running the restart command.
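The choice between the two command formats can be scripted. The following is a minimal sketch, not part of Data Migrator: `build_service_cmd` and the `HAS_SYSTEMD` override are hypothetical names, and checking for the /run/systemd/system directory is an assumption about how to detect a host booted with systemd. The function only prints the command so you can review it before running it.

```shell
# Print (don't run) the command that manages a service.
# build_service_cmd and HAS_SYSTEMD are hypothetical names for this sketch.
build_service_cmd() {
    svc_name=$1
    svc_action=$2
    # HAS_SYSTEMD=1 or 0 overrides detection; otherwise assume systemd
    # is in use when /run/systemd/system exists.
    if [ -n "${HAS_SYSTEMD:-}" ]; then
        use_systemd=$HAS_SYSTEMD
    elif [ -d /run/systemd/system ]; then
        use_systemd=1
    else
        use_systemd=0
    fi
    if [ "$use_systemd" = "1" ]; then
        printf 'systemctl %s %s\n' "$svc_action" "$svc_name"
    else
        printf 'service %s %s\n' "$svc_name" "$svc_action"
    fi
}
```

For example, `build_service_cmd livedata-migrator restart` prints the systemctl form on a systemd host and the service form elsewhere.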

Hive Migrator

Service script	Use it to...
`systemctl start hivemigrator`	Start a service that isn't currently running.
`systemctl stop hivemigrator`	Stop a running service.
`systemctl restart hivemigrator`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status hivemigrator`	Get details of the running service's status.

info

Always start/restart Hive Migrator services in the following order:

1. Remote agents
2. Hive Migrator service

Not starting services in this order may cause live migrations to fail.

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/hivemigrator again after running the restart command.
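The restart order above can be captured in a small helper. This is a sketch under one assumption: that the remote agent on this host runs as the hivemigrator-remote-server service. `hive_restart_plan` is a hypothetical name, and the function prints the commands rather than running them.

```shell
# Print the restart commands in the required order: remote agents first,
# then the Hive Migrator service.
# Assumption: the remote agent runs as the hivemigrator-remote-server service.
hive_restart_plan() {
    for hive_svc in hivemigrator-remote-server hivemigrator; do
        printf 'systemctl restart %s\n' "$hive_svc"
    done
}
```

Review the printed commands, then run each line as a user who is allowed to manage services.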

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to view status of Hive Migrator:

service hivemigrator status

Hive Migrator remote server

Service script	Use it to...
`systemctl start hivemigrator-remote-server`	Start a service that isn't currently running.
`systemctl stop hivemigrator-remote-server`	Stop a running service.
`systemctl restart hivemigrator-remote-server`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status hivemigrator-remote-server`	Get details of the running service's status.

info

Always start/restart Hive Migrator services in the following order:

1. Remote agents
2. Hive Migrator service

Not starting services in this order may cause live migrations to fail.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to view status of Hive Migrator remote server:

service hivemigrator-remote-server status

UI

Service script	Use it to...
`systemctl start livedata-ui`	Start a service that isn't currently running.
`systemctl stop livedata-ui`	Stop a running service.
`systemctl restart livedata-ui`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status livedata-ui`	Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to see the status of the UI service:

service livedata-ui status

info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/livedata-ui again after running the restart command.

Data transfer agents

systemd command	Use it to...
`systemctl start livedata-migrator-data-agent`	Start a service that isn't currently running.
`systemctl stop livedata-migrator-data-agent`	Stop a running service.
`systemctl restart livedata-migrator-data-agent`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status livedata-migrator-data-agent`	Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to restart a data transfer agent:

service livedata-migrator-data-agent restart

info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh scripts located in /opt/wandisco/livedata-migrator-data-agent again after running the restart commands.

Connect to the CLI

Open a terminal session on the Data Migrator host machine and enter the following command:

`livedata-migrator`

When the CLI connects to the Data Migrator and Hive Migrator services, you get the following command prompt:

Cirata LiveData Migrator >>

The CLI is now ready to accept commands.

Optional parameters

--host The IP or hostname of the Data Migrator API to connect to. Defaults to localhost when not specified.
--vm-port Data Migrator API port. Defaults to 18080 when not specified.
--hm-port Hive Migrator API port. Defaults to 6780 when not specified.
--lm-ssl Flag to use https. Defaults to http when not specified.
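To show how the optional parameters combine, here is a minimal sketch that prints a complete invocation. `build_cli_cmd` and its argument order are hypothetical. The flags and default values are the ones listed above.

```shell
# Print a livedata-migrator invocation with explicit connection options.
# Arguments: host, Data Migrator port, Hive Migrator port, "ssl" to add --lm-ssl.
# Omitted arguments fall back to the documented defaults.
build_cli_cmd() {
    cli_host=${1:-localhost}
    cli_vm_port=${2:-18080}
    cli_hm_port=${3:-6780}
    cli_cmd="livedata-migrator --host $cli_host --vm-port $cli_vm_port --hm-port $cli_hm_port"
    if [ "${4:-}" = "ssl" ]; then
        cli_cmd="$cli_cmd --lm-ssl"
    fi
    printf '%s\n' "$cli_cmd"
}
```

Calling `build_cli_cmd dm.example.com 18080 6780 ssl` prints a command that connects to a remote host over https. The host name here is a placeholder.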

Version check

Check the current versions of included components by using the livedata-migrator command with the --version parameter. For example:

# livedata-migrator --version

tip

This doesn't start the CLI. You get a list of the current Data Migrator components, along with their version numbers.

CLI features

Feature	How to use it
Review available commands	Use the `help` command to get details of all commands available.
Command completion	Hit the `<tab>` key at any time to get assistance or to complete partially-entered commands.
Cancel input	Type `<Ctrl-C>` before entering a command to return to an empty action prompt.
Syntax indication	Invalid commands are highlighted as you type.
Clear the display	Type `<Ctrl-L>` at any time.
Previous commands	Navigate previous commands using the up and down arrows, and use standard emacs shortcuts.
Interactive or scripted operation	You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See `script` for more information and examples.
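As an illustration of scripted operation, the sketch below emits CLI commands one per line so they can be piped to the CLI. It assumes the CLI reads one command per line from standard input. `cli_script` is a hypothetical helper name.

```shell
# Emit one CLI command per line, ready to be piped to the CLI's standard input.
cli_script() {
    for cli_line in "$@"; do
        printf '%s\n' "$cli_line"
    done
}
```

A possible use is `cli_script "backup add" "backup list" | livedata-migrator`, which would create a backup and then list the existing backup files.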

CLI commands

You can manage filesystems, migrations, and more in the CLI.

Backup commands

`backup add`

Immediately create a backup file

backup add

`backup config show`

Show the current backups configuration
backup config show

{
"backupsLocation": "/opt/wandisco/livedata-migrator/db/backups",
"lastSuccessfulTs": 0,
"backupSchedule": {
"enabled": true,
"periodMinutes": 10
},
"storedFilePaths": [
"/etc/wandisco/livedata-migrator/application.properties",
"/etc/wandisco/livedata-migrator/vars.env",
"/etc/wandisco/livedata-migrator/logback-spring.xml",
"/etc/wandisco/livedata-migrator/application.properties",
"/etc/wandisco/livedata-migrator/vars.env",
"/etc/wandisco/livedata-migrator/logback-spring.xml"
]
}

`backup list`

List all existing backup files

backup list

`backup restore`

Restore from a specified backup file

backup restore --name string

`backup schedule configure`

Configure a backup schedule for Data Migrator
backup schedule configure --period-minutes 10 --enable

{
"enabled": true,
"periodMinutes": 10
}

`backup schedule show`

Show current backup schedule
backup schedule show

{
"enabled": true,
"periodMinutes": 10
}

`backup show`

Show a specified backup file

backup show --name string

Bandwidth policy commands

`bandwidth policy delete`

Allow the application to use unlimited bandwidth

bandwidth policy delete

`bandwidth policy set`

Set the application bandwidth limit, in bytes per second
bandwidth policy set    [--value] long  
                        [--unit] string
                        [--data-agent] string

Mandatory parameters

--value Define the number of byte units.
--unit Define the byte unit to be used.
Decimal units: B, KB, MB, GB, TB, PB.
Binary units: KiB, MiB, GiB, TiB, PiB.

Optional parameters

--data-agent Apply the limit to a specified data agent.

Example

Set a limit of 10 Megabytes per second

bandwidth policy set --value 10 --unit MB

Set a limit of 10 megabytes per second for a specific data agent

bandwidth policy set --data-agent DTA1 --value 10 --unit MB
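Because the limit is expressed in bytes per second, it can help to see what a value and unit pair works out to. The sketch below assumes the standard definitions: decimal units are powers of 1000 and binary units are powers of 1024. Whether Data Migrator applies exactly these factors is an assumption, and `unit_to_bytes` is a hypothetical name.

```shell
# Convert a value and unit to bytes, assuming SI decimal units (powers of
# 1000) and IEC binary units (powers of 1024).
unit_to_bytes() {
    bw_value=$1
    case $2 in
        B)   bw_factor=1 ;;
        KB)  bw_factor=1000 ;;
        MB)  bw_factor=1000000 ;;
        GB)  bw_factor=1000000000 ;;
        TB)  bw_factor=1000000000000 ;;
        PB)  bw_factor=1000000000000000 ;;
        KiB) bw_factor=1024 ;;
        MiB) bw_factor=1048576 ;;
        GiB) bw_factor=1073741824 ;;
        TiB) bw_factor=1099511627776 ;;
        PiB) bw_factor=1125899906842624 ;;
        *)   printf 'unknown unit: %s\n' "$2" >&2; return 1 ;;
    esac
    printf '%s\n' "$((bw_value * bw_factor))"
}
```

Under these assumptions, `unit_to_bytes 10 MB` gives 10000000 and `unit_to_bytes 10 MiB` gives 10485760, so the choice between decimal and binary units changes the limit by about 5% at this scale.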

`bandwidth policy show`

Get details of the application bandwidth limit, in bytes per second

bandwidth policy show

Data transfer agent commands

`agent add`

Add a new agent.

Mandatory parameters

--agent-name
User-specified agent name.

You must enter a value for either the --agent-token or the --agent-token-file parameter:

--agent-token
Connection token text provided by the token generator. You can use the content of /opt/wandisco/livedata-migrator-data-agent/connection_token in the node on which you're installing the agent.
--agent-token-file
Path to a file that contains the connection token, for example /opt/wandisco/livedata-migrator-data-agent/connection_token. Ensure the token file is accessible on the Data Migrator host.

Example

agent add --agent-name dta1 --agent-token-file /opt/wandisco/livedata-migrator-data-agent/connection_token

To check the agent was added, run:

agent show --agent-name example_name
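Since `agent add` takes either the token text or a token file, a wrapper can make that choice explicit. This is a sketch only: `build_agent_add` and its "token" and "file" keywords are hypothetical, and the function prints the CLI command instead of running it.

```shell
# Print an `agent add` command with exactly one token option.
# The second argument is "token" (token text) or "file" (token file path).
build_agent_add() {
    agent_name=$1
    agent_kind=$2
    agent_value=$3
    case $agent_kind in
        token) agent_flag=--agent-token ;;
        file)  agent_flag=--agent-token-file ;;
        *)     printf 'use "token" or "file"\n' >&2; return 1 ;;
    esac
    printf 'agent add --agent-name %s %s %s\n' "$agent_name" "$agent_flag" "$agent_value"
}
```

For instance, `build_agent_add dta1 file /opt/wandisco/livedata-migrator-data-agent/connection_token` prints the same command as the example above.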

Register an agent

Curl example

curl -X POST -H "Content-Type: application/json" -d @/opt/wandisco/livedata-migrator-data-agent/reg_data_agent.json http://migrator-host:18080/scaling/dataagents/

Check the agent was added

curl -X GET http://migrator-host:18080/scaling/dataagents/example_name

note

migrator-host is the host where Data Migrator is installed.
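The URL used in the curl examples follows a fixed pattern, which the sketch below reproduces. `agent_url` is a hypothetical helper. The path and the default port 18080 are taken from the examples above.

```shell
# Build the REST URL for a registered agent.
# Arguments: Data Migrator host, agent name, optional API port (default 18080).
agent_url() {
    printf 'http://%s:%s/scaling/dataagents/%s\n' "$1" "${3:-18080}" "$2"
}
```

You could then check an agent with `curl -X GET "$(agent_url migrator-host example_name)"`.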

Start an agent

service livedata-migrator-data-agent start

Remove an agent

agent delete --agent-name example_name

Example: Remove an agent

agent delete --agent-name agent-example-vm.bdauto.wandisco.com

Mandatory parameters

--agent-name
The name you gave the agent, for example agent-example-vm.bdauto.wandisco.com.

View an agent

agent show --agent-name example_name

Example: View an agent

agent show --agent-name agent-example-vm.bdauto.wandisco.com

Example output
{
"name": "agent-example-vm.bdauto.wandisco.com",
"host": "example-vm.bdauto.wandisco.com",
"port": 1433,
"type": "GRPC",
"version": "2.0.0",
"healthy": true,
"health": {
"lastStatusUpdateTime": 1670924489556,
"lastHealthMessage": "Agent agent-example-vm.bdauto.wandisco.com - health check became OK",
"status": "CONNECTED"
}
}

Mandatory parameters

--agent-name
User-specified agent name.

`agent list`

List all agents.

Filesystem commands

`filesystem add adls2 oauth`

Add an Azure Data Lake Storage (ADLS) Gen2 container as a migration target using the filesystem add adls2 oauth command, which requires a service principal and OAuth 2 credentials.

note

The service principal that you want to use must have either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list with RWX permissions for the migration path and all parent paths. For more information, see the Microsoft documentation.

Add an ADLS Gen2 filesystem with OAuth
filesystem add adls2 oauth          [--container-name] string
                                    [--file-system-id] string
                                    [--insecure]
                                    [--oauth2-client-endpoint] string
                                    [--oauth2-client-id] string
                                    [--oauth2-client-secret] string
                                    [--properties] string
                                    [--properties-files] list
                                    [--scan-only]
                                    [--source]
                                    [--storage-account-name] string

Mandatory parameters

--container-name The name of the container in the storage account to which content will be migrated.
--file-system-id The ID to give the new filesystem resource.
--oauth2-client-endpoint The client endpoint for the Azure service principal.
This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token where {tenant} is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
--oauth2-client-id The client ID (also known as application ID) for your Azure service principal.
--oauth2-client-secret The client secret (also known as application secret) for the Azure service principal.
--storage-account-name The name of the ADLS Gen2 storage account to target.
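The common form of the --oauth2-client-endpoint value can be derived from the tenant (directory) ID. The sketch below does only that substitution. `oauth_endpoint` is a hypothetical name, and a custom or proxy endpoint would not follow this pattern.

```shell
# Build the usual token endpoint for a given tenant (directory) ID.
oauth_endpoint() {
    printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token\n' "$1"
}
```

For example, `oauth_endpoint 78u098ex-ampl-e498-8bce-ndpoint5f2e5` prints an endpoint for that placeholder tenant ID.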

Optional parameters

--insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties Enter properties to use in a comma-separated key/value list.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations. Requires --source.
--source Add this filesystem as the source for migrations.

Example

filesystem add adls2 oauth --file-system-id mytarget
                           --storage-account-name myadls2
                           --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
                           --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
                           --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
                           --container-name lm2target

`filesystem add adls2 sharedKey`

Add an ADLS Gen2 container as a migration target using the filesystem add adls2 sharedKey command, which requires credentials in the form of an account key.

Add an ADLS Gen2 filesystem with Shared Key
filesystem add adls2 sharedKey      [--file-system-id] string  
                                    [--storage-account-name] string  
                                    [--container-name] string  
                                    [--insecure]  
                                    [--shared-key] string  
                                    [--properties-files] list  
                                    [--properties] string
                                    [--scan-only]  
                                    [--source]

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.
--storage-account-name The name of the ADLS Gen2 storage account to target.
--shared-key The shared account key to use as credentials to write to the storage account.
--container-name The name of the container in the storage account to which content will be migrated.

Optional parameters

--insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list.
--scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations. Requires --source.
--source Add this filesystem as the source for migrations.

Example

filesystem add adls2 sharedKey  --file-system-id mytarget
                                --storage-account-name myadls2
                                --container-name lm2target
                                --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

`filesystem add gcs`

Add a Google Cloud Storage bucket as a migration target using the filesystem add gcs command, which requires credentials in the form of an account key file.

Add a Google Cloud Storage filesystem
filesystem add gcs      [--bucket-name] string
                        [--file-system-id] string
                        [--properties] string
                        [--properties-files] list  
                        [--service-account-json-key-file] string  
                        [--service-account-json-key-file-server-location] string  
                        [--service-account-json-vault-reference] string                           
                        [--source]

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.
--bucket-name The bucket name of a Google Cloud Storage account.

Service account key parameters

info

Enter your service account key for the Google Cloud Storage bucket by choosing one of the parameters below.

--service-account-json-key-file-server-location
The absolute filesystem path on the Data Migrator server of your service account key file in JSON format. You can either create a Google Cloud Storage service account key or use an existing one.
--service-account-json-key-file
The absolute filesystem path on the host running the Data Migrator CLI of your service account key file in JSON format. Use this parameter if you're running the CLI on a different host to your Data Migrator server.
info
Data Migrator imports GCS credentials from your --service-account-json-key-file, stores them internally as configuration properties, then removes the file.
--service-account-json-vault-reference The HashiCorp Vault reference to the location of the content of the GCS Key File using the Reference format. Use this option if you have a secrets store configured.

Optional parameters

--properties Enter properties to use in a comma-separated key/value list.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--source Enter this parameter to use the filesystem resource created as a source.

Example

filesystem add gcs --bucket-name mygcsgbucket --file-system-id GCS1 --service-account-json-key-file-server-location /var/tmp/key-999999.json --source
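Only one of the three service account key parameters is needed, depending on where the key is stored. The following sketch makes that choice explicit and prints the resulting CLI command. `build_gcs_add` and its "server", "cli", and "vault" keywords are hypothetical names for illustration.

```shell
# Print a `filesystem add gcs` command with exactly one key option.
# Arguments: bucket name, filesystem ID, key location keyword, key value.
# Keyword: "server" (key file on the Data Migrator server),
#          "cli" (key file on the CLI host), "vault" (secrets store reference).
build_gcs_add() {
    gcs_bucket=$1
    gcs_id=$2
    gcs_value=$4
    case $3 in
        server) gcs_flag=--service-account-json-key-file-server-location ;;
        cli)    gcs_flag=--service-account-json-key-file ;;
        vault)  gcs_flag=--service-account-json-vault-reference ;;
        *)      printf 'use "server", "cli", or "vault"\n' >&2; return 1 ;;
    esac
    printf 'filesystem add gcs --bucket-name %s --file-system-id %s %s %s\n' \
        "$gcs_bucket" "$gcs_id" "$gcs_flag" "$gcs_value"
}
```

Append `--source` to the printed command if the bucket is a migration source.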

`filesystem add gpfs`

Add an IBM Spectrum Scale (GPFS) filesystem as a migration source using the filesystem add gpfs CLI command. See the main IBM Spectrum Scale (GPFS) section for additional information.

Add an IBM Spectrum Scale filesystem
filesystem add gpfs      [--default-fs] string
                         [--file-system-id] string
                         [--gpfs-kerberos-keytab] string
                         [--gpfs-kerberos-principal] string                         
                         [--kafka-bootstrap-servers] string
                         [--kafka-group-id] string
                         [--kafka-kerberos-principal] string
                         [--kafka-kerberos-keytab] string                         
                         [--kafka-topic] string                         
                         [--mount-point] string
                         [--properties] string
                         [--properties-files] list                         
                         [--scan-only]                                                 
                         [--use-ssl]
                         [--user] string            

Mandatory parameters

--default-fs The default filesystem URI for this filesystem. For example, hdfs://192.168.1.10:8020, hdfs://myhost.localdomain:8020 or hdfs://mynameservice.
--file-system-id The ID or name to give this new filesystem resource.
--mount-point The root of the GPFS mount on HDFS. For example, /gpfs/myfs/cluster-data/.

Optional parameters

--gpfs-kerberos-keytab The GPFS Kerberos keytab containing the principal defined for the --gpfs-kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
--gpfs-kerberos-principal The GPFS Kerberos principal to authenticate with and perform migrations as. This principal should map to the GPFS superuser using auth_to_local rules.
--kafka-bootstrap-servers The hostname and port of Kafka Bootstrap servers. Use comma-separated pairs for multiple servers. For example, hostname:9092,hostname2:9092.
--kafka-group-id The Kafka consumer group identifier, a unique ID that you specify for the Kafka consumer. For example, my-group-id.
--kafka-kerberos-principal The Kafka Kerberos principal to authenticate with Kafka.
--kafka-kerberos-keytab The Kafka Kerberos keytab containing the principal defined for the --kafka-kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service.
--kafka-topic The Kafka topic name for event delivery. See the Apache Kafka documentation for more information on topic creation.
--properties Enter properties to use in a comma-separated key/value list.
--properties-files Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--scan-only Use this parameter to create a static source filesystem for use in one-time migrations.
--use-ssl Use this parameter if the Kafka server is using TLS. When you submit the command with this option, you'll then be prompted to supply:
- Kafka SSL truststore location The truststore location. This must be accessible to the local system user running the Data Migrator service.
- Kafka SSL truststore password The truststore password.
--user Name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS superuser, such as hdfs.
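The --kafka-bootstrap-servers value is a comma-separated list of host and port pairs. As a small illustration, the sketch below joins any number of pairs into that format. `join_bootstrap_servers` is a hypothetical name.

```shell
# Join host:port pairs with commas for --kafka-bootstrap-servers.
join_bootstrap_servers() {
    kafka_joined=""
    for kafka_pair in "$@"; do
        if [ -z "$kafka_joined" ]; then
            kafka_joined=$kafka_pair
        else
            kafka_joined="$kafka_joined,$kafka_pair"
        fi
    done
    printf '%s\n' "$kafka_joined"
}
```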

Example

Add a live IBM Spectrum Scale (GPFS) source
filesystem add gpfs --default-fs hdfs://SourceCluster:8020 --file-system-id GPFS-Source --gpfs-kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --gpfs-kerberos-principal hdfs04@REALM.HADOOP --kafka-bootstrap-servers bootstrapServer1:9093 --kafka-group-id kafGroup1 --kafka-kerberos-keytab /etc/security/keytabs/kafka.service.keytab --kafka-kerberos-principal kafka/gpfsapr@REALM.HADOOP --kafka-topic FS1-WATCH-EVENT --mount-point /gpfs/fs1/cluster-data --properties-files /etc/wandisco/livedata-migrator/conf/ --use-ssl
<ENTER>
Kafka SSL truststore location: /etc/cirata/livedata-migrator/conf/kafka-keystore.p12
<ENTER>
Kafka SSL truststore password: *********
<ENTER>

`filesystem add hdfs`

Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs command.

You'll normally use this command to create an HDFS resource only when migrating to a target HDFS filesystem (rather than another storage service like ADLS Gen2 or S3a). Data Migrator attempts to auto-discover the source HDFS when started from the command line unless Kerberos is enabled on your source environment.

If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs command to enter Kerberos credentials and auto-discover your source HDFS configuration.

Add a Hadoop Distributed File System
filesystem add hdfs     [--file-system-id] string  
                        [--default-fs] string  
                        [--user] string
                        [--kerberos-principal] string  
                        [--kerberos-keytab] string  
                        [--source]  
                        [--scan-only]  
                        [--success-file] string
                        [--properties-files] list  
                        [--properties] string

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.
--default-fs A string that defines how Data Migrator accesses HDFS.
It can be specified in a number of forms:
1. As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
2. As an HDFS URI that references a nameservice if the NameNodes have high availability, for example, hdfs://mynameservice. For more information, see HDFS High Availability.
--properties-files Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
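The two forms of the --default-fs value differ only in whether a port is present. The sketch below builds either form. `default_fs_uri` is a hypothetical helper, and treating "no port" as "nameservice" is a simplification made for this example.

```shell
# Build a --default-fs value.
# With a port, the first argument is treated as a NameNode host or IP;
# without one, it is treated as a nameservice name.
default_fs_uri() {
    if [ -n "${2:-}" ]; then
        printf 'hdfs://%s:%s\n' "$1" "$2"
    else
        printf 'hdfs://%s\n' "$1"
    fi
}
```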

Optional parameters

Kerberos: Cross-realm authentication required between source and target HDFS

Cross-realm authentication is required in the following scenarios:

Migration will occur between a source and target HDFS.
Kerberos is enabled on both clusters.

See the links below for guidance for common Hadoop distributions:

CDH
CDP
Red Hat (Unmanaged)
HDP

--user The name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
--kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
--source Enter this parameter to use the filesystem resource created as a source.
--scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties Enter properties to use in a comma-separated key/value list.
--success-file Specify a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the directory they're in has finished migrating.

Properties files are required for NameNode HA

If your Hadoop cluster has NameNode HA enabled, enter the local filesystem path to the properties files that define the configuration for the nameservice ID.

Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster.

Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your Data Migrator host's local filesystem.

Example for path containing source cluster configuration

/etc/hadoop/conf

Example for path containing target cluster configuration

/etc/targetClusterConfig

Alternatively, define the absolute filesystem paths to these files:

Example for absolute paths to source cluster configuration files

/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml

Example for absolute paths to target cluster configuration files

/etc/targetClusterConfig/core-site.xml
/etc/targetClusterConfig/hdfs-site.xml

For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
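When both files live in one configuration directory, the --properties-files value can be assembled from that directory path. This is a minimal sketch that assumes the files use the standard names core-site.xml and hdfs-site.xml. `properties_files_for` is a hypothetical name.

```shell
# Build the --properties-files value from a configuration directory.
properties_files_for() {
    conf_dir=${1%/}   # drop one trailing slash, if present
    printf '%s/core-site.xml,%s/hdfs-site.xml\n' "$conf_dir" "$conf_dir"
}
```

For example, `properties_files_for /etc/hadoop/conf` yields the comma-separated pair of absolute paths for the source cluster shown above.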

Examples

HDFS as source

Example for source NameNode HA cluster
filesystem add hdfs     --file-system-id mysource
                        --default-fs hdfs://sourcenameservice
                        --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Example for source NameNode HA cluster with Kerberos enabled
filesystem add hdfs     --file-system-id mysource
                        --default-fs hdfs://sourcenameservice
                        --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
                        --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
                        --kerberos-principal hdfs@SOURCEREALM.COM

HDFS as target

note

If you add an HDFS filesystem as a target, the properties files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.

Example for target NameNode HA cluster with Kerberos enabled

filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM

Example for target single NameNode cluster

filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs

`filesystem add local`

Add a local filesystem as either a migration target or source using the filesystem add local command.

Add a Local Filesystem
filesystem add local    [--file-system-id] string
                        [--fs-root] string
                        [--source]
                        [--scan-only]
                        [--properties-files] list
                        [--properties] string

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.

Optional parameters

--fs-root The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Should be supplied using the full directory path from the root.
--source Enter this parameter to use the filesystem resource created as a source.
--scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files Reference a list of existing properties files.
--properties Enter properties to use in a comma-separated key/value list.

note

If --fs-root isn't specified, the file path defaults to the root of your system.

Examples

Local filesystem as source

filesystem add local --file-system-id mysource --fs-root ./tmp --source

Local filesystem as target

filesystem add local --file-system-id mytarget --fs-root ./Users/username/destinationfolder/

`filesystem add s3a`

Add an S3-compatible filesystem as a source or target for migration.

For details on which platforms support S3, see Supported sources and targets.

info

As of Data Migrator 2.1.1, hcfs.ssl.channel.mode replaces fs.s3a.ssl.channel.mode and fs.azure.ssl.channel.mode, which are no longer valid. See SSL implementation for information on the property and its values.

Use the filesystem add s3a command with the following parameters:

Add an S3 filesystem
filesystem add s3a          [--access-key] string
                            [--aws-config-file] string  
                            [--aws-profile] string
                            [--bootstrap.servers] string 
                            [--bucket-name] string
                            [--credentials-provider] string
                            [--endpoint] string                             
                            [--file-system-id] string 
                            [--properties] string   
                            [--properties-files] list 
                            [--s3type] string  
                            [--scan-only]
                            [--secret-key] string  
                            [--source]  
                            [--sqs-endpoint] string
                            [--sqs-queue] string                                                    
                            [--topic] string

For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.

S3A mandatory parameters

--file-system-id The ID for the new filesystem resource.
--bucket-name The name of your S3 bucket.
--credentials-provider The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint.
The available provider options include:
- org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
  Use this provider to offer credentials as an access key and secret access key with the --access-key and --secret-key parameters.
- com.amazonaws.auth.InstanceProfileCredentialsProvider
  Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.
- com.amazonaws.auth.DefaultAWSCredentialsProviderChain
  A commonly-used credentials provider chain that looks for credentials in this order:
  - Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
  - Java System Properties - aws.accessKeyId and aws.secretKey.
  - Web Identity Token credentials from the environment or container.
  - Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
  - Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
  - Instance profile credentials delivered through the Amazon EC2 metadata service.
- com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider
  This provider supports the use of multiple AWS credentials, which are stored in a credentials file.
  When adding a source filesystem, use the following properties:
  - awsProfile - Name for the AWS profile.
  - awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.
    For example:
    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>
    In the CLI, you can also use --aws-profile and --aws-config-file.
    For example:
    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --aws-profile <profile-name> --aws-config-file </path/to/the/aws/credentials/file>
    Learn more about using AWS profiles: Configuration and credential file settings.
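
As a minimal sketch of the provider options above, the following command adds a target that relies on com.amazonaws.auth.DefaultAWSCredentialsProviderChain, so no --access-key or --secret-key is passed. The filesystem ID and bucket name are placeholder values, and the sketch assumes credentials are already available from one of the locations the chain searches.

```text
filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain
```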

S3A optional parameters

--access-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.
--secret-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key using this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.
--endpoint Enter a specific endpoint to access the S3-compatible bucket, such as an AWS PrivateLink endpoint or an IBM COS public regional endpoint. If you don't enter a value, the filesystem defaults to AWS.
note
Using --endpoint supersedes fs.s3a.endpoint if the latter is set as an additional custom property. Don't use both at the same time.
--sqs-queue [Amazon S3 as a source only] Enter an SQS queue name. This field is required if you enter an SQS endpoint.
--sqs-endpoint [Amazon S3 as a source only] Enter an SQS endpoint.
--source Enter this parameter to add the filesystem as a source. See which platforms are supported as a source.
--scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list.
--s3type Specifies which parameters are required, based on the requirements of your selected S3A-compatible filesystem. Leave it blank for S3-compatible storage or select from the following:
- aws
- oracle
- ibmcos
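
The source-related parameters above can be combined as in the following sketch. The first command adds an Amazon S3 source that receives change notifications through SQS. The second adds a static source for a one-time migration. The filesystem IDs and bucket name are hypothetical, and the values in angle brackets are placeholders for your own SQS queue name and endpoint.

```text
filesystem add s3a --source --file-system-id mysource
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
--sqs-queue <sqs-queue-name>
--sqs-endpoint <sqs-endpoint>

filesystem add s3a --source --scan-only --file-system-id mystaticsource
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
```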

IBM COS as a source only

--bootstrap.servers The Kafka server address.
--topic The Kafka topic where S3 object change notifications are provided.

S3a default properties

These properties are defined by default when adding an S3a filesystem.

info

You don't need to define or adjust many of these properties. Use caution when making any changes. If you're unsure, contact Support for more information.

Enter additional properties for S3 filesystems by adding them as key-value pairs in the UI or as a comma-separated key-value pair list with the --properties parameter in the CLI. You can overwrite default property values or add new properties.

fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a Filesystem.
fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/(ldm version)): Sets a custom value that will be prepended to the User-Agent header sent in HTTP requests to the S3 back-end by S3AFileSystem.
fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
fs.s3a.connection.maximum (default 225): Defines the maximum number of simultaneous connections to the S3 filesystem.
fs.s3a.threads.max (default pull.threads + 10): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation. Default is the current LDM pull.threads value plus 10.
fs.s3a.max.total.tasks (default 75): Defines the maximum number of tasks allowed for parallel operations.
fs.s3a.sqs.init.dir (default /sqs-init-path): SQS initialization path.
fs.s3a.empty.polls.max.count (default 10): Maximum number of empty listing responses accepted before considering a directory listing operation as finished.
fs.s3a.sqs.messages.max.number (default 10): Maximum number of messages to pull from an SQS queue in a single request.
fs.s3a.sqs.wait.time.sec (default 20): Duration in seconds to wait for messages in the SQS queue when polling for notifications.
fs.s3a.path.events.cache.size (default 0): Number of entries or paths that can be cached.
fs.s3a.path.events.cache.expiration.time.min (default 60): Time-to-live for entries stored in the events cache.
s3a.events.poll.max.retries (default 10): Maximum number of retries the connector attempts for polling events.
fs.s3a.healthcheck (default true): Allows the S3A filesystem health check to be turned off by changing true to false. This option is useful for setting up Data Migrator while cloud services are offline. However, when disabled, errors in S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.

S3a custom properties

These are some of the additional properties that can be added when creating an S3a filesystem.

fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
fs.s3a.fast.upload.active.blocks (default 4): Defines how many blocks a single output stream can have uploading or queued at a given time.
fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in Kilobytes, Megabytes, Gigabytes, Terabytes or Petabytes respectively.
fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
fs.s3a.endpoint.region (default Current region): Explicitly sets the bucket region.

note

To configure an Oracle Cloud Storage bucket that isn't in your default region, specify fs.s3a.endpoint.region=<region> with the --properties flag when adding the filesystem with the CLI.

See Oracle Cloud Storage additional properties example.
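
As a rough sketch only, such a command might take the following shape. All values in angle brackets are placeholders, the filesystem ID is hypothetical, and the choice of SimpleAWSCredentialsProvider with an access key and secret key is an assumption rather than a documented requirement for Oracle Cloud Storage.

```text
filesystem add s3a --file-system-id oracletarget
--bucket-name <bucket-name>
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key <access-key>
--secret-key <secret-key>
--endpoint <endpoint>
--s3type oracle
--properties fs.s3a.endpoint.region=<region>
```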

Find an additional list of S3a properties in the S3a documentation.

Upload buffering

Migrations using an S3A target destination will buffer all uploads. By default, the buffering will occur on the local disk of the system Data Migrator is running on, in the /tmp directory.

Data Migrator will automatically delete the temporary buffering files once they are no longer needed.

If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. The following values can be supplied:

Buffering Option	Details	Property Value
Array Buffer	Buffers the uploaded data in memory instead of on disk, using the Java heap.	`array`
Byte Buffer	Buffers the uploaded data in memory instead of on disk, but doesn't use the Java heap.	`bytebuffer`
Disk Buffering	The default option. Buffers the upload to disk.	`disk`

Both the array and bytebuffer options may lead to the consumption of large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.
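
As a sketch, a target that buffers uploads in memory outside the Java heap, and limits each output stream to two active blocks, could be added as follows. The filesystem ID, bucket name, and the value 2 are illustrative assumptions, not recommendations.

```text
filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
--properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=2
```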

note

If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp by default) has enough remaining space to facilitate the transfer.
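
If /tmp is too small, one option is to point disk buffering at a larger volume with the fs.s3a.buffer.dir property. In this sketch, /data/ldm-buffer is a hypothetical directory, and the filesystem ID and bucket name are placeholders.

```text
filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider com.amazonaws.auth.InstanceProfileCredentialsProvider
--properties fs.s3a.buffer.dir=/data/ldm-buffer
```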

S3a Example

filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D

IBM Cloud Object Storage Examples

Add source IBM Cloud Object Storage filesystem. Note that this doesn't work if SSL is used on the endpoint address.

filesystem add s3a --source --file-system-id cos_s3_source2
--bucket-name container2
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
--s3type ibmcos
--bootstrap.servers=10.0.0.123:9092
--topic newcos-events
--endpoint http://10.0.0.124

Add path mapping.

path mapping add --path-mapping-id testPath
--description description-string
--source-path /
--target targetHdfs2
--target-path /repl_test1
{
"id": "testPath",
"description": "description-string",
"sourceFileSystem": "cos_s3_source2",
"sourcePath": "/",
"targetFileSystem": "targetHdfs2",
"targetPath": "/repl_test1"
}

`filesystem auto-discover-source hdfs`

Discover your local HDFS filesystem by entering the Kerberos credentials for your source environment.

You can also manually configure the source HDFS filesystem using the filesystem add hdfs command.

Auto-discover-source Hadoop Distributed File System (HDFS)
filesystem auto-discover-source hdfs    [--kerberos-principal] string
                                        [--kerberos-keytab] string
                                        [--scan-only] 

Kerberos parameters

--kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).

Optional

--scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations.

Example

filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM
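
For a one-time, non-live migration, the same command can be run with --scan-only. This sketch reuses the keytab path and principal from the example above, which are illustrative values.

```text
filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM --scan-only
```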

`filesystem clear`

Delete all target filesystem references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target filesystems.

Delete all targets

filesystem clear

`filesystem delete`

Delete a specific filesystem resource by ID. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that filesystem.

Delete a target

filesystem delete [--file-system-id] string

Mandatory parameters

--file-system-id The ID of the filesystem resource to delete.

Example

filesystem delete --file-system-id mytarget

`filesystem list`

List defined filesystem resources.

List targets

filesystem list [--detailed]

Optional parameters

--detailed Include all properties for each filesystem in the JSON result.

`filesystem show`

View details for a filesystem resource.

Get target details

filesystem show [--file-system-id] string  
                [--detailed]

Mandatory parameters

--file-system-id The ID of the filesystem resource to show.

Example

filesystem show --file-system-id mytarget
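
To include every property in the output, --detailed can be added. This sketch assumes the flag behaves as it does for filesystem list, since this section doesn't describe it separately.

```text
filesystem show --file-system-id mytarget --detailed
```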

`filesystem types`

View information about the filesystem types available for use with Data Migrator.

List the types of target filesystems available

filesystem types

`filesystem update adls2 oauth`

Update an existing ADLS Gen2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth command. You will be prompted to optionally update the service principal and OAuth 2 credentials.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth section.