Version: 2.1.1

Command reference

System service commands

The service scripts are used to control operation of each individual service. In most supported Linux distributions, the following commands can be used to manage Data Migrator, Hive Migrator, and WANdisco UI processes.

Data Migrator

systemd command	Use it to...
`systemctl start livedata-migrator`	Start a service that isn't currently running.
`systemctl stop livedata-migrator`	Stop a running service.
`systemctl restart livedata-migrator`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status livedata-migrator`	Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to restart Data Migrator:

service livedata-migrator restart

info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/livedata-migrator again after running the restart command.

note

CentOS 6 systems don't support the service command. Instead, use initctl with the format:

initctl <command> <service name>

For example:

initctl restart livedata-migrator
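The three command formats differ only in word order, which is easy to mix up in scripts. The sketch below prints the correct form for a service manager you choose; it does not run the command. `svc_cmd` is a hypothetical helper name. You pick the manager yourself rather than relying on auto-detection.

```shell
# Hypothetical helper: print (do not run) the command that controls a service.
# Usage: svc_cmd <systemd|service|initctl> <service-name> <start|stop|restart|status>
svc_cmd() {
  manager=$1 name=$2 action=$3
  case "$manager" in
    systemd) echo "systemctl $action $name" ;;  # systemctl <command> <service name>
    service) echo "service $name $action" ;;    # service <service name> <command>
    initctl) echo "initctl $action $name" ;;    # initctl <command> <service name> (CentOS 6)
    *) echo "unknown service manager: $manager" >&2; return 1 ;;
  esac
}

# Example: svc_cmd initctl livedata-migrator restart
```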

Hive Migrator

systemd command	Use it to...
`systemctl start hivemigrator`	Start a service that isn't currently running.
`systemctl stop hivemigrator`	Stop a running service.
`systemctl restart hivemigrator`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status hivemigrator`	Get details of the running service's status.

info

Always start/restart Hive Migrator services in the following order:

1. Remote agents
2. Hive Migrator service

Not starting services in this order may cause live migrations to fail.

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/hivemigrator again after running the restart command.
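You can script the start order above. This is a minimal sketch that makes three assumptions: each remote agent runs as the hivemigrator-remote-server service on its own host, each host is reachable over password-less SSH, and the commands run with root privileges. `run` and `restart_hive_stack` are hypothetical helper names. Set DRY_RUN=1 to print the commands instead of running them.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restart Hive Migrator services in the documented order.
# Arguments: hostnames of the remote agents (assumed to run hivemigrator-remote-server).
restart_hive_stack() {
  for host in "$@"; do
    # 1. Remote agents first; stop at the first failure.
    run ssh "$host" systemctl restart hivemigrator-remote-server || return 1
  done
  # 2. Then the Hive Migrator service on this host.
  run systemctl restart hivemigrator
}

# Example (hostname is illustrative): DRY_RUN=1 restart_hive_stack agent1.example.com
```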

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to view status of Hive Migrator:

service hivemigrator status

note

CentOS 6 systems don't support the service command. Instead, use initctl with the format:

initctl <command> <service name>

For example:

initctl start hivemigrator

Hive Migrator remote server

systemd command	Use it to...
`systemctl start hivemigrator-remote-server`	Start a service that isn't currently running.
`systemctl stop hivemigrator-remote-server`	Stop a running service.
`systemctl restart hivemigrator-remote-server`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status hivemigrator-remote-server`	Get details of the running service's status.

info

Always start/restart Hive Migrator services in the following order:

1. Remote agents
2. Hive Migrator service

Not starting services in this order may cause live migrations to fail.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to view status of Hive Migrator remote server:

service hivemigrator-remote-server status

note

CentOS 6 systems don't support the service command. Instead, use initctl with the format:

initctl <command> <service name>

For example:

initctl start hivemigrator-remote-server

WANdisco UI

systemd command	Use it to...
`systemctl start livedata-ui`	Start a service that isn't currently running.
`systemctl stop livedata-ui`	Stop a running service.
`systemctl restart livedata-ui`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status livedata-ui`	Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to view the status of the WANdisco UI service:

service livedata-ui status

info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/livedata-ui again after running the restart command.

note

CentOS 6 systems don't support the service command. Instead, use initctl with the format:

initctl <command> <service name>

For example:

initctl start livedata-ui

Data transfer agents

systemd command	Use it to...
`systemctl start livedata-migrator-data-agent`	Start a service that isn't currently running.
`systemctl stop livedata-migrator-data-agent`	Stop a running service.
`systemctl restart livedata-migrator-data-agent`	Run a command that performs a `stop` and then a `start`. If the service isn't running, this works the same as a `start` command.
`systemctl status livedata-migrator-data-agent`	Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>

Example command to restart a data transfer agent:

service livedata-migrator-data-agent restart

info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh scripts located in /opt/wandisco/livedata-migrator-data-agent again after running the restart commands.

note

CentOS 6 systems don't support the service command. Instead, use initctl with the format:

initctl <command> <service name>

For example:

initctl restart livedata-migrator-data-agent

Connect to the WANdisco CLI

Open a terminal session on the Data Migrator host machine and enter the following command:

`livedata-migrator`

When the CLI connects to the Data Migrator and Hive Migrator services, you get the following command prompt:

WANdisco LiveData Migrator >>

The CLI is now ready to accept commands.

Optional parameters

--host The IP address or hostname of the Data Migrator API to connect to. Defaults to localhost when not specified.
--vm-port The Data Migrator API port. Defaults to 18080 when not specified.
--hm-port The Hive Migrator API port. Defaults to 6780 when not specified.
--lm-ssl Connect over HTTPS. Defaults to HTTP when not specified.
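The sketch below assembles a connection command from these parameters and falls back to the documented defaults. The LDM_* environment variable names and the ldm_connect_cmd function are hypothetical and are for illustration only.

```shell
# Hypothetical helper: print the livedata-migrator command line for the given
# connection settings. Unset LDM_* variables fall back to the documented defaults.
ldm_connect_cmd() {
  cmd="livedata-migrator --host ${LDM_HOST:-localhost} --vm-port ${LDM_VM_PORT:-18080} --hm-port ${LDM_HM_PORT:-6780}"
  if [ "${LDM_SSL:-0}" = 1 ]; then
    cmd="$cmd --lm-ssl"   # connect over HTTPS instead of HTTP
  fi
  echo "$cmd"
}

# Usage: connect to a remote Data Migrator over HTTPS (hostname is an example).
#   eval "$(LDM_HOST=ldm.example.com LDM_SSL=1 ldm_connect_cmd)"
```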

Version check

Check the current versions of included components by using the livedata-migrator command with the --version parameter. For example:

# livedata-migrator --version

tip

This doesn't start the CLI. You get a list of the current Data Migrator components, along with their version numbers.

WANdisco CLI features

Feature	How to use it
Review available commands	Use the `help` command to get details of all commands available.
Command completion	Press the `<tab>` key at any time to get assistance or to complete partially-entered commands.
Cancel input	Press `<Ctrl-C>` before entering a command to return to an empty action prompt.
Syntax indication	Invalid commands are highlighted as you type.
Clear the display	Press `<Ctrl-L>` at any time.
Previous commands	Navigate previous commands using the up and down arrows, and use standard Emacs shortcuts.
Interactive or scripted operation	You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See `script` for more information and examples.
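Here is a minimal sketch of scripted operation. It assumes the CLI reads one command per line from standard input and exits when the input ends. The ldm_batch function name is hypothetical. The commands it emits are taken from this reference.

```shell
# Hypothetical helper: print one WANdisco CLI command per line.
ldm_batch() {
  cat <<'EOF'
bandwidth policy show
backup list
agent list
EOF
}

# Usage (requires the CLI and running services, so not executed here):
#   ldm_batch | livedata-migrator
```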

WANdisco CLI commands

You can manage filesystems, migrations, and more in the WANdisco CLI.

Auto source cleanup commands

note

In the following commands, use the:

[--name] parameter for existing migrations.
[--migration-id] parameter for deleted migrations.

`migration deletion-report list`

View a list of recent cleanup reports.

Show a list of source cleanup reports for a migration

migration deletion-report list    [--name] string  
                                  [--date] string

Mandatory parameters

--name or --migration-id Specify the name or the ID of the migration for which you want to view a list of cleanup reports.

Optional parameters

--date Enter the date if you want to view a list of all the cleanup reports after this date. Use one of the following date formats:
- DD.MM.YYYY
- DD-MM-YYYY
- DD/MM/YYYY
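If your scripts work with ISO 8601 dates (YYYY-MM-DD), a conversion like the one below produces the DD-MM-YYYY form. This is a sketch: iso_to_ddmmyyyy is a hypothetical helper, and it doesn't validate its input.

```shell
# Hypothetical helper: convert YYYY-MM-DD to DD-MM-YYYY using parameter expansion.
iso_to_ddmmyyyy() {
  iso=$1
  y=${iso%%-*}      # year: text before the first "-"
  rest=${iso#*-}    # "MM-DD"
  m=${rest%%-*}     # month
  d=${rest#*-}      # day
  echo "$d-$m-$y"
}

# Today's date in DD-MM-YYYY form.
today=$(date +%d-%m-%Y)

# Then, in the CLI (the migration name is hypothetical):
#   migration deletion-report list --name myMigration --date 15-01-2023
```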

`migration deletion-report download`

Download source cleanup reports.

Download a source cleanup report
migration deletion-report download    [--migration-id] string  
                                      [--report-name] string
                                      [--out-dir] string

Mandatory parameters

--migration-id Specify the ID of the migration for which you want to download a cleanup report.
--report-name Specify the name of the report you want to download for a migration.
--out-dir Specify the directory to which you want to download the report.

Example: Download a cleanup report

migration deletion-report download --migration-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565 --report-name exampleCleanupReportName --out-dir /user/exampleCleanupDirectory

`migration deletion-report delete`

Delete source cleanup reports.

Delete one or more source cleanup reports for a migration

migration deletion-report delete    [--name] string  
                                    [--report-names] string

Mandatory parameters

--name or --migration-id Specify the name or the ID of the migration for which you want to delete cleanup reports.
--report-names Specify comma-separated report names of the reports you want to delete for a migration.

Backup commands

`backup add`

Immediately create a backup file

backup add

`backup config show`

Show the current backup configuration
backup config show

{
  "backupsLocation": "/opt/wandisco/livedata-migrator/db/backups",
  "lastSuccessfulTs": 0,
  "backupSchedule": {
    "enabled": true,
    "periodMinutes": 10
  },
  "storedFilePaths": [
    "/etc/wandisco/livedata-migrator/application.properties",
    "/etc/wandisco/livedata-migrator/vars.env",
    "/etc/wandisco/livedata-migrator/logback-spring.xml",
    "/etc/wandisco/livedata-migrator/application.properties",
    "/etc/wandisco/livedata-migrator/vars.env",
    "/etc/wandisco/livedata-migrator/logback-spring.xml"
  ]
}

`backup list`

List all existing backup files

backup list

`backup restore`

Restore from a specified backup file

backup restore --name string

`backup schedule configure`

Configure a backup schedule for Data Migrator
backup schedule configure --period-minutes 10 --enable

{
  "enabled": true,
  "periodMinutes": 10
}

`backup schedule show`

Show current backup schedule
backup schedule show

{
  "enabled": true,
  "periodMinutes": 10
}

`backup show`

Show a specified backup file

backup show --name string

Bandwidth policy commands

`bandwidth policy delete`

Allow the application to use unlimited bandwidth

bandwidth policy delete

`bandwidth policy set`

Set the application bandwidth limit, in bytes per second

bandwidth policy set    [--value] long  
                        [--unit] string

Mandatory parameters

--value Define the number of byte units.
--unit Define the byte unit to be used.
Decimal units: B, KB, MB, GB, TB, PB.
Binary units: KiB, MiB, GiB, TiB, PiB.
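Decimal and binary units give noticeably different limits. The sketch below converts a value and unit to bytes per second. It assumes Data Migrator applies the usual SI multipliers (powers of 1000) and IEC multipliers (powers of 1024). to_bytes_per_sec is a hypothetical helper name.

```shell
# Hypothetical helper: convert a --value/--unit pair to bytes per second.
to_bytes_per_sec() {
  value=$1 unit=$2
  case "$unit" in
    B)   mult=1 ;;
    KB)  mult=1000 ;;
    MB)  mult=1000000 ;;
    GB)  mult=1000000000 ;;
    TB)  mult=1000000000000 ;;
    PB)  mult=1000000000000000 ;;
    KiB) mult=1024 ;;
    MiB) mult=$((1024 * 1024)) ;;
    GiB) mult=$((1024 * 1024 * 1024)) ;;
    TiB) mult=$((1024 * 1024 * 1024 * 1024)) ;;
    PiB) mult=$((1024 * 1024 * 1024 * 1024 * 1024)) ;;
    *) echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  echo $((value * mult))
}
```

Under these assumptions, --value 10 --unit MB is 10,000,000 bytes per second, while --value 10 --unit MiB is 10,485,760 bytes per second.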

Example

Set a limit of 10 megabytes per second

bandwidth policy set --value 10 --unit MB

`bandwidth policy show`

Get details of the application bandwidth limit, in bytes per second

bandwidth policy show

Data transfer agent commands

`agent add`

Add a new agent.

Mandatory parameters

--agent-name
User-specified agent name.

You must enter a value for either the --agent-token or the --agent-token-file parameter:

--agent-token
Connection token text provided by the token generator. You can use the content of /opt/wandisco/livedata-migrator-data-agent/connection_token on the node where you're installing the agent.
--agent-token-file
Path to the file that contains the connection token, for example /opt/wandisco/livedata-migrator-data-agent/connection_token. Ensure the token file is accessible on the Data Migrator host.

Example

agent add --agent-name dta1 --agent-token-file /opt/wandisco/livedata-migrator-data-agent/connection_token

To check that the agent was added, run:

agent show --agent-name dta1

Register an agent

Curl example

curl -X POST -H "Content-Type: application/json" -d @/opt/wandisco/livedata-migrator-data-agent/reg_data_agent.json http://migrator-host:18080/scaling/dataagents/

Check that the agent was added

curl -X GET http://migrator-host:18080/scaling/dataagents/example_name

note

migrator-host is the host where Data Migrator is installed.

Start an agent

service livedata-migrator-data-agent start

Remove an agent

agent delete --agent-name example_name

Example: Remove an agent

agent delete --agent-name agent-example-vm.bdauto.wandisco.com

Mandatory parameters

--agent-name
The name you gave the agent when you added it, for example agent-example-vm.bdauto.wandisco.com.

View an agent

agent show --agent-name example_name

Example: View an agent

agent show --agent-name agent-example-vm.bdauto.wandisco.com

Example output
{
  "name": "agent-example-vm.bdauto.wandisco.com",
  "host": "example-vm.bdauto.wandisco.com",
  "port": 1433,
  "type": "GRPC",
  "version": "2.0.0",
  "healthy": true,
  "health": {
    "lastStatusUpdateTime": 1670924489556,
    "lastHealthMessage": "Agent agent-example-vm.bdauto.wandisco.com - health check became OK",
    "status": "CONNECTED"
  }
}

Mandatory parameters

--agent-name
User-specified agent name.
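To script a health check, you can test the "healthy" field in the agent details. This sketch uses a plain grep pattern instead of a JSON parser, so treat it as a rough check. agent_is_healthy is a hypothetical helper name. migrator-host and example_name are placeholders, as in the curl examples.

```shell
# Hypothetical helper: succeed if the agent JSON on stdin reports "healthy": true.
# A rough text match; use a JSON parser such as jq instead if one is installed.
agent_is_healthy() {
  grep -Eq '"healthy"[[:space:]]*:[[:space:]]*true'
}

# Usage:
#   curl -s http://migrator-host:18080/scaling/dataagents/example_name | agent_is_healthy && echo healthy
```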

`agent list`

List all agents.

Filesystem commands

`filesystem add adls2 oauth`

Add an Azure Data Lake Storage (ADLS) Gen2 container as a migration target using the filesystem add adls2 oauth command, which requires a service principal and OAuth 2 credentials.

note

The service principal that you want to use must have either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list with RWX permissions for the migration path and all parent paths. For more information, see the Microsoft documentation.

Add an ADLS Gen2 filesystem with OAuth
filesystem add adls2 oauth          [--container-name] string
                                    [--file-system-id] string
                                    [--insecure]
                                    [--oauth2-client-endpoint] string
                                    [--oauth2-client-id] string
                                    [--oauth2-client-secret] string
                                    [--properties] string
                                    [--properties-files] list
                                    [--source]
                                    [--storage-account-name] string

Mandatory parameters

--container-name The name of the container in the storage account to which content will be migrated.
--file-system-id The ID to give the new filesystem resource.
--oauth2-client-endpoint The client endpoint for the Azure service principal.
This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token where {tenant} is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
--oauth2-client-id The client ID (also known as application ID) for your Azure service principal.
--oauth2-client-secret The client secret (also known as application secret) for the Azure service principal.
--storage-account-name The name of the ADLS Gen2 storage account to target.
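If your endpoint uses the common https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token form described above, you can build the --oauth2-client-endpoint value from the directory (tenant) ID. oauth2_endpoint is a hypothetical helper name. Don't use it if you enter a custom URL such as a proxy endpoint.

```shell
# Hypothetical helper: build the OAuth 2 token endpoint from a tenant (directory) ID.
oauth2_endpoint() {
  if [ -z "$1" ]; then
    echo "usage: oauth2_endpoint <tenant-id>" >&2
    return 1
  fi
  echo "https://login.microsoftonline.com/$1/oauth2/v2.0/token"
}
```

For the placeholder tenant ID in the example below, oauth2_endpoint 78u098ex-ampl-e498-8bce-ndpoint5f2e5 prints the same endpoint the example passes to --oauth2-client-endpoint.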

Optional parameters

--insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties Enter properties to use in a comma-separated key/value list.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--source Add this filesystem as the source for migrations.

Example

filesystem add adls2 oauth --file-system-id mytarget
                           --storage-account-name myadls2
                           --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
                           --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
                           --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
                           --container-name lm2target

`filesystem add adls2 sharedKey`

Add an ADLS Gen2 container as a migration target using the filesystem add adls2 sharedKey command, which requires credentials in the form of an account key.

Add an ADLS Gen2 filesystem with Shared Key
filesystem add adls2 sharedKey      [--file-system-id] string  
                                    [--storage-account-name] string  
                                    [--container-name] string  
                                    [--insecure]  
                                    [--shared-key] string  
                                    [--properties-files] list  
                                    [--properties] string  
                                    [--source]

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.
--storage-account-name The name of the ADLS Gen2 storage account to target.
--shared-key The shared account key to use as credentials to write to the storage account.
--container-name The name of the container in the storage account to which content will be migrated.

Optional parameters

--insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list.
--source Add this filesystem as the source for migrations.

Example

filesystem add adls2 sharedKey  --file-system-id mytarget
                                --storage-account-name myadls2
                                --container-name lm2target
                                --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

`filesystem add gcs`

Add a Google Cloud Storage bucket as a migration target using the filesystem add gcs command, which requires credentials in the form of an account key file.

Add a Google Cloud Storage filesystem
filesystem add gcs      [--file-system-id] string  
                        [--service-account-json-key-file] string  
                        [--service-account-p12-key-file] string  
                        [--service-account-json-key-file-server-location] string  
                        [--service-account-p12-key-file-server-location] string  
                        [--service-account-email] string  
                        [--bucket-name] string  
                        [--properties-files] list  
                        [--properties] string

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.
--bucket-name The bucket name of a Google Cloud Storage account.
Service account key parameters
info
Enter your service account key for the Google Cloud Storage bucket by choosing one of the parameters below.
You can also upload the service account key directly when using the UI (this isn't supported through the CLI).

--service-account-json-key-file-server-location
The absolute filesystem path on the Data Migrator server of your service account key file in JSON format. You can either create a Google Cloud Storage service account key or use an existing one.
In the UI, this is called Key File and becomes visible when you select Key File Options -> Provide a Path.
--service-account-p12-key-file-server-location
The absolute filesystem path on the Data Migrator server of your service account key file in P12 format. You can either create a Google Cloud Storage service account key or use an existing one.
--service-account-json-key-file
The absolute filesystem path on the host running the WANdisco CLI of your service account key file in JSON format. Use this parameter if you are running the WANdisco CLI on a different host to your Data Migrator server.
--service-account-p12-key-file
The absolute filesystem path on the host running the WANdisco CLI of your service account key file in P12 format. Use this parameter if you are running the WANdisco CLI on a different host to your Data Migrator server.

Optional parameters

--service-account-email The email address linked to your Google Cloud Storage service account.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list.

Example

filesystem add gcs      --file-system-id gcsAgent
                        --bucket-name myGcsBucket
                        --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12
                        --service-account-email user@mydomain.com

`filesystem add hdfs`

Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs command.

You'll normally only need to create an HDFS resource with this command when migrating to a target HDFS filesystem (rather than another storage service like ADLS Gen2 or S3a). Data Migrator will attempt to auto-discover the source HDFS when started from the command line unless Kerberos is enabled on your source environment.

If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs command to enter Kerberos credentials and auto-discover your source HDFS configuration.

Add a Hadoop Distributed File System
filesystem add hdfs     [--file-system-id] string  
                        [--default-fs] string  
                        [--user] string
                        [--kerberos-principal] string  
                        [--kerberos-keytab] string  
                        [--source]  
                        [--scan-only]  
                        [--success-file] string
                        [--properties-files] list  
                        [--properties] string

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.
--default-fs A string that defines how Data Migrator accesses HDFS.
It can be specified in a number of forms:
1. As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
2. As an HDFS URI that references a nameservice if the NameNodes have high availability, for example, hdfs://mynameservice. For more information, see HDFS High Availability.

Optional parameters

Kerberos: Cross-realm authentication required between source and target HDFS

Cross-realm authentication is required in the following scenarios:

Migration will occur between a source and target HDFS.
Kerberos is enabled on both clusters.

See the links below for guidance for common Hadoop distributions:

CDH
CDP
Red Hat (Unmanaged)
HDP

--user The name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
--kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
--source Enter this parameter to use the filesystem resource created as a source.
--scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list.
--success-file Specify a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the directory they're in has finished migrating.

Properties files are required for NameNode HA

If your Hadoop cluster has NameNode HA enabled, enter the local filesystem path to the properties files that define the configuration for the nameservice ID.

Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster.

Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your Data Migrator host's local filesystem.

Example for path containing source cluster configuration

/etc/hadoop/conf

Example for path containing target cluster configuration

/etc/targetClusterConfig

Alternatively, define the absolute filesystem paths to these files:

Example for absolute paths to source cluster configuration files

/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml

Example for absolute paths to target cluster configuration files

/etc/targetClusterConfig/core-site.xml
/etc/targetClusterConfig/hdfs-site.xml

For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
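A small helper can check that both files are present before you build the --properties-files value. This is a sketch: props_files_arg is a hypothetical name. The readability test runs as the current user, which may not be the Data Migrator system user.

```shell
# Hypothetical helper: print "<dir>/core-site.xml,<dir>/hdfs-site.xml" if both
# files are readable by the current user; otherwise report the problem and fail.
props_files_arg() {
  conf_dir=${1%/}   # strip one trailing slash, e.g. /etc/hadoop/conf/ -> /etc/hadoop/conf
  for f in core-site.xml hdfs-site.xml; do
    if [ ! -r "$conf_dir/$f" ]; then
      echo "missing or unreadable: $conf_dir/$f" >&2
      return 1
    fi
  done
  echo "$conf_dir/core-site.xml,$conf_dir/hdfs-site.xml"
}
```

For example, when both files exist, props_files_arg /etc/targetClusterConfig prints /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml.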

Examples

HDFS as source

Example for source NameNode HA cluster
filesystem add hdfs     --file-system-id mysource
                        --default-fs hdfs://sourcenameservice
                        --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Example for source NameNode HA cluster with Kerberos enabled
filesystem add hdfs     --file-system-id mysource
                        --default-fs hdfs://sourcenameservice
                        --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
                        --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
                        --kerberos-principal hdfs@SOURCEREALM.COM

HDFS as target

note

If you add an HDFS filesystem as a target, the properties files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.

Example for target NameNode HA cluster with Kerberos enabled

filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM

Example for target single NameNode cluster

filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs

`filesystem add local`

Add a local filesystem as either a migration target or source using the filesystem add local command.

Add a Local Filesystem
filesystem add local    [--file-system-id] string
                        [--fs-root] string
                        [--source]
                        [--scan-only]
                        [--properties-files] list
                        [--properties] string

Mandatory parameters

--file-system-id The ID to give the new filesystem resource.

Optional parameters

--fs-root The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Should be supplied using the full directory path from the root.
--source Enter this parameter to use the filesystem resource created as a source.
--scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files Reference a list of existing properties files.
--properties Enter properties to use in a comma-separated key/value list.

note

If you don't specify --fs-root, the file path defaults to the root of your system.
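Because --fs-root expects the full directory path from the root, you can resolve a relative path first. This is a minimal sketch for an existing directory. abs_dir is a hypothetical helper name, and pwd -P also resolves symbolic links.

```shell
# Hypothetical helper: print the absolute, symlink-free path of an existing directory.
abs_dir() {
  (cd "$1" 2>/dev/null && pwd -P) || { echo "not a directory: $1" >&2; return 1; }
}

# Example: abs_dir ./destinationfolder
```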

Examples

Local filesystem as source

filesystem add local --file-system-id mysource --fs-root /tmp --source

Local filesystem as target

filesystem add local --file-system-id mytarget --fs-root /Users/username/destinationfolder/

`filesystem add s3a`

Add an S3-compatible filesystem as a source or target for migration.

For details on which platforms support S3, see Supported sources and targets.

info

As of Data Migrator 2.1.1, hcfs.ssl.channel.mode replaces fs.s3a.ssl.channel.mode and fs.azure.ssl.channel.mode, which are no longer valid. See SSL implementation for information on the property and values used.

Use the filesystem add s3a command with the following parameters:

Add an S3 filesystem
filesystem add s3a          [--access-key] string
                            [--aws-config-file] string  
                            [--aws-profile] string
                            [--bootstrap.servers] string 
                            [--bucket-name] string
                            [--credentials-provider] string
                            [--endpoint] string                             
                            [--file-system-id] string 
                            [--properties] string   
                            [--properties-files] list 
                            [--s3type] string  
                            [--scan-only]
                            [--secret-key] string  
                            [--source]  
                            [--sqs-endpoint] string
                            [--sqs-queue] string                                                    
                            [--topic] string

For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.

S3A mandatory parameters

--file-system-id The ID for the new filesystem resource.
--bucket-name The name of your S3 bucket.
--credentials-provider The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint.
The Provider options available include:
- org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
  Use this provider to supply credentials as an access key and secret access key with the --access-key and --secret-key parameters.
- com.amazonaws.auth.InstanceProfileCredentialsProvider
  Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.
- com.amazonaws.auth.DefaultAWSCredentialsProviderChain
  A commonly-used credentials provider chain that looks for credentials in this order:
  - Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
  - Java System Properties - aws.accessKeyId and aws.secretKey.
  - Web Identity Token credentials from the environment or container.
  - Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
  - Credentials delivered through the Amazon EC2 container service if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and security manager has permission to access the variable.
  - Instance profile credentials delivered through the Amazon EC2 metadata service.
- com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider
  This provider supports the use of multiple AWS credentials, which are stored in a credentials file.
  When adding a source filesystem, use the following properties:
  - awsProfile - Name for the AWS profile.
  - awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.
    For example:
    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>
    In the CLI, you can also use --aws-profile and --aws-config-file.
    For example:
    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --aws-profile <profile-name> --aws-config-file </path/to/the/aws/credentials/file>
    Learn more about using AWS profiles: Configuration and credential file settings.
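The credentials file used by ExtendedProfileCredentialsProvider follows the standard AWS shared credentials format, with one bracketed section per profile. The sketch below writes a file containing two profiles. The function name write_aws_credentials, the profile names, and the key values are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: create an AWS shared credentials file with two named profiles.
# Profile names and key values are hypothetical placeholders.
write_aws_credentials() {
  local file="$1"
  # Don't clobber an existing credentials file.
  if [ -e "$file" ]; then
    echo "refusing to overwrite existing file: $file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$file")" || return 1
  # Each [section] name is what you pass as awsProfile or --aws-profile.
  cat > "$file" <<'EOF'
[profile1]
aws_access_key_id = EXAMPLEACCESSKEYID1
aws_secret_access_key = EXAMPLESECRETACCESSKEY1

[profile2]
aws_access_key_id = EXAMPLEACCESSKEYID2
aws_secret_access_key = EXAMPLESECRETACCESSKEY2
EOF
  # The file holds secrets, so restrict it to the owner.
  chmod 600 "$file"
}
```

After writing the file, reference it with --aws-profile profile1 and --aws-config-file set to the same path. Make sure the file is owned by, or at least readable by, the system user that runs the Data Migrator service.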

S3A optional parameters

--access-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.
--secret-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key using this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.
--endpoint Enter a specific endpoint to access the S3-compatible bucket, such as an AWS PrivateLink endpoint or an IBM COS public regional endpoint. If you don't enter a value, the filesystem defaults to AWS.
note
Using --endpoint supersedes fs.s3a.endpoint if that property is also supplied as an additional custom property. Don't use both at the same time.
--sqs-queue [Amazon S3 as a source only] Enter an SQS queue name. This field is required if you enter an SQS endpoint.
--sqs-endpoint [Amazon S3 as a source only] Enter an SQS endpoint.
--source Enter this parameter to add the filesystem as a source. See which platforms are supported as a source.
--scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties Enter properties to use in a comma-separated key/value list.
--s3type Specifies which parameters are required, based on the requirements of your S3A-compatible filesystem. Leave it blank for generic S3-compatible storage, or select one of the following:
- aws
- oracle
- ibmcos

IBM COS as a source only

--bootstrap.servers The Kafka server address.
--topic The Kafka topic that provides S3 object change notifications.

S3a default properties

These properties are defined by default when adding an S3a filesystem.

fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a Filesystem.
fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that is prepended to the User-Agent header in HTTP requests that S3AFileSystem sends to the S3 back end.
fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
fs.s3a.connection.maximum (default 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
fs.s3a.threads.max (default 150): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation.
fs.s3a.max.total.tasks (default 60): Defines the number of operations which can be queued for execution at a time.
fs.s3a.healthcheck (default true): Set this to false to turn off the S3A filesystem health check. This is useful when setting up Data Migrator while cloud services are offline. However, when the health check is disabled, errors in the S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.

S3a custom properties

These are some of the additional properties that can be added when creating an S3a filesystem.

fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
fs.s3a.fast.upload.active.blocks (default 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in Kilobytes, Megabytes, Gigabytes, Terabytes or Petabytes respectively.
fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
fs.s3a.endpoint.region (default current region): Explicitly sets the bucket region.
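Size values such as fs.s3a.block.size are parsed by Hadoop with binary multipliers, so K means 1024 bytes, M means 1024² bytes, and so on. The helper below is a small sketch that reproduces that arithmetic so you can check a value before passing it with --properties. The function name to_bytes is hypothetical and isn't part of Data Migrator.

```shell
#!/usr/bin/env bash
# Sketch: convert a Hadoop-style size value (for example 32M) to bytes.
# Suffixes are binary multiples: K=2^10, M=2^20, G=2^30, T=2^40, P=2^50.
to_bytes() {
  local value="$1"
  local number="${value%[KkMmGgTtPp]}"   # strip one trailing suffix, if present
  local suffix="${value#"$number"}"      # whatever was stripped (may be empty)
  local bits
  case "$suffix" in
    "")   bits=0 ;;
    [Kk]) bits=10 ;;
    [Mm]) bits=20 ;;
    [Gg]) bits=30 ;;
    [Tt]) bits=40 ;;
    [Pp]) bits=50 ;;
  esac
  # Only a plain non-negative integer may precede the suffix.
  if ! [[ "$number" =~ ^[0-9]+$ ]]; then
    echo "invalid size: $value" >&2
    return 1
  fi
  # 10# forces base 10, so values with leading zeros aren't read as octal.
  echo $(( 10#$number << bits ))
}
```

For example, to_bytes 32M prints 33554432, which is the default block size expressed in bytes.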

note

To configure an Oracle Cloud Storage bucket that isn't in your default region, specify fs.s3a.endpoint.region=<region> with the --properties parameter when adding the filesystem with the CLI.

See Oracle Cloud Storage additional properties example.

Find an additional list of S3a properties in the S3a documentation.

Upload buffering

Migrations using an S3A target destination buffer all uploads. By default, buffering occurs on the local disk of the system that Data Migrator runs on, in the /tmp directory.

Data Migrator will automatically delete the temporary buffering files once they are no longer needed.

If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. The following values can be supplied:

Buffering Option	Details	Property Value
Array Buffer	Buffers the uploaded data in memory instead of on disk, using the Java heap.	`array`
Byte Buffer	Buffers the uploaded data in memory instead of on disk, but does not use the Java heap.	`bytebuffer`
Disk Buffering	The default option. Buffers the upload to disk.	`disk`

Both the array and bytebuffer options can consume large amounts of memory. To avoid issues, use other properties, such as fs.s3a.fast.upload.active.blocks, to fine-tune the migration.
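One way to change the buffering type is to keep the S3A tuning properties in a Hadoop configuration file (core-site.xml format) and reference it with --properties-files. The sketch below writes such a file. The function name, the file path you pass, and the example values (bytebuffer with 4 active blocks) are illustrative assumptions, not tuned recommendations.

```shell
#!/usr/bin/env bash
# Sketch: write a Hadoop configuration file that sets the S3A upload buffer
# type and limits how many blocks each output stream can have queued.
# Values are illustrative only.
write_s3a_buffer_props() {
  local file="$1" buffer="${2:-bytebuffer}" active_blocks="${3:-4}"
  # Only the three documented buffer types are accepted.
  case "$buffer" in
    array|bytebuffer|disk) ;;
    *) echo "unsupported buffer type: $buffer" >&2; return 1 ;;
  esac
  if ! [[ "$active_blocks" =~ ^[1-9][0-9]*$ ]]; then
    echo "active blocks must be a positive integer: $active_blocks" >&2
    return 1
  fi
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3a.fast.upload.buffer</name>
    <value>${buffer}</value>
  </property>
  <property>
    <name>fs.s3a.fast.upload.active.blocks</name>
    <value>${active_blocks}</value>
  </property>
</configuration>
EOF
}
```

Pass the resulting file with --properties-files when adding the filesystem, or set the same values inline with --properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=4. The file presumably needs to be readable on the Data Migrator host.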

note

If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp by default) has enough remaining space to facilitate the transfer.
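A quick pre-flight check of free space can catch this before a migration stalls. The sketch below reads the available space for the filesystem holding the buffer directory with df. The function name and the threshold you pass are assumptions; how much space you actually need depends on your migration.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the filesystem holding the buffer directory has at
# least MIN_GIB gibibytes available.
check_buffer_space() {
  local dir="${1:-/tmp}" min_gib="${2:-0}"
  local avail_kib
  # -P gives POSIX output (one line per filesystem); column 4 is available 1K blocks.
  avail_kib=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kib" ]; then
    echo "could not read free space for $dir" >&2
    return 1
  fi
  echo "${dir}: $(( avail_kib / 1024 / 1024 )) GiB available (need ${min_gib} GiB)"
  [ "$avail_kib" -ge $(( min_gib * 1024 * 1024 )) ]
}
```

For example, check_buffer_space /tmp 50 returns a non-zero status if less than 50 GiB is free. If you've changed fs.s3a.buffer.dir, check that directory instead of /tmp.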

S3a Example

filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D

IBM Cloud Object Storage Examples

Add an IBM Cloud Object Storage filesystem as a source. This doesn't work if SSL is used on the endpoint address.

filesystem add s3a --source --file-system-id cos_s3_source2
--bucket-name container2
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
--s3type ibmcos
--bootstrap.servers=10.0.0.123:9092
--topic newcos-events
--endpoint http://10.0.0.124

Add a path mapping.

path mapping add --path-mapping-id testPath
--description description-string
--source-path /
--target targetHdfs2
--target-path /repl_test1
{
"id": "testPath",
"description": "description-string",
"sourceFileSystem": "cos_s3_source2",
"sourcePath": "/",
"targetFileSystem": "targetHdfs2",
"targetPath": "/repl_test1"
}

`filesystem auto-discover-source hdfs`

Discover your local HDFS filesystem by entering the Kerberos credentials for your source environment.

You can also manually configure the source HDFS filesystem using the filesystem add hdfs command.

Auto-discover-source Hadoop Distributed File System (HDFS)
filesystem auto-discover-source hdfs    [--kerberos-principal] string
                                        [--kerberos-keytab] string
                                        [--scan-only] 

Kerberos parameters

--kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).

Optional

--scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations.

Example

filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM
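Before running auto-discovery, it can help to confirm that the keytab actually contains the principal you pass to --kerberos-principal. This sketch assumes the MIT Kerberos klist tool, whose klist -kt output ends each entry line with the principal name. The function name keytab_has_principal is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the keytab lists the given principal.
# Assumes MIT Kerberos "klist"; with -k -t, each entry line ends with the principal.
keytab_has_principal() {
  local keytab="$1" principal="$2"
  # If klist fails, awk sees no input and exits non-zero.
  klist -kt "$keytab" 2>/dev/null \
    | awk -v p="$principal" '$NF == p { found = 1 } END { exit !found }'
}
```

For example, keytab_has_principal /etc/security/keytabs/hdfs.headless.keytab hdfs@REALM.COM. To confirm that the service user can read the keytab, run sudo -u hdfs klist -kt /etc/security/keytabs/hdfs.headless.keytab.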

`filesystem clear`

Delete all target filesystem references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target filesystems.

Delete all targets

filesystem clear

`filesystem delete`

Delete a specific filesystem resource by ID. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that filesystem.

Delete a target

filesystem delete [--file-system-id] string

Mandatory parameters

--file-system-id The ID of the filesystem resource to delete.

Example

filesystem delete --file-system-id mytarget

`filesystem list`

List defined filesystem resources.

List targets

filesystem list [--detailed]

Optional parameters

--detailed Include all properties for each filesystem in the JSON result.

`filesystem show`

View details for a filesystem resource.

Get target details

filesystem show [--file-system-id] string  
                [--detailed]

Mandatory parameters

--file-system-id The ID of the filesystem resource to show.

Example

filesystem show --file-system-id mytarget

`filesystem types`

View information about the filesystem types available for use with Data Migrator.

List the types of target filesystems available

filesystem types

`filesystem update adls2 oauth`

Update an existing ADLS Gen2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth command. You will be prompted to optionally update the service principal and OAuth 2 credentials.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target

`filesystem update adls2 sharedKey`

Update an existing ADLS Gen2 container migration target using the filesystem update adls2 sharedKey command. You will be prompted to optionally update the secret key.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

`filesystem update gcs`

Update a Google Cloud Storage migration target using the filesystem update gcs command.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12 --service-account-email user@mydomain.com

`filesystem update hdfs`

Update either a source or target Hadoop Distributed File System (HDFS) using the filesystem update hdfs command.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Examples

Example for source NameNode HA cluster
filesystem update hdfs  --file-system-id mysource
                        --default-fs hdfs://sourcenameservice
                        --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Example for source NameNode HA cluster with Kerberos enabled
filesystem update hdfs  --file-system-id mysource
                        --default-fs hdfs://sourcenameservice
                        --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
                        --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
                        --kerberos-principal hdfs@SOURCEREALM.COM

`filesystem update local`

Update a target or source local filesystem using the filesystem update local command.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update local --file-system-id mytarget --fs-root ./tmp

`filesystem update s3a`

Update an S3 bucket target filesystem using the filesystem update s3a command. This method also supports IBM Cloud Object Storage buckets.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update s3a   --file-system-id mytarget
                        --bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key pkExampleAccessKeyiz --secret-key eSeCreTkeYd8uEDnDHRHuV9IF3n9

Hive agent configuration commands

info

It's not possible to adjust some TLS parameters for remote metastore agents after creation. For more information, see the related Knowledge base article.

`hive agent add azure`

Add a local or remote Hive agent to connect to an Azure SQL database using the hive agent add azure command.

If your Data Migrator host can communicate directly with the Azure SQL database, then a local Hive agent is sufficient. Otherwise, consider using a remote Hive agent.

remote deployments

For a remote Hive agent connection, enter a remote host (Azure VM, HDI cluster node) to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service is deployed on this remote host so that the data can transfer between the Hive agent and the remote metastore.

Add Azure SQL agent
hive agent add azure    [--name] string
                        [--db-server-name] string
                        [--database-name] string
                        [--database-user] string
                        [--database-password] string
                        [--auth-method]  azure-sqlauthentication-method
                        [--client-id] string
                        [--storage-account] string
                        [--container-name] string
                        [--insecure] boolean
                        [--host] string
                        [--port] integer
                        [--no-ssl]
                        [--autodeploy] boolean
                        [--ssh-user] string
                        [--ssh-key] file
                        [--ssh-port] int
                        [--use-sudo]
                        [--ignore-host-checking]
                        [--file-system-id] string
                        [--keystore-certificate-alias] string
                        [--keystore-password] string
                        [--keystore-path] string
                        [--keystore-trusted-certificate-alias] string
                        [--keystore-type] string
                        [--default-fs-override] string
                        [--certificate-storage-type] string

Mandatory parameters

info

The Azure Hive agent requires an ADLS Gen2 storage account and container name to generate the correct location for the metadata. The agent doesn't access the container, and no data is written to it.

--name The ID for the new Hive agent.
--db-server-name The Azure SQL database server name.
--database-name The Azure SQL database name.
note
Hive Migrator doesn’t support Azure SQL database names containing blank spaces ( ) or hyphens (-).
--storage-account The name of the ADLS Gen2 storage account.
--container-name The name of the container in the ADLS Gen2 storage account.
--auth-method The Azure SQL database connection authentication method (SQL_PASSWORD, AD_MSI, AD_INTEGRATED, AD_PASSWORD, ACCESS_TOKEN).

Additionally, use only one of the following parameters:

--file-system-id The name of the filesystem that will be associated with this agent (for example: myadls2storage). This will ensure any path mappings are correctly linked between the filesystem and the agent.
--default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: abfss://mycontainer@mystorageaccount.dfs.core.windows.net).

Optional parameters

--client-id The Azure resource's client ID.
--insecure Define an insecure connection (TLS disabled) to the Azure SQL database server (default is false).

Authentication parameters

Select one of the authentication methods listed and include the additional parameters required for the chosen method.

--auth-method The authentication method to connect to the Azure SQL server.
The following methods can be used:
- SQL_PASSWORD - Enter a username and password to access the database.
- AD_MSI - Use a system-assigned or user-assigned managed identity.

Required parameters for SQL_PASSWORD

--database-user The username to access the database.
--database-password The user password to access the database.

Required parameters for AD_MSI

To use this method, complete the following prerequisites:

Data Migrator or the remote Azure Hive agent must be installed on an Azure resource with the managed identity assigned to it. The host must also have Azure AD authentication enabled.
Your Azure SQL server must be enabled for Azure AD authentication.
You have created a contained user in the Azure SQL database that is mapped to the Azure AD resource (where Data Migrator or the remote Azure Hive agent is installed).
- The username of the contained user depends on whether you're using a system-assigned or user-assigned identity.
  Azure SQL database command for a system-assigned managed identity
```
CREATE USER "<azure_resource_name>" FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER "<azure_resource_name>";
```
  The <azure_resource_name> is the name of the Azure resource where Data Migrator or the remote Azure Hive agent is installed (for example, myAzureVM).
  Azure SQL database command for a user-assigned managed identity
```
CREATE USER <managed_identity_name> FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER <managed_identity_name>;
```
  The <managed_identity_name> is the name of the user-assigned managed identity. For example, myManagedIdentity.

After you complete the prerequisites, see the system-assigned identity or user-assigned identity parameters.

System-assigned identity

No other parameters are required for a system-assigned managed identity.

User-assigned identity

Specify the --client-id parameter:

--client-id The client ID of your Azure managed identity.

Parameters for remote Hive agents only

--host The host where the remote Hive agent will be deployed.
--port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.

Parameters for TLS/SSL only

--certificate-storage-type The certificate storage type: either FILE or KEYSTORE.
--keystore-certificate-alias The alias of the certificate stored in the keystore.
--keystore-password The password assigned to the target keystore.
--keystore-path The path to the target-side keystore file.
--keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
--keystore-type The type of keystore specified: JKS or PKCS12.

Parameters for automated deployment

--autodeploy The remote agent is automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
--ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
--use-sudo All commands performed by the SSH user use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
--ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).

Steps for manual deployment

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Azure SQL manually:

Transfer the remote server installer to your remote host (Azure VM, HDI cluster node):
Example of secure transfer from local to remote host
```
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
```
On your remote host, make the installer script executable:
```
chmod +x hivemigrator-remote-server-installer.sh
```
On your remote host, run the installer as root (or sudo) user in silent mode:
```
./hivemigrator-remote-server-installer.sh -- --silent --config <example config string>
```
Find the --config string in the output of the hive agent add azure command.
On your remote host, start the remote server service:
```
service hivemigrator-remote-server start
```
On your local host, run the hive agent add azure command without using --autodeploy and its related parameters to configure your remote Hive agent.
See the Example for remote Azure SQL deployment - manual example below for further guidance.
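The --autodeploy parameter automates these steps for you. If you deploy manually and want to repeat the steps reliably, they can be scripted from the local host. The sketch below prints the commands by default and only runs them when DRY_RUN=0. It assumes key-based SSH access to the remote host and sudo rights there; the function name deploy_remote_hive_agent is hypothetical, and the config string must not contain single quotes.

```shell
#!/usr/bin/env bash
# Sketch: manual remote Hive agent deployment steps, run from the local host.
# By default the commands are only printed (dry run); set DRY_RUN=0 to execute.
deploy_remote_hive_agent() {
  local host="$1" config="$2"
  local installer=hivemigrator-remote-server-installer.sh
  local run=echo
  if [ "${DRY_RUN:-1}" = "0" ]; then
    run=""
  fi
  # 1. Copy the installer to the remote user's home directory.
  $run scp "/opt/wandisco/hivemigrator/${installer}" "${host}:~"
  # 2. Make the installer executable.
  $run ssh "$host" "chmod +x ${installer}"
  # 3. Install in silent mode with the --config string from "hive agent add azure".
  $run ssh "$host" "sudo ./${installer} -- --silent --config '${config}'"
  # 4. Start the remote server service.
  $run ssh "$host" "sudo service hivemigrator-remote-server start"
}
```

Run it once without DRY_RUN to review the commands, then with DRY_RUN=0 to execute them. Afterwards, run hive agent add azure on the local host without --autodeploy, as described in the last step.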

Examples

Example for local Azure SQL deployment with SQL username/password

hive agent add azure --name azureAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage

Example for remote Azure SQL deployment with System-assigned managed identity - automated

hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052

Example for remote Azure SQL deployment with User-assigned managed identity - manual

hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --client-id b67f67ex-ampl-e2eb-bd6d-client9385id --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --host myRemoteHost.example.com --port 5052


`hive agent add filesystem`

Add a filesystem Hive agent to migrate your metadata to a specified target filesystem location using the hive agent add filesystem command.

Add filesystem agent
hive agent add filesystem    [--file-system-id] string
                             [--root-folder] string
                             [--name] string

--file-system-id The filesystem ID to be used.
--root-folder The path to use as the root directory for the filesystem agent.
--name The ID to give to the new Hive agent.

Example

hive agent add filesystem --file-system-id myfilesystem --root-folder /var/lib/mysql --name fsAgent

`hive agent add glue`

Add an AWS Glue Hive agent to connect to an AWS Glue data catalog using the hive agent add glue command.

If your Data Migrator host can communicate directly with the AWS Glue Data Catalog, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.

remote deployments

For a remote Hive agent connection, enter a remote host (EC2 instance) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add AWS Glue agent
hive agent add glue     [--name] string
                        [--access-key] string
                        [--secret-key] string
                        [--glue-endpoint] string
                        [--aws-region] string
                        [--glue-catalog-id] string
                        [--credentials-provider] string
                        [--glue-max-retries] integer
                        [--glue-max-connections] integer
                        [--glue-max-socket-timeout] integer
                        [--glue-connection-timeout] integer
                        [--file-system-id] string
                        [--default-fs-override] string
                        [--host] string
                        [--port] integer
                        [--no-ssl]
                        [--keystore-certificate-alias] string
                        [--keystore-password] string
                        [--keystore-path] string
                        [--keystore-trusted-certificate-alias] string
                        [--keystore-type] string
                        [--certificate-storage-type] string                        

Glue parameters

--name The ID to give to the new Hive agent.
--glue-endpoint The AWS Glue service endpoint for connections to the data catalog. VPC endpoint types are also supported.
--aws-region The AWS region that your data catalog is located in (default is us-east-1). If --glue-endpoint is specified, this parameter will be ignored.

Additionally, use only one of the following parameters:

--file-system-id The name of the filesystem that will be associated with this agent (for example: mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent.
--default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: s3a://mybucket/).

Glue credential parameters

--credentials-provider The AWS catalog credentials provider factory class.
- If you don't enter this parameter, the default is DefaultAWSCredentialsProviderChain.
- If you enter the --access-key and --secret-key parameters, the credentials provider will automatically default to StaticCredentialsProviderFactory.
--access-key The AWS access key.
--secret-key The AWS secret key.

Glue optional parameters

--glue-catalog-id The AWS Account ID to access the Data Catalog. This is used if the Data Catalog is owned by a different account to the one provided by the credentials provider and cross-account access has been granted.
--glue-max-retries The maximum number of retries the Glue client will perform after an error.
--glue-max-connections The maximum number of parallel connections the Glue client will allocate.
--glue-max-socket-timeout The maximum time the Glue client will wait for data on an established connection before timing out.
--glue-connection-timeout The maximum time the Glue client will allow to establish a connection.

Parameters for remote Hive agents only

--host The host where the remote Hive agent will be deployed.
--port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.

Parameters for TLS/SSL only

--certificate-storage-type The certificate storage type: either FILE or KEYSTORE.
--keystore-certificate-alias The alias of the certificate stored in the keystore.
--keystore-password The password assigned to the target keystore.
--keystore-path The path to the target-side keystore file.
--keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
--keystore-type The type of keystore specified: JKS or PKCS12.

Steps for remote agent deployment

Follow these steps to deploy a remote Hive agent for AWS Glue:

Transfer the remote server installer to your remote host (Amazon EC2 instance):
Example of secure transfer from local to remote host
```
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
```
On your remote host, run the installer as root (or sudo) user in silent mode:
```
./hivemigrator-remote-server-installer.sh -- --silent
```
On your remote host, start the remote server service:
```
service hivemigrator-remote-server start
```
On your local host, run the hive agent add glue command to configure your remote Hive agent.
See the Example for remote AWS Glue agent below for further guidance.
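When troubleshooting connectivity between the Data Migrator host and the remote agent host (for example, firewall rules on the port passed with --port), a quick TCP check can help. This sketch relies on bash's /dev/tcp redirection and the coreutils timeout command. The function name check_agent_port and the 5-second default timeout are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port on a host accepts connections.
# Uses bash's /dev/tcp redirection inside a child shell bounded by "timeout".
check_agent_port() {
  local host="$1" port="$2" seconds="${3:-5}"
  # Host and port are passed as positional arguments to avoid quoting issues.
  if timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "${host}:${port} is reachable"
  else
    echo "${host}:${port} is not reachable" >&2
    return 1
  fi
}
```

For example, check_agent_port myRemoteHost.example.com 5052 from the Data Migrator host. A failure only tells you that nothing accepted the connection in time; it doesn't distinguish a firewall from a service that isn't listening.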

Examples

Example for local AWS Glue agent

hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket

Example for remote AWS Glue agent

hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --host myRemoteHost.example.com --port 5052

`hive agent add hive`

Add a Hive agent to connect to a local or remote Apache Hive Metastore using the hive agent add hive command.

remote deployments

When connecting to a remote Apache Hive Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add local or remote Hive agent
hive agent add hive 
                        [--autodeploy]
                        [--certificate-storage-type] string
                        [--config-files] string
                        [--config-path] string
                        [--default-fs-override] string
                        [--file-system-id] string
                        [--force-scanning-mode]
                        [--host] string
                        [--ignore-host-checking]
                        [--jdbc-driver-name] string                     
                        [--jdbc-password] string                        
                        [--jdbc-url] string                            
                        [--jdbc-username] string
                        [--kerberos-keytab] string                          
                        [--kerberos-principal] string
                        [--keystore-certificate-alias] string
                        [--keystore-password] string
                        [--keystore-path] string
                        [--keystore-trusted-certificate-alias] string
                        [--keystore-type] string
                        [--name] string
                        [--no-ssl]
                        [--port] integer
                        [--ssh-key] file
                        [--ssh-port] int
                        [--ssh-user] string
                        [--use-sudo]

Mandatory parameters

--name The ID to give to the new Hive agent.

Additionally, use only one of the following parameters:

--file-system-id The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent.
--default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01).

Optional parameters

--kerberos-principal Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM).
--kerberos-keytab Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab).
--config-path Supply the path to a directory containing hive-site.xml, core-site.xml, and hdfs-site.xml. Use this for a local agent for a target metastore, or when the Hive configuration isn't located in /etc/hive/conf.
--config-files If the configuration files aren't in the same directory, use this parameter to enter all the paths as a comma-delimited list. For example, /path1/core-site.xml,/path2/hive-site.xml,/path3/hdfs-site.xml.

When configuring a CDP target

--jdbc-url The JDBC URL for the database.
--jdbc-driver-name The full class name of the JDBC driver.
--jdbc-username The username for connecting to the database.
--jdbc-password The password for connecting to the database.

info

Don't use the optional parameters --config-path and --config-files in the same add command.
Use --config-path when the configuration files are in the same directory, or --config-files when they're in separate directories.

Parameters for remote Hive agents only

--host The host where the remote Hive agent will be deployed.
--port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't change this setting after creating the agent.

Parameters for TLS/SSL only

--certificate-storage-type The certificate storage type: either FILE or KEYSTORE.
--keystore-certificate-alias The alias of the certificate stored in the keystore.
--keystore-password The password assigned to the target keystore.
--keystore-path The path to the target-side keystore file.
--keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
--keystore-type The keystore type: either JKS or PKCS12.

Parameters for automated deployment

Use the following parameters when deploying a remote agent automatically with the --autodeploy flag.

--autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
--ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
--use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
--ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).

Steps for manual remote agent deployment

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Apache Hive manually:

Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host
```
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
```
On your remote host, make the installer script executable:
```
chmod +x hivemigrator-remote-server-installer.sh
```
On your remote host, run the installer as root (or sudo) user in silent mode:
```
./hivemigrator-remote-server-installer.sh -- --silent
```
On your remote host, start the remote server service:
```
service hivemigrator-remote-server start
```
On your local host, run the hive agent add hive command without using --autodeploy and its related parameters to configure your remote Hive agent.
See Example for remote Apache Hive deployment - manual below for further guidance.
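The manual steps above can also be scripted from the Data Migrator host. The following bash sketch is illustrative, not a supported tool: the function `deploy_remote_hive_agent` and the `DRY_RUN` variable are hypothetical, and it assumes key-based SSH access to the remote host and an SSH user that can run sudo without a password prompt. By default it only prints the commands so you can review them before setting `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch only: scripts the manual remote Hive agent deployment steps.
# Assumes key-based SSH to the remote host and passwordless sudo there.
set -euo pipefail

INSTALLER=/opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: deploy_remote_hive_agent HOST [SSH_USER]   (hypothetical helper)
deploy_remote_hive_agent() {
  local host="$1" user="${2:-root}"
  local target="${user}@${host}"
  # 1. Copy the installer to the remote user's home directory.
  run scp "$INSTALLER" "${target}:~"
  # 2. Make the installer executable.
  run ssh "$target" chmod +x hivemigrator-remote-server-installer.sh
  # 3. Run the installer as root in silent mode.
  run ssh "$target" sudo ./hivemigrator-remote-server-installer.sh -- --silent
  # 4. Start the remote server service.
  run ssh "$target" sudo service hivemigrator-remote-server start
}
```

For example, `DRY_RUN=1 deploy_remote_hive_agent myRemoteHost.example.com root` prints the four commands without running them. After the service has started, add the agent with `hive agent add hive` as described above.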

Example for local Apache Hive deployment

hive agent add hive --name sourceAgent --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@LOCALREALM.COM --file-system-id mysourcehdfs

Example for remote Apache Hive deployment - automated

hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

Example for remote Apache Hive deployment - manual

hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

info

If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

info

When deploying remote agents with JDBC overrides, install the additional JDBC driver (for example, MySQL or PostgreSQL) in /opt/wandisco/hivemigrator-remote-server/agent/hive/.

info

When deploying remote agents with keystore details, you must manually enter your keystore password in /etc/wandisco/hivemigrator-remote-server/agent.yaml.

See the troubleshooting guide for more information.

`hive agent add databricks`

note

Databricks agents are currently available as a preview feature.

info

Source tables must use one of the following formats to ensure a successful migration to Databricks Delta Lake:

CSV
JSON
Avro
ORC
Parquet
Text

Add a Databricks Hive agent to connect to a Databricks Delta Lake metastore (AWS, Azure, or Google Cloud Platform (GCP)) using the hive agent add databricks command.

Add Databricks agent
hive agent add databricks       [--name] string
                                [--jdbc-server-hostname] string
                                [--jdbc-port] int
                                [--jdbc-http-path] string
                                [--access-token] string
                                [--fs-mount-point] string
                                [--convert-to-delta]
                                [--delete-after-conversion]
                                [--file-system-id] string
                                [--default-fs-override] string
                                [--host] string
                                [--port] integer
                                [--no-ssl]
                                [--catalog] string

Enable JDBC connections to Databricks

The following steps are required to enable Java Database Connectivity (JDBC) to Databricks Delta Lake:

Download the Databricks JDBC driver.
Unzip the package and upload the SparkJDBC42.jar file to the LiveData Migrator host machine.
Move the SparkJDBC42.jar file to the LiveData Migrator directory below:
```
/opt/wandisco/hivemigrator/agent/databricks
```
Change ownership of the JAR file to the Hive Migrator system user and group:
Example for hive:hadoop
```
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar
```
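A minimal bash sketch of the last two steps, assuming `SparkJDBC42.jar` has already been unzipped onto the Data Migrator host and that the Hive Migrator user and group are `hive:hadoop`, as in the example. The function name and the `DRY_RUN` variable are hypothetical; the default dry run only prints the `mv` and `chown` commands.

```shell
#!/usr/bin/env bash
# Sketch only: install the Databricks JDBC driver for Hive Migrator.
# Assumes SparkJDBC42.jar has already been unzipped onto this host.
set -euo pipefail

DEST_DIR=/opt/wandisco/hivemigrator/agent/databricks

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: install_databricks_jdbc JAR_PATH [OWNER:GROUP]   (hypothetical helper)
install_databricks_jdbc() {
  local jar="$1" owner="${2:-hive:hadoop}"
  # Refuse anything that isn't the expected driver file name.
  if [ "$(basename "$jar")" != "SparkJDBC42.jar" ]; then
    echo "expected SparkJDBC42.jar, got: $jar" >&2
    return 1
  fi
  run mv "$jar" "$DEST_DIR/"
  run chown "$owner" "$DEST_DIR/SparkJDBC42.jar"
}
```

Run it as root with `DRY_RUN=0` once the printed commands look right.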

Databricks mandatory parameters

--name The ID to give to the new Hive agent.
--jdbc-server-hostname The server hostname for the Databricks cluster (AWS, Azure or GCP).
--jdbc-port The port used for JDBC connections to the Databricks cluster (AWS, Azure or GCP).
--jdbc-http-path The HTTP path for the Databricks cluster (AWS, Azure or GCP).
--access-token The personal access token to be used for the Databricks cluster (AWS, Azure or GCP).

Additionally, use only one of the following parameters:

info

If the --convert-to-delta option is used, the --default-fs-override parameter must also be provided with the value set to dbfs:, or a path inside the Databricks filesystem. For example, dbfs:/mount/externalStorage.

--file-system-id The name of the filesystem that will be associated with this agent (for example: myadls2 or mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent.
--default-fs-override Provide an override for the default filesystem URI instead of a filesystem name (for example: dbfs:).

Databricks optional parameters

--fs-mount-point Define the ADLS/S3/GCP location in the Databricks filesystem that will contain migrations (for example: /mnt/mybucketname).

note

This parameter is required if --convert-to-delta is used. The Databricks agent will copy all associated table data and metadata into this location within the Databricks filesystem during conversion.

--convert-to-delta All underlying table data and metadata is migrated to the filesystem location defined by the --fs-mount-point parameter. Use this option to automatically copy the associated data and metadata into Delta Lake on Databricks (AWS, Azure or GCP), and convert tables into Delta Lake format.
The following parameter can only be used if --convert-to-delta has been specified:
- --delete-after-conversion Use this option to delete the underlying table data and metadata from the filesystem location (defined by --fs-mount-point) once it has been converted into Delta Lake on Databricks.
  info
  Only use this option if you are performing one-time migrations for the underlying table data. The Databricks agent does not support continuous (live) updates of table data when transferring to Delta Lake on Databricks.

If a migration to Databricks runs without the --convert-to-delta option, some migrated data may not be visible from the Databricks side. To avoid this, set --default-fs-override to dbfs: followed by the value of --fs-mount-point.
Example:
```
--default-fs-override dbfs:/mnt/mybucketname    
```
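Because the override repeats the mount point, deriving one value from the other avoids mismatches. This bash sketch only builds the flag values and doesn't call Data Migrator; the function name is hypothetical, and it assumes the mount point is an absolute path in the Databricks filesystem.

```shell
#!/usr/bin/env bash
# Sketch only: build --default-fs-override from the --fs-mount-point value.
set -euo pipefail

dbfs_override_for_mount() {
  local mount="$1"
  case "$mount" in
    # Absolute path: prefix with dbfs: and drop one trailing slash.
    /*) printf 'dbfs:%s\n' "${mount%/}" ;;
    *)  echo "mount point must be an absolute path: $mount" >&2; return 1 ;;
  esac
}

MOUNT=/mnt/mybucketname
echo "--fs-mount-point $MOUNT --default-fs-override $(dbfs_override_for_mount "$MOUNT")"
```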

--catalog Enter the name of your Databricks Unity Catalog.
note
You can't update an agent's Unity Catalog while it's in an active migration.

Example

Example for local Databricks agent

hive agent add databricks --name databricksAgent --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token daexamplefg123456789t6f0b57dfdtoken4 --file-system-id mys3bucket --default-fs-override dbfs:/mnt/mybucketname --fs-mount-point /mnt/mybucketname --convert-to-delta --catalog myUnityCatalog

`hive agent add dataproc`

Add a Hive agent to connect to a local or remote Google Dataproc Metastore using the hive agent add dataproc command.

remote deployments

When connecting to a remote Dataproc Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add local or remote Dataproc agent
hive agent add dataproc [--config-path] string
                        [--kerberos-principal] string
                        [--kerberos-keytab] string
                        [--name] string
                        [--host] string
                        [--port] integer
                        [--no-ssl]
                        [--autodeploy]
                        [--ssh-user] string
                        [--ssh-key] file
                        [--ssh-port] int
                        [--use-sudo]
                        [--ignore-host-checking]
                        [--keystore-certificate-alias] string
                        [--keystore-password] string
                        [--keystore-path] string
                        [--keystore-trusted-certificate-alias] string
                        [--keystore-type] string                        
                        [--file-system-id] string
                        [--default-fs-override] string
                        [--certificate-storage-type] string

Mandatory parameters

--kerberos-principal Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM).
--kerberos-keytab Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab).
--name The ID to give to the new Hive agent.

Additionally, use only one of the following parameters:

--file-system-id The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent.
--default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01).

Optional parameters

--config-path The path to the directory containing the Hive configuration files core-site.xml, hive-site.xml and hdfs-site.xml. If not specified, Data Migrator will use the default location for the cluster distribution.

Parameters for remote Hive agents only

--host The host where the remote Hive agent will be deployed.
--port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't change this setting after creating the agent.

Parameters for TLS/SSL only

--certificate-storage-type The certificate storage type: either FILE or KEYSTORE.
--keystore-certificate-alias The alias of the certificate stored in the keystore.
--keystore-password The password assigned to the target keystore.
--keystore-path The path to the target-side keystore file.
--keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
--keystore-type The keystore type: either JKS or PKCS12.

Parameters for automated deployment

--autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
--ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
--use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
--ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).

Steps for manual deployment

If you don't want to use --autodeploy, follow these steps to deploy a remote Dataproc agent manually:

Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host
```
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
```
On your remote host, make the installer script executable:
```
chmod +x hivemigrator-remote-server-installer.sh
```
On your remote host, run the installer as root (or sudo) user in silent mode:
```
./hivemigrator-remote-server-installer.sh -- --silent
```
On your remote host, start the remote server service:
```
service hivemigrator-remote-server start
```
On your local host, run the hive agent add dataproc command without using --autodeploy and its related parameters to configure your remote Hive agent.
See Example for remote Apache Hive deployment - manual below for further guidance.

Examples

Example for local Apache Hive deployment

hive agent add dataproc --name sourceAgent --file-system-id mysourcehdfs

Example for remote Apache Hive deployment - automated

hive agent add dataproc --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

Example for remote Apache Hive deployment - manual

hive agent add dataproc --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

note

If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

`hive agent add snowflake basic`

Add an agent using basic authentication.

Add a Snowflake agent using basic authentication
hive agent add snowflake basic  [--account-identifier] string    
                                [--file-system-id] string
                                [--name] string
                                [--password] string               
                                [--stage] string                 
                                [--stage-schema] string            
                                [--warehouse] string  
                                [--default-fs-override] string              
                                [--schema] string                  
                                [--stage-database] string         
                                [--user] string
                                [--network-timeout] int
                                [--query-timeout] int
                                [--role] string   

Mandatory parameters

--account-identifier is the unique ID for your Snowflake account.
--name is a name that will be used to reference the remote agent.
--warehouse is the Snowflake-based cluster of compute resources.
--stage is the storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake) or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
--user is your Snowflake username.

Additionally, use only one of the following parameters:

--file-system-id is the ID of the target filesystem.
--default-fs-override is an override for the default filesystem URI instead of a filesystem name.

Optional parameters

--stage-database is an optional parameter for a Snowflake stage database, with the default value "WANDISCO".
--stage-schema is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
--schema is an optional parameter for a Snowflake schema, with the default value "PUBLIC".
--role is an optional custom role for the JDBC connection used by Hive Migrator.

Timeout parameters

--network-timeout The number of milliseconds to wait for a response when interacting with the Snowflake service before returning an error.
--query-timeout The number of seconds to wait for a query to complete before returning an error.
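Because the two timeouts use different units (milliseconds and seconds), it's easy to set one a thousand times too short. This bash sketch derives both flags from a single duration in minutes; the helper name is hypothetical and it only prints the flags.

```shell
#!/usr/bin/env bash
# Sketch only: express both Snowflake timeouts from one value in minutes.
# --network-timeout is in milliseconds, --query-timeout is in seconds.
set -euo pipefail

timeout_flags() {
  local minutes="$1"
  local seconds=$(( minutes * 60 ))
  printf -- '--network-timeout %d --query-timeout %d\n' $(( seconds * 1000 )) "$seconds"
}

timeout_flags 5   # → --network-timeout 300000 --query-timeout 300
```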

Examples

Example of adding a Snowflake agent with basic authentication

hive agent add snowflake basic --account-identifier test_adls2 --name snowflakeAgent --stage myAzure --user exampleUser --password examplePassword --warehouse DemoWH2

`hive agent add snowflake privatekey`

Add a Snowflake agent using a private key
hive agent add snowflake privatekey     [--account-identifier] string
                                        [--file-system-id] string         
                                        [--private-key-file]  string
                                        [--private-key-file-pwd]  string       
                                        [--schema]   string                
                                        [--stage-database]  string         
                                        [--warehouse]  string
                                        [--default-fs-override] string   
                                        [--name]  string                          
                                        [--stage]  string                   
                                        [--stage-schema]  string   
                                        [--user] string

Mandatory parameters

--account-identifier is the unique ID for your Snowflake account.
--private-key-file is the path to your private key file.
--private-key-file-pwd is the password that corresponds with the above private-key-file.
--name is a name that will be used to reference the remote agent.
--warehouse is the Snowflake-based cluster of compute resources.
--stage is the storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake) or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
--user is your Snowflake username.

Additionally, use only one of the following parameters:

--file-system-id is the ID of the target filesystem.
--default-fs-override is an override for the default filesystem URI instead of a filesystem name.

Optional parameters

--stage-database is an optional parameter for a Snowflake stage database, with the default value "WANDISCO".
--stage-schema is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
--schema is an optional parameter for a Snowflake schema, with the default value "PUBLIC".

`hive agent check`

Check the configuration of an existing Hive agent using hive agent check.

Check if agent configuration is valid & connectable

hive agent check [--name] string

Example

hive agent check --name azureAgent

`hive agent configure azure`

Change the configuration of an existing Azure Hive agent using hive agent configure azure.

The parameters that can be changed are the same as the ones listed in the hive agent add azure section.

All parameters are optional except --name, which specifies the existing Hive agent that you want to configure.

Example

hive agent configure azure --name azureAgent --database-password CorrectPassword

`hive agent configure filesystem`

Change the configuration of an existing filesystem Hive agent using hive agent configure filesystem.

The parameters that can be changed are the same as the ones listed in the hive agent add filesystem section.

All parameters are optional except --name, which specifies the existing Hive agent that you want to configure.

Example

hive agent configure filesystem --name fsAgent --root-folder /user/dbuser/databases

`hive agent configure glue`

Change the configuration of an existing AWS Glue Hive agent using hive agent configure glue.

The parameters that can be changed are the same as the ones listed in the hive agent add glue section.

All parameters are optional except --name, which specifies the existing Hive agent that you want to configure.

Example

hive agent configure glue --name glueAgent --aws-region us-east-2

`hive agent configure hive`

Change the configuration of an existing Apache Hive agent using hive agent configure hive.

The parameters that can be changed are the same as the ones listed in the hive agent add hive section.

All parameters are optional except --name, which specifies the existing Hive agent that you want to configure.

Example

hive agent configure hive --name sourceAgent --kerberos-keytab /opt/keytabs/hive.keytab --kerberos-principal hive/myhostname.example.com@REALM.COM

`hive agent configure databricks`

Change the configuration of an existing Databricks agent using hive agent configure databricks.

The parameters that can be changed are the same as the ones listed in the hive agent add databricks section.

All parameters are optional except --name, which specifies the existing Hive agent that you want to configure.

note

You can't update an agent's Unity Catalog while it's in an active migration.

Example

hive agent configure databricks --name databricksAgent --access-token myexamplefg123456789t6fnew7dfdtoken4

`hive agent configure dataproc`

Change the configuration of an existing Dataproc agent using hive agent configure dataproc.

The parameters that can be changed are the same as the ones listed in the hive agent add dataproc section.

All parameters are optional except --name, which specifies the existing Hive agent that you want to configure.

Example

hive agent configure dataproc --name dataprocAgent --port 9099

`hive agent configure snowflake`

Configure an existing Snowflake remote agent by using the hive agent configure snowflake command.

Configure a remote Snowflake agent using basic authentication
hive agent configure snowflake basic    [--account-identifier] string   
                                        [--file-system-id] string
                                        [--user] string     
                                        [--password] string               
                                        [--stage] string                  
                                        [--stage-schema] string           
                                        [--warehouse] string
                                        [--default-fs-override] string    
                                        [--name] string                  
                                        [--schema] string                 
                                        [--stage-database] string        

Example Snowflake remote agent configuration

hive agent configure snowflake basic --user snowflakeAgent --password <password-here> --stage internal

Configure a remote Snowflake agent using privatekey authentication
hive agent configure snowflake privatekey       [--account-identifier] string     
                                                [--file-system-id] string    
                                                [--private-key-file] string
                                                [--private-key-file-pwd] string        
                                                [--schema] string                   
                                                [--stage-database] string           
                                                [--warehouse] string  
                                                [--default-fs-override] string     
                                                [--name] string                     
                                                [--stage] string        
                                                [--stage-schema] string

Example Snowflake remote agent configuration

hive agent configure snowflake privatekey --private-key-file-pwd <password> --private-key-file /path/to/keyfiles/ --user snowflakeAgent --schema star-schema

`hive agent delete`

Delete the specified Hive agent with hive agent delete.

Delete agent

hive agent delete [--name] string

Example

hive agent delete --name azureAgent

`hive agent list`

List configured Hive agents with hive agent list.

List already added agents

hive agent list [--detailed]

Example

hive agent list --detailed

`hive agent show`

Show the configuration of a Hive agent with hive agent show.

Show agent configuration

hive agent show [--name] string

Example

hive agent show --name azureAgent

`hive agent types`

Print a list of supported Hive agent types with hive agent types.

Print list of supported agent types

hive agent types

Example

hive agent types

Exclusion commands

`exclusion add date`

Create a date-based exclusion that checks the 'modified date' of any directory or file that Data Migrator encounters during a migration to which the exclusion has been applied. If the path or file being examined has a 'modified date' earlier than the specified date, it's excluded from the migration.

Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new date-based rule
exclusion add date      [--exclusion-id] string
                        [--description] string
                        [--before-date] string

Mandatory parameters

--exclusion-id The ID for the exclusion policy.
--description A user-friendly description for the policy.
--before-date An ISO 8601 formatted date and time, which can include an offset for a particular time zone.

Example

exclusion add date --exclusion-id beforeDate --description "Files earlier than 2020-10-01T10:00:00PDT" --before-date 2020-10-01T10:00:00-07:00
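To preview which local files would fall before a cutoff, you can compare modification times with GNU `date` and `stat`. This is a sketch assuming GNU coreutils; it only approximates the rule above on a local filesystem and doesn't query Data Migrator or HDFS. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: would a date-based exclusion skip this file?
# Assumes GNU coreutils (date -d, stat -c). Local files only.
set -euo pipefail

# Returns 0 (true) if FILE was last modified strictly before the ISO 8601 CUTOFF.
modified_before() {
  local file="$1" cutoff="$2"
  local mtime cutoff_epoch
  mtime=$(stat -c %Y "$file")            # modification time, seconds since epoch
  cutoff_epoch=$(date -d "$cutoff" +%s)  # honors offsets such as -07:00
  [ "$mtime" -lt "$cutoff_epoch" ]
}
```

For example, `modified_before /data/report.csv 2020-10-01T10:00:00-07:00 && echo excluded` uses the same cutoff as the example above.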

`exclusion add file-size`

Create an exclusion that can be applied to migrations to constrain the files transferred by a policy based on file size. Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new exclusion by file size policy
exclusion add file-size [--exclusion-id] string
                        [--description] string
                        [--value] long
                        [--unit] string

Mandatory parameters

--exclusion-id The ID for the exclusion policy.
--description A user-friendly description for the policy.
--value The numerical value for the file size, in a unit defined by the --unit parameter.
--unit A string to define the unit used. You can use B for bytes, GB for gigabytes, KB for kilobytes, MB for megabytes, PB for petabytes, TB for terabytes, GiB for gibibytes, KiB for kibibytes, MiB for mebibytes, PiB for pebibytes, or TiB for tebibytes when creating exclusions with the CLI.

Example

exclusion add file-size --exclusion-id 100mbfiles --description "Files greater than 100 MB" --value 100 --unit MB
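The unit list mixes decimal prefixes (KB, MB, GB, TB, PB) and binary prefixes (KiB, MiB, GiB, TiB, PiB). Assuming Data Migrator follows the usual convention (1 MB = 1,000,000 bytes and 1 MiB = 1,048,576 bytes), this bash sketch converts a `--value`/`--unit` pair to bytes so you can compare the two. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: convert an exclusion --value/--unit pair into bytes.
# Assumption: KB..PB are powers of 1000, KiB..PiB are powers of 1024.
set -euo pipefail

to_bytes() {
  local value="$1" unit="$2" base power
  case "$unit" in
    B)                   echo "$value"; return ;;
    KB|MB|GB|TB|PB)      base=1000 ;;
    KiB|MiB|GiB|TiB|PiB) base=1024 ;;
    *) echo "unknown unit: $unit" >&2; return 1 ;;
  esac
  # The first letter selects the power: K=1, M=2, G=3, T=4, P=5.
  case "${unit:0:1}" in
    K) power=1 ;; M) power=2 ;; G) power=3 ;; T) power=4 ;; P) power=5 ;;
  esac
  echo $(( value * base ** power ))
}

to_bytes 100 MB    # → 100000000
to_bytes 100 MiB   # → 104857600
```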

`exclusion add regex`

Create an exclusion using a regular expression to prevent certain files and directories from being transferred based on matching file or directory names. Once associated with a migration using migration exclusion add, files and directories that match the regular expression will not be migrated.

Create a new exclusion by regular expression policy
exclusion add regex     [--exclusion-id] string
                        [--description] string
                        [--regex] string
                        [--type] string

Mandatory parameters

--exclusion-id The ID for the exclusion policy.
--description A user-friendly description for the policy.
--regex A regular expression in Java PCRE, Automata, or GLOB syntax.

Optional parameters

--type Choose the regular expression syntax type. There are three options available:
1. JAVA_PCRE (default)
2. AUTOMATA
3. GLOB

Examples

Example glob pattern

exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --type GLOB --regex test*

Example Java PCRE pattern

exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --regex ^test\.*
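GLOB and regular expression patterns read differently: in a glob, * matches any run of characters, while in a regular expression * repeats the preceding token. The following Python sketch approximates the difference with fnmatch and re. Python's regex engine isn't Java's, but it behaves the same for simple patterns like these. The sample names are hypothetical, each pattern is tested against a single name, and the regex ^test.* is used as the regex equivalent of the glob test*. How Data Migrator applies patterns to full paths isn't modeled here.

```python
import fnmatch
import re

names = ["test", "test_data", "testing.csv", "mytest", "tes"]

# GLOB: "*" matches any run of characters; the whole name must match.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "test*")]

# Regex: "^" anchors at the start and ".*" matches any run of characters.
regex_hits = [n for n in names if re.search(r"^test.*", n)]

# The glob text read as a regex means "tes" followed by zero or more "t",
# found anywhere in the name.
naive_hits = [n for n in names if re.search(r"test*", n)]

print(glob_hits)   # ['test', 'test_data', 'testing.csv']
print(regex_hits)  # ['test', 'test_data', 'testing.csv']
print(naive_hits)  # ['test', 'test_data', 'testing.csv', 'mytest', 'tes']
```

This is why the --type value matters: the same text can exclude a different set of files depending on the syntax Data Migrator applies.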

Using backslash characters within `--regex` parameter

If you wish to use a \ character as part of your regex value, you must escape this character with an additional backslash.

Example

exclusion add regex --description "No paths that start with a backslash followed by test" --exclusion-id exclusion2 --regex ^\\test\.*

When you run the command through the CLI, the response still shows the additional backslash. However, Data Migrator stores the expression as expected (it reads as ^\test.*).

This workaround isn't required for API inputs, because the escaping behavior only affects the Spring Shell implementation used for the CLI.

`exclusion delete`

Delete an exclusion policy so that it is no longer available for migrations.

exclusion delete [--exclusion-id] string

Mandatory parameters

--exclusion-id The ID for the exclusion policy to delete.

Example

exclusion delete --exclusion-id exclusion1

`exclusion list`

List all exclusion policies defined.

List all exclusion policies

exclusion list

`exclusion show`

Get details for an individual exclusion policy by ID.

Get details for a specific exclusion rule

exclusion show [--exclusion-id] string

Mandatory parameters

--exclusion-id The ID for the exclusion policy to show.

Example

exclusion show --exclusion-id 100mbfiles

Migration commands

`migration add`

Create a new migration to initiate data migration from your source filesystem.
migration add   [--name or --migration-id] string
                [--path] string
                [--target] string
                [--exclusions] string
                [--priority] string
                [--action-policy] string
                [--auto-start]
                [--source] string
                [--scan-only]
                [--verbose]
                [--detailed]
                [--recurring-migration]
                [--recurring-period]

caution

Do not write to target filesystem paths when a migration is underway. This could interfere with Data Migrator functionality and lead to unpredictable behavior.

Use different filesystem paths when writing to the target filesystem directly (and not through Data Migrator).

Mandatory parameters

--path Defines the source filesystem directory that is the scope of the migration. All content in this directory, other than excluded content, is migrated to the target.

note

ADLS Gen2 has a filesystem restriction of 60 segments. Make sure your path has fewer than 60 segments when defining the path string parameter.

--target Specifies the name of the target filesystem resource to which migration will occur.
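To check a --path value against the ADLS Gen2 segment limit before you create the migration, you can count its segments. This Python sketch assumes a segment is each non-empty component between slashes, so /repl1/a/b has three. Whether the target container or a path mapping adds segments isn't covered, so leave some margin. The function names are hypothetical.

```python
def segment_count(path: str) -> int:
    """Count the non-empty, slash-separated components of a path."""
    return len([part for part in path.split("/") if part])

def fits_adls_gen2(path: str, limit: int = 60) -> bool:
    """Return True if the path has fewer segments than the limit."""
    return segment_count(path) < limit

print(segment_count("/repl1/a/b"))                 # 3
print(fits_adls_gen2("/" + "/".join(["d"] * 60)))  # False
```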

Optional parameters

--name or --migration-id Enter a name or ID for the new migration. An ID is auto-generated if you don't enter one.
--exclusions A comma-separated list of exclusions by name.
--auto-start Enter this parameter if you want the migration to start immediately. If you don't, the migration only starts when you run it with migration run or migration start.
--priority Enter this parameter with a value of high, normal, or low to assign a priority to your migration. Higher-priority migrations are processed first. If not specified, migration priority defaults to normal.
--action-policy This parameter determines what happens if the migration encounters content in the target path with the same name and size.
There are two options available:
1. com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
  Every file is replaced, even if the file size is identical on the target storage. This option is incompatible with the --recurring-migration option; use SkipIfSizeMatchActionPolicy for recurring migrations instead.
2. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
  If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced.
--source Specifies the name of the source filesystem.
--scan-only Enter this parameter to create a one-time migration.
--verbose Enter this parameter to add additional information to the output for the migration.
--detailed Alternative name for --verbose.
--recurring-migration Add this parameter to enable periodic rescanning of the migration.
--recurring-period Enter a period to schedule the time between migration scan iterations. For example, 12H (hours) or 30M (minutes).
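The --recurring-period examples use a number followed by a unit letter. The Python sketch below converts such values to seconds so that you can compare candidate schedules. It only accepts the two suffixes shown in the examples, H and M; Data Migrator may accept other forms, so treat this as an illustration rather than a validator. The function name is hypothetical.

```python
import re

# Only the suffixes shown in the examples: H (hours) and M (minutes).
PERIOD = re.compile(r"(\d+)([HM])")
SECONDS_PER_UNIT = {"H": 3600, "M": 60}

def period_seconds(period: str) -> int:
    """Convert a value such as 12H or 30M to seconds."""
    match = PERIOD.fullmatch(period)
    if match is None:
        raise ValueError(f"unrecognized period: {period}")
    amount, unit = match.groups()
    return int(amount) * SECONDS_PER_UNIT[unit]

print(period_seconds("12H"))  # 43200
print(period_seconds("30M"))  # 1800
```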

Example

migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles

`migration delete`

Delete a stopped migration resource.

Delete a migration

migration delete [--name or --migration-id] string

Mandatory parameters

--name or --migration-id The migration name or ID to delete.

Example

migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

`migration exclusion add`

Associate an exclusion resource with a migration so that the exclusion policy applies to items processed for the migration. Exclusions must be associated with a migration before they take effect.

Add an exclusion to a migration

migration exclusion add [--name or --migration-id] string
                        [--exclusion-id] string

Mandatory parameters

--name or --migration-id The migration name or ID with which to associate the exclusion.
--exclusion-id The ID of the exclusion to associate with the migration.

Example

migration exclusion add --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

`migration exclusion delete`

Remove an exclusion from association with a migration so that its policy no longer applies to items processed for the migration.

Remove an exclusion from a migration

migration exclusion delete      [--name or --migration-id] string
                                [--exclusion-id] string

Mandatory parameters

--name or --migration-id The migration name or ID from which to remove the exclusion.
--exclusion-id The ID of the exclusion to remove from the migration.

Example

migration exclusion delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

`migration list`

List all defined migrations.

List running and active migrations

migration list [--detailed or --verbose]

Optional parameters

--detailed or --verbose Returns additional information about each migration.

`migration path status`

View all actions scheduled on a source filesystem in the specified path.

Show information on the migration status of a path on the source filesystem

migration path status   [--source-path] string
                        [--source] string

Mandatory parameters

--source-path The path on the filesystem to review actions for. Supply a full directory path.
--source The filesystem ID of the source system the path is in.

Example

migration path status --source-path /root/mypath/ --source mySource

`migration pending-region add`

Add a path for rescanning to a migration.

Add a path for rescanning to a migration
migration pending-region add    [--name or --migration-id] string
                                [--path] string
                                [--action-policy] string

Mandatory parameters

--name or --migration-id The migration name or ID.
--path The path string of the region to add for rescan.

Optional parameters

--action-policy This parameter determines what happens if the migration encounters content in the target path with the same name and size.
There are two options available:
1. com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
  Every file is replaced, even if the file size is identical on the target storage.
2. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
  If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced.
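The two action policies differ only in how they treat a file that already exists on the target. The following Python sketch models each rule as described above. The function and its parameters are hypothetical and don't reflect Data Migrator's internal code; a target_size of None stands for a file that doesn't exist on the target yet.

```python
from typing import Optional

OVERWRITE = "com.wandisco.livemigrator2.migration.OverwriteActionPolicy"
SKIP_IF_SIZE_MATCH = "com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy"

def should_transfer(policy: str, source_size: int, target_size: Optional[int]) -> bool:
    """Decide whether a source file is (re)written to the target."""
    if target_size is None:
        return True  # Nothing on the target yet, so the file is migrated.
    if policy == OVERWRITE:
        return True  # Every file is replaced, even if sizes match.
    if policy == SKIP_IF_SIZE_MATCH:
        return source_size != target_size  # Skip only on a size match.
    raise ValueError(f"unknown action policy: {policy}")

print(should_transfer(OVERWRITE, 842, 842))           # True
print(should_transfer(SKIP_IF_SIZE_MATCH, 842, 842))  # False
print(should_transfer(SKIP_IF_SIZE_MATCH, 842, 100))  # True
```

Because the comparison described for SkipIfSizeMatchActionPolicy uses size alone, a file whose content changed but whose size stayed the same would be skipped under that policy.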

Example

migration pending-region add --name myMigration --path etc/files --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy

`migration reset`

Reset a stopped migration to the state it was in before it was started. This deletes the migration and replaces it with a new one that has the same settings.

Reset a migration
migration reset  [--name or --migration-id] string
                 [--action-policy] string
                 [--reload-mappings]   
                 [--detailed or --verbose]

Mandatory parameters

--name or --migration-id The name or ID of the migration you want to reset.

Optional parameters

--action-policy Accepts two string values: com.wandisco.livemigrator2.migration.OverwriteActionPolicy causes the new migration to re-migrate all files from scratch, including those already migrated to the target filesystem, regardless of file size. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy skips migrating files that exist on both the source and target if their file sizes match. Use tab auto-completion with this parameter to view both options and a short description of each.
--reload-mappings Resets the migration's path mapping configuration, using the newest default path mapping configuration for Data Migrator.
--detailed or --verbose Returns additional information about the reset migration, similar to migration show.

Example

migration reset --name mymigration

`migration resume`

Resume a stopped migration so that it continues transferring content to its target.

Resume a migration

migration resume        [--name or --migration-id] string
                        [--detailed or --verbose]

Mandatory parameters

--name or --migration-id The migration name or ID to resume.

Example

migration resume --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

`migration run`, `migration start`

Start a migration that was created without the --auto-start parameter.

Start a migration.

migration run   [--name or --migration-id] string
                [--detailed or --verbose]

Mandatory parameters

--name or --migration-id The migration name or ID to run.

Optional parameters

--detailed or --verbose Outputs additional information about the migration.

Example

migration run --migration-id myNewMigration

`migration show`

Display a JSON description of a specific migration.

Get migration details

migration show  [--name or --migration-id] string
                [--detailed or --verbose]

Mandatory parameters

--name or --migration-id The migration name or ID to show.

Optional parameters

--detailed or --verbose Outputs additional information about the migration.

Example

migration show --name myNewMigration

`migration stop`

Stop a migration from transferring content to its target, placing it into the STOPPED state. Stopped migrations can be resumed.

Stop a migration

migration stop [--name or --migration-id] string

Mandatory parameters

--name or --migration-id The migration name or ID to stop.

Example

migration stop --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

`migration stop all`

Stop all migrations from transferring content to their targets, placing them into the STOPPED state. Stopped migrations can be resumed.

Stop all migrations

migration stop all

`migration verification start`

Trigger a new verification for a migration.

Trigger a new verification
migration verification start    [--name or --migration-id] string  
                                [--depth] integer
                                [--date] string
                                [--paths] string

Mandatory parameters

--name or --migration-id Enter the migration name or ID.
--depth Enter a number to specify how deep in the directory you want to run the verification check. This number must be equal to or less than the total number of levels in the directory structure of your migration. The default value is zero. Zero means there's no limit to the verification depth.
--date Enter a date and time as a verification cutoff point in YYYY-MM-DDTHH:MM format (for example, 2022-11-15T16:24).
--paths Enter a comma-separated list of paths to verify.

Example

Trigger a new verification for a migration

migration verification start --migration-id myNewMigration --depth 0 --date 2022-11-15T16:24 --paths /MigrationPath
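To catch a malformed --date value before running the command, you can parse it with the pattern used in the example. This Python sketch assumes the cutoff has no seconds and no UTC offset, matching 2022-11-15T16:24; it doesn't determine which time zone Data Migrator applies to the value. The function name is hypothetical.

```python
from datetime import datetime

CUTOFF_FORMAT = "%Y-%m-%dT%H:%M"  # Matches values such as 2022-11-15T16:24.

def parse_cutoff(value: str) -> datetime:
    """Parse a --date cutoff; raises ValueError if the format is wrong."""
    return datetime.strptime(value, CUTOFF_FORMAT)

print(parse_cutoff("2022-11-15T16:24"))  # 2022-11-15 16:24:00
```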

note

The verification status will show the number of missing paths and files on the target filesystem and the number of file size mismatches between the source and target. You can view the verification status using migration verification show for individual verification jobs or migration verification list for all verification jobs.

`migration verification list`

List summaries for all or specified verifications.

List summaries for all or specified verifications

migration verification list    [--name or --migration-id] string  
                               [--states] string

Optional parameters

--name or --migration-id Enter the migration name or ID. If not specified, the default is to display summaries for all verifications. You can enter multiple migration names or IDs in a comma-separated list.
--states Enter one or more verification states (IN_PROGRESS, QUEUED, COMPLETED, or CANCELLED), comma-separated, for which you want to list summaries.

Examples

List summaries for all verifications.

migration verification list

List in-progress and queued verification summaries for a specific migration

migration verification list --migration-id myNewMigration --states IN_PROGRESS,QUEUED

`migration verification show`

Show the status of a specific migration verification.

Show a verification job for a migration

migration verification show [--verification-id] string

Mandatory parameters

--verification-id Show the status of the verification job for this verification ID (only one verification job can be running per migration).

Example

Example status of a completed verification
WANdisco LiveData Migrator >> migration verification show --verification-id 91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465
{
  "id": "91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465",
  "migrationId": "ver1",
  "migrationInternalId": "91c79b1b-c61f-4c39-be61-18072ac3a086",
  "status": "COMPLETE",
  "createdTimestamp": 1676979356467,
  "startedTimestamp": 1676979356518,
  "finishedTimestamp": 1676979356598,
  "createdAt": "2023-02-21T11:35:56.467Z",
  "startedAt": "2023-02-21T11:35:56.518Z",
  "finishedAt": "2023-02-21T11:35:56.598Z",
  "paths": [
    "/DATA/d1"
  ],
  "ignoreAfterTimestamp": 1676978431233,
  "originalPaths": [
    "/DATA/d1"
  ],
  "verificationDepth": 0,
  "filesOnSource": 1,
  "directoriesOnSource": 0,
  "bytesOnSource": 842,
  "filesExcluded": 0,
  "filesExcludedExistsOnTarget": 0,
  "filesExcludedNotExistsOnTarget": 0,
  "dataExcluded": 0,
  "bytesExcluded": 0,
  "bytesExcludedExistsOnTarget": 0,
  "bytesExcludedNotExistsOnTarget": 0,
  "directoriesExcluded": 0,
  "directoriesExcludedExistsOnTarget": 0,
  "directoriesExcludedNotExistsOnTarget": 0,
  "filesOnTarget": 1,
  "directoriesOnTarget": 0,
  "bytesOnTarget": 842,
  "filesMissingOnTarget": 0,
  "directoriesMissingOnTarget": 0,
  "filesMissingOnSource": 0,
  "directoriesMissingOnSource": 0,
  "fileSizeMismatches": 0,
  "totalDiscrepancies": 0
}
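The verification output is plain JSON, so you can post-process it with other tools. The Python sketch below reads a trimmed copy of the example above, converts the millisecond epoch timestamps to UTC, and reports the run time and discrepancy count. It only uses fields shown in the example; how you capture the output, for example by copying it from the CLI, is up to you.

```python
import json
from datetime import datetime, timedelta, timezone

# A trimmed copy of the example output above.
raw = """{
  "id": "91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465",
  "status": "COMPLETE",
  "startedTimestamp": 1676979356518,
  "finishedTimestamp": 1676979356598,
  "filesMissingOnTarget": 0,
  "fileSizeMismatches": 0,
  "totalDiscrepancies": 0
}"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_millis(ms: int) -> datetime:
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

report = json.loads(raw)
started = from_millis(report["startedTimestamp"])
duration_ms = report["finishedTimestamp"] - report["startedTimestamp"]

print(started.isoformat(timespec="milliseconds"))  # 2023-02-21T11:35:56.518+00:00
print(duration_ms)  # 80
print("clean" if report["totalDiscrepancies"] == 0 else "check discrepancies")
```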

`migration verification stop`

Stop a queued or in-progress migration verification.

Stop a migration verification

migration verification stop  [--verification-id] string

Mandatory parameters

--verification-id Enter the ID of the verification that has been started, for example db257c03-697b-48a5-93cc-abc23838d37d-1668593022565. You can find the verification ID in the output of the migration verification list command.

Example

Stop a migration verification

migration verification stop --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565

`migration verification report`

Download a full verification report.

Download a verification report

migration verification report   [--verification-id] string  
                                [--out-dir] string

Mandatory parameters

--verification-id Enter the ID of the verification for which you want to download a report. You can find the verification ID in the output of the migration verification list command.
--out-dir Enter the directory to which the report is downloaded.

Examples

Download a verification report

migration verification report --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565 --out-dir /user/exampleVerificationDirectory

`status`

Get a text description of the overall status of migrations. Information is provided on the following:

Total number of migrations defined.
Average bandwidth being used over 10s, 60s, and 300s intervals.
Peak bandwidth observed over 300s interval.
Average file transfer rate per second over 10s, 60s, and 300s intervals.
Peak file transfer rate per second over a 300s interval.
List of migrations, including one-time migrations, with source path and migration ID, and with current progress broken down by migration state: completed, live, stopped, running, and ready.

Get migration status
status  [--diagnostics]  
        [--migrations]  
        [--network]  
        [--transfers]  
        [--watch]  
        [--refresh-delay] int  
        [--full-screen]

Optional parameters

--diagnostics Returns additional information about your Data Migrator instance and its migrations, useful for troubleshooting.
--migrations Displays information about each running migration.
--network Displays file transfer throughput in Gib/s during the last 10 seconds, 1 minute and 30 minutes.
--transfers Displays overall performance information about data transfers across the last 10 seconds, 1 minute and 30 minute intervals.
--watch Automatically refresh the output.
--refresh-delay The auto-refresh interval, in seconds.
--full-screen Automatically refresh the output in full-screen mode.

Examples

Status
WANdisco LiveMigrator >> status

Network             (10s)       (1m)       (30m)

Average Throughput: 10.4 Gib/s  9.7 Gib/s  10.1 Gib/s
Average Files/s:    425         412        403

11 Migrations                                     dd:hh:mm  dd:hh:mm

Complete: 1         Transferred         Excluded  Duration
 /static1   5a93d5        67.1 GiB       2.3 GiB  00:12:34

Live:     3         Transferred         Excluded  Duration
 /repl1     9088aa       143.2 GiB      17.3 GiB  00:00:34
 /repl_psm1 a4a7e6       423.6 TiB       9.6 GiB  02:05:29
 /repl5     ab140d       118.9 GiB       1.2 GiB  00:00:34

Running:  5         Transferred         Excluded  Duration  Remaining
 /repl123   e3727c  30.3/45.2 GiB 67%    9.8 GiB  00:00:34  00:00:17
 /repl2     88e4e7  26.2/32.4 GiB 81%    0.2 GiB  00:01:27  00:00:12
 /repl3     372056   4.1/12.5 GiB 33%    1.1 GiB  00:00:25  00:01:05
 /repl4     6bc813  10.6/81.7 TiB  8%   12.4 GiB  00:04:21  01:02:43
 /replxyz   dc33cb   2.5/41.1 GiB  6%    6.5 GiB  01:00:12  07:34:23

Ready:    2
 /repl7     070910  543.2 GiB
 /repltest  d05ca0  7.3 GiB


Status with --transfers
WANdisco LiveMigrator >> status --transfers

Files                  (10s)   (1m)  (30m)

Average Migrated/s:      362    158   4781
 < 1 KB                   14     27   3761
 < 1 MB                  151     82      0
 < 1 GB                   27      1      2
 < 1 PB                    0      0      0
 < 1 EB                    0      0      0

Peak Migrated/s:         505    161   8712
 < 1 KB                  125     48   7761
 < 1 MB                  251     95      4
 < 1 GB                   29      7      3
 < 1 PB                    0      0      0
 < 1 EB                    0      0      0

Average Scanned/s:       550    561    467
Average Rescanned/s:      24     45     56
Average Excluded/s:        7      7      6

Status with --diagnostics
WANdisco LiveMigrator >> status --diagnostics

Uptime: 0 Days 1 Hours 23 Minutes 24 Seconds
SystemCpuLoad: 0.1433 ProcessCpuLoad: 0.0081
JVM GcCount: 192 GcPauseTime: 36 s (36328 ms)
OS Connections: 1, Tx: 0 B, Rx: 0 B, Retransmit: 0
Transfer Bytes (10/30/300s): 0.00 Gib/s, 0.00 Gib/s, 0.00 Gib/s
Transfer Files (10/30/300s): 0.00/s 0.00/s 0.00/s
Active Transfers/pull.threads: 0/100
Migrations: 0 RUNNING, 4 LIVE, 0 STOPPED
Actions Total: 0, Largest: "testmigration" 0, Peak: "MyMigration" 1
PendingRegions Total: 0 Avg: 0, Largest: "MyMigration" 0
FailedPaths Total: 0, Largest: "MyMigration" 0
File Transfer Retries Total: 0, Largest: "MyMigration" 0
Total Excluded Scan files/dirs/bytes: 26, 0, 8.1 MB
Total Iterated Scan files/dirs/bytes: 20082, 9876, 2.7 GB
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/8
EventsQueued: 0, Total Events Added: 504
Transferred File Size Percentiles:
 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Transferred File Transfer Rates Percentiles per Second:
 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Active File Size Percentiles:
 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Active File Transfer Rates Percentiles per Second:
 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B

Hive migration commands

`hive migration add`

Create a new Hive migration to initiate metadata migration from your source Metastore.

info

Create Hive rules before creating a Hive migration to specify which databases and tables are migrated.

Create new migration
hive migration add      [--source] string
                        [--target] string
                        [--name] string
                        [--auto-start]
                        [--once]
                        [--rule-names] list

Mandatory parameters

--source The name of the Hive agent for the source of migration.
--target The name of the Hive agent for the target of migration.
--rule-names The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example: rule1,rule2,rule3).

tip

Metadata rules determine the scope of a migration. You need to add rules before creating your metadata migration.

Optional parameters

--name The name to identify the migration with.
--auto-start Enter this parameter to start the migration immediately after creation.
--once Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Example

hive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs,user_dbs --name hive_migration --auto-start

note

Auto-completion of the --rule-names parameter will not work correctly if it is added at the end of the Hive migration parameters. See the troubleshooting guide for workarounds.

`hive migration delete`

Delete a Hive migration.

note

A Hive migration must be stopped before it can be deleted. To stop and delete it in one step, add the --force-stop parameter to this command.

Delete migration from the list, migration should be stopped

hive migration delete [--name] string  [--force-stop]

Example

hive migration delete --name hive_migration --force-stop

`hive migration list`

List all Hive migrations.

Print a list of all migrations

hive migration list

`hive migration pause`

Pause a Hive migration. Use the --names flag with a comma-separated list of migration names to pause multiple Hive migrations.

Pause a Hive migration

hive migration pause --names hmig1,hmig2

`hive migration pause all`

Pause all Hive migrations.

Pause all Hive migrations

hive migration pause all

`hive migration reset`

Reset a stopped Hive migration. This returns the migration to a CREATED state.

Reset a Hive migration

hive migration reset    [--names] string
                        [--force-stop]

note

A Hive migration must be stopped before it can be reset. To stop and reset it in one step, add the --force-stop parameter to this command.

info

The reset migration will use the latest agent settings.

For example, if the target agent’s Default Filesystem Override setting was updated after the original migration started, the reset migration will use the latest Default Filesystem Override value.

To reset multiple Hive migrations, use a comma-separated list of migration names with the --names parameter.

Example

Reset a Hive migration

hive migration reset --names hive_migration1

Stop and reset a list of migrations

hive migration reset --force-stop --names hive_migration1,hive_migration2

`hive migration reset all`

Reset all Hive migrations. See the hive migration reset command for details.

Reset all Hive migrations

hive migration reset all

`hive migration resume`

Resume STOPPED, PAUSED or FAILED Hive migrations. Use the --names flag with a comma-separated list of migration names to resume multiple Hive migrations.

Resume a Hive migration

hive migration resume --names Hmig1

`hive migration resume all`

Resume all STOPPED, PAUSED or FAILED Hive migrations.

Resume all Hive migrations

hive migration resume all

`hive migration show`

Display information about a Hive migration.

Show info about specific migration

hive migration show

`hive migration start`

Start a Hive migration or a list of Hive migrations (comma-separated).

note

Enter the --once parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Start migration

hive migration start [--names] list  [--once]

Example

hive migration start --names hive_migration1,hive_migration2

`hive migration start all`

Start all Hive migrations.

note

Enter the --once parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Start migration

hive migration start all [--once]

Example

hive migration start all --once

`hive migration status`

Show the status of a Hive migration or a list of Hive migrations (comma-separated).

Show migration status

hive migration status [--names] list

Example

hive migration status --names hive_migration1,hive_migration2

`hive migration status all`

Show the status of all Hive migrations.

Show status of all migrations

hive migration status all

Example

hive migration status all

`hive migration stop`

Stop a running Hive migration or a list of running Hive migrations (comma-separated).

Stop running migration

hive migration stop [--names] list

Example

hive migration stop --names hive_migration1,hive_migration2

`hive migration stop all`

Stop all running Hive migrations.

Stop all running migrations

hive migration stop all

Example

hive migration stop all

Path mapping commands

`path mapping add`

Create a path mapping that allows you to define an alternative target path for a specific target filesystem. Path mappings are applied automatically to new migrations.

When path mapping isn't used, the source path is created on the target filesystem.

note

Path mappings can't be applied to existing migrations. Delete and recreate a migration if you want a path mapping to apply.

Create a new path mapping
path mapping add        [--path-mapping-id] string
                        [--source-path] string
                        [--target] string
                        [--target-path] string
                        [--description] string

Mandatory parameters

--source-path The path on the source filesystem.
--target The target filesystem ID (the value defined for the --file-system-id parameter).
--target-path The path for the target filesystem.
--description Description of the path mapping enclosed in quotes ("text").

Optional parameters

--path-mapping-id An ID for this path mapping. An ID will be auto-generated if you don't enter one.

Example

Example for HDP to HDI - default Hive warehouse directory

path mapping add --path-mapping-id hdp-hdi --source-path /apps/hive/warehouse --target mytarget --target-path /hive/warehouse --description "HDP to HDI - Hive warehouse directory"
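Conceptually, a path mapping swaps the source path prefix for the target path on one target filesystem. The Python sketch below applies the example hdp-hdi mapping to two paths. It assumes prefixes match on whole path components and that unmapped paths keep their source path on the target, as described above. The function name and sample paths are hypothetical, and Data Migrator's own matching rules may differ.

```python
def map_path(path: str, source_prefix: str, target_prefix: str) -> str:
    """Replace a matching source prefix; otherwise keep the source path."""
    source_prefix = source_prefix.rstrip("/")
    if path == source_prefix or path.startswith(source_prefix + "/"):
        return target_prefix.rstrip("/") + path[len(source_prefix):]
    return path  # No mapping applies, so the source path is reused.

# The hdp-hdi mapping from the example above.
print(map_path("/apps/hive/warehouse/sales.db/orders", "/apps/hive/warehouse", "/hive/warehouse"))
# /hive/warehouse/sales.db/orders
print(map_path("/apps/hive/warehouse_old/t1", "/apps/hive/warehouse", "/hive/warehouse"))
# /apps/hive/warehouse_old/t1
```

Matching on whole components keeps a mapping for /apps/hive/warehouse from also rewriting a sibling directory such as /apps/hive/warehouse_old.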

`path mapping delete`

Delete a path mapping.

note

Deleting a path mapping will not affect any existing migrations that have the path mapping applied. Delete and recreate a migration if you no longer want a previous path mapping to apply.

Delete a path mapping

path mapping delete [--path-mapping-id] string

Mandatory parameters

--path-mapping-id The ID of the path mapping.

Example

path mapping delete --path-mapping-id hdp-hdi

`path mapping list`

List all path mappings.

List all path mappings

path mapping list [--target] string

Optional parameters

--target List path mappings for the specified target filesystem id.

Examples

Example for listing all path mappings

path mapping list

Example for listing path mappings for a specific target

path mapping list --target mytarget

`path mapping show`

Show details of a specified path mapping.

Get path mapping details

path mapping show [--path-mapping-id] string

Mandatory parameters

--path-mapping-id The ID of the path mapping.

Example

path mapping show --path-mapping-id hdp-hdi

Built-in commands

`clear`

Clear the shell screen. You can also press Ctrl+L.

clear

`echo`

Print the text you enter to the console. You can use this to sanity check a command before running it (for example: echo migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles).

Print message

echo [--message] string

`exit`, `quit`

Entering either exit or quit will stop operation of Data Migrator when it is run from the command line. All processing will cease, and you will be returned to your system shell.

If your Data Migrator command line is connected to a Data Migrator system service, this command will end your interactive session with that service, which will remain in operation to continue processing migrations.

If this command is encountered during non-interactive processing of input (such as when you pipe input to an instance as part of another shell script), no further commands contained in that input will be processed.

Exit the shell
exit

ALSO KNOWN AS

quit

`help`

Use the help command to get details of all commands available from the action prompt.

Display help about available commands

help [-C] string

For longer commands, you can escape the spaces between words with backslashes (\), or use quotation marks (") to enclose the full command. When using quotation marks, you can press Tab on your keyboard to make Data Migrator automatically suggest the remainder of your typed command.

See the examples below for reference.

Example

help connect

        connect - Connect to Data Migrator and Hive Migrator.

        connect [--host] string  [--ssl]  [--lm2port] int  [--hvm-port] int  [--timeout] integer  [--user] string  

Use of backslashes
help hive\ migration\ add

        hive migration add - Create new migration.

        hive migration add [--source] string  [--target] string  [--name] string  [--auto-start]  [--once]  [--rule-names] list  

Use of quotation marks
help "filesystem add local"

        filesystem add local - Add a local filesystem.

        filesystem add local [--file-system-id] string  [--fs-root] string  [--source]  [--scan-only]  [--properties-files] list  [--properties] string

`history`

Enter history at the action prompt to list all previously entered commands.

Entering history --file <filename> saves up to the 500 most recently entered commands in text form to the specified file. Use this to record commands that you have executed.

Display or save the history of previously run commands

history [--file] file

Optional parameters

--file The name of the file in which to save the history of commands.

`script`

Load and execute commands from a text file using the script --file <filename> command. This file should have one command per line, and each command is executed in sequence as though it were entered directly at the action prompt.

Use scripts outside of the WANdisco CLI by referencing the script when running the livedata-migrator command (see examples).

Read and execute commands from a file

script [--file] file

Mandatory parameters

--file The name of the file containing script commands.

Example contents of a script file

hive agent check --name sourceAgent
hive agent check --name azureAgent

Examples

info

These examples assume that myScript is inside the working directory.

Example inside CLI

script --file myScript

Example outside of CLI (non-interactive)

livedata-migrator --script=./myScript
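To drive a non-interactive run from another program, you could write the script file and then call livedata-migrator with the --script option, as in the example above. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes that livedata-migrator is on the PATH of the machine running it. By default it only returns the command line instead of executing it.

```python
import subprocess
from pathlib import Path


def write_script(path: str, commands: list[str]) -> Path:
    """Write one CLI command per line, as the script command expects."""
    script = Path(path)
    script.write_text("\n".join(commands) + "\n")
    return script


def run_script(script: Path, dry_run: bool = True) -> list[str]:
    """Build the non-interactive livedata-migrator call and run it unless dry_run."""
    argv = ["livedata-migrator", f"--script={script}"]
    if not dry_run:
        # Assumes livedata-migrator is installed and on PATH; raises on a non-zero exit.
        subprocess.run(argv, check=True)
    return argv
```

For example, write_script("myScript", ["hive agent check --name sourceAgent"]) followed by run_script(Path("myScript"), dry_run=False) would run the same commands as the script file example above.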

Change log level commands

`log debug`

Enable debug level logging

log debug

`log info`

Enable info level logging

log info

`log off`

Disable logging

log off

`log trace`

Enable trace level logging

log trace

Connect commands

`connect`

Use the connect command to connect to both Data Migrator and Hive Migrator on the same host with a single command.

Connect Data Migrator
connect                 [--host] string
                        [--hvm-port] integer
                        [--ldm-port] integer
                        [--ssl] 
                        [--timeout] integer
                        [--user] string
                        hivemigrator
                        livemigrator

Mandatory parameters

--host The hostname or IP address of the host running Data Migrator and Hive Migrator.

Optional parameters

--hvm-port Specify the Hive Migrator port. If not specified, the default port 6780 is used.
--ldm-port Specify the Data Migrator port. If not specified, the default port 18080 is used.
--ssl Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
--timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
--user The username to use for authenticating to both services. Used when instances have basic or LDAP authentication enabled. You will be prompted to enter the user password.

Connect to the Data Migrator and Hive Migrator services on the host with this command.

connect --host localhost --hvm-port 6780 --ldm-port 18080 --user admin
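If connect fails or waits until the timeout, it can help to first confirm that the service ports are reachable from the machine running the CLI. The Python sketch below is not part of the WANdisco CLI; it only attempts a TCP connection to the default ports listed above (18080 for Data Migrator, 6780 for Hive Migrator). A successful TCP connection does not prove that the service is healthy or that authentication will succeed.

```python
import socket

# Default ports documented for the connect command (read-only).
DEFAULT_PORTS = {"Data Migrator": 18080, "Hive Migrator": 6780}


def port_open(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_services(host: str, ports: dict[str, int] = DEFAULT_PORTS) -> dict[str, bool]:
    """Check each named service port on host; the ports mapping is not modified."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, check_services("localhost") returns a result for each default port, which you can compare with the --ldm-port and --hvm-port values you pass to connect.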

`connect livemigrator`

Connect to the Data Migrator service on your Data Migrator host with this command.

note

This is a manual method of connecting to the Data Migrator service. The livedata-migrator command (shown in CLI - Sign in) attempts to establish this connection automatically.

Connect Data Migrator
connect livemigrator    [--host] string
                        [--ssl]
                        [--port] int
                        [--timeout] integer
                        [--user] string

Mandatory parameters

--host The hostname or IP address for the Data Migrator host.

Optional parameters

--ssl Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
--port The Data Migrator port to connect on (default is 18080).
--timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
--user The username to use for authenticating to the Data Migrator service. Used only when the Data Migrator instance has basic authentication enabled. You will still be prompted to enter the user password.

Connect to the Data Migrator service on your Data Migrator host with this command.

connect livemigrator --host localhost --port 18080

`connect hivemigrator`

Connect to the Hive Migrator service on your Data Migrator host with this command.

note

This is a manual method of connecting to the Hive Migrator service. The livedata-migrator command (shown in CLI - Log in section) attempts to establish this connection automatically.

Connect Hivemigrator
connect hivemigrator    [--host] string
                        [--ssl]
                        [--port] int
                        [--timeout] long
                        [--user] string

Mandatory parameters

--host The hostname or IP address for the Data Migrator host that contains the Hive Migrator service.

Optional parameters

--ssl Enter this parameter if you want to establish a TLS connection to Hive Migrator.
--port The Hive Migrator service port to connect on (default is 6780).
--timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
--user The username to use for authenticating to the Hive Migrator service. Used only when Hive Migrator has basic authentication enabled. You will still be prompted to enter the user password.

Example

connect hivemigrator --host localhost --port 6780

Email notifications subscription commands

`notification email addresses add`

Add email addresses to the subscription list for email notifications.

Subscribe email address to notifications.

notification email addresses add [--addresses]

Mandatory parameters

--addresses A comma-separated list of email addresses to add.

Example

notification email addresses add --addresses myemail@company.org,personalemail@gmail.com

`notification email addresses remove`

Remove email addresses from the subscription list for email notifications.

Unsubscribe email address to notifications.

notification email addresses remove [--addresses]

Mandatory parameters

--addresses A comma-separated list of email addresses to remove. Use auto-completion to quickly select from the subscribed addresses.

Example

notification email addresses remove --addresses myemail@company.org,personalemail@gmail.com
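Both --addresses and --types take a single comma-separated value. If you generate these commands from another tool, a small helper like the hypothetical one below can drop duplicates and stray whitespace before joining the values. It assumes that the joined value must not contain spaces, since a space would split the argument.

```python
def comma_list(values: list[str]) -> str:
    """Join values into one comma-separated argument, dropping blanks and duplicates."""
    unique: list[str] = []
    for value in values:
        item = value.strip()
        if item and item not in unique:
            unique.append(item)  # keep first occurrence, preserve order
    if not unique:
        raise ValueError("at least one value is required")
    return ",".join(unique)


def addresses_command(action: str, addresses: list[str]) -> str:
    """Build a notification email addresses add/remove command string."""
    if action not in ("add", "remove"):
        raise ValueError(f"unsupported action: {action}")
    return f"notification email addresses {action} --addresses {comma_list(addresses)}"
```

The same comma_list helper can build the value for notification email types add --types.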

`notification email smtp set`

Configure the details of an SMTP server for Data Migrator to connect to.

Configure the SMTP adapter.
notification email smtp set     [--host] string  
                                [--port] integer  
                                [--security] security-enum  
                                [--email] string  
                                [--login] string  
                                [--password] string  
                                [--subject-prefix]  string

Mandatory parameters

--host The host address of the SMTP server.
--port The port to connect to the SMTP server. Many SMTP servers use port 25.
--security The type of security the server uses. Available options: NONE, SSL, STARTTLS_ENABLED, STARTTLS_REQUIRED, or TLS.
--email The email address for Data Migrator to use with emails sent through the SMTP server. This address will be the sender of all configured email notifications.

Optional parameters

--login The username to authenticate with the SMTP server.
--password The password used to authenticate with the SMTP server. Required if you specify --login.
--subject-prefix Set an email subject prefix to help identify and filter Data Migrator notifications.

Example

notification email smtp set --host my.internal.host --port 587 --security TLS --email livedatamigrator@wandisco.com --login myusername --password mypassword

`notification email smtp show`

Display the details of the SMTP server Data Migrator is configured to use.

Show the current configuration of SMTP adapter.

notification email smtp show

`notification email subscriptions show`

Show a list of currently subscribed emails and notifications.

Show email notification subscriptions.

notification email subscriptions show

`notification email types add`

Add notification types to the email notification subscription list.

See the output from the command notification email types show for a list of all currently available notification types.

Subscribe on notification types.

notification email types add [--types]

Mandatory parameters

--types A comma-separated list of notification types to subscribe to.

Example

notification email types add MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED

`notification email types remove`

Remove notification types from the email notification subscription list.

Unsubscribe on notification types.

notification email types remove [--types]

Mandatory parameters

--types A comma-separated list of notification types to unsubscribe from.

Example

notification email types remove MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED

`notification email types show`

Return a list of all available notification types to subscribe to.

Show email notification types.

notification email types show

Hive Backup Commands

`hive backup add`

Immediately create a metadata backup file.

Create new backup

hive backup add

`hive backup config show`

Show the current metadata backup configuration.

Show configuration of backups.

hive backup config show

`hive backup list`

List all existing metadata backup files.

List all backups

hive backup list

`hive backup restore`

Restore from a specified metadata backup file.

Restore backup by name

hive backup restore --name string

`hive backup schedule configure`

Configure a backup schedule for metadata migrations.

Configure backup schedule
hive backup schedule configure --period-minutes 10 --enable

{
  "enabled": true,
  "periodMinutes": 10
}
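The hive backup schedule configure and hive backup schedule show commands print the schedule as JSON, which makes it straightforward to check from a script. The Python sketch below assumes the output has exactly the two fields shown here (enabled and periodMinutes); it reports how many complete backup periods fit into a day, for example 1440 / 10 = 144 for a 10-minute period.

```python
import json

MINUTES_PER_DAY = 24 * 60


def backups_per_day(schedule_json: str) -> int:
    """Return the number of complete backup periods per day, or 0 if disabled."""
    schedule = json.loads(schedule_json)
    if not schedule.get("enabled", False):
        return 0
    period = int(schedule["periodMinutes"])
    if period <= 0:
        raise ValueError("periodMinutes must be positive")
    # Integer division: a period that does not divide 1440 evenly is rounded down.
    return MINUTES_PER_DAY // period
```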

`hive backup schedule show`

Show the current metadata backup schedule.

Show current backup schedule
hive backup schedule show

{
  "enabled": true,
  "periodMinutes": 10
}

`hive backup show`

Show a specified metadata backup file.

hive backup show

hive backup show --name string

Hive configuration commands

`hive config certificate generate`

Generate system certificates

hive config certificate generate

`hive config certificate upload`

Upload certificates for a TLS connection to the remote agent
hive config certificate upload  [--path-mapping-id] string
                                [--private-key] file
                                [--certificate] file
                                [--trusted-certificate] file

Mandatory parameters

--private-key Client private key used to establish a TLS connection to the remote agent.
--certificate Client certificate used to establish a TLS connection to the remote agent.
--trusted-certificate Trusted certificate used to establish a TLS connection to the remote agent.

Hive rule configuration commands

`hive rule add`, `hive rule create`

Create a Hive migration rule that is used to define which databases and tables are migrated.

info

Enter these rules when starting a new migration to control which databases and tables are migrated.

Add new Hive migration rule
hive rule add   [--database-pattern] string
                [--table-pattern] string
                [--name] string

ALSO KNOWN AS

hive rule create

Mandatory parameters

--database-pattern Enter a Hive DDL pattern that will match the database names you want to migrate.
--table-pattern Enter a Hive DDL pattern that will match the table names you want to migrate.

tip

You can use a single asterisk (*) if you want to match all databases and/or all tables within the Metastore/database.

Optional parameters

--name The name for the Hive rule.

Example

Match all database names that start with test and all tables inside of them

hive rule add --name test_databases --database-pattern test* --table-pattern *
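To preview which names a pattern such as test* will select, you can approximate the matching locally. The Python sketch below assumes the wildcard rules that Apache Hive documents for SHOW DATABASES and SHOW TABLES patterns (* matches any sequence of characters, | separates alternatives), and treats matching as case-insensitive because Hive stores names in lower case. Data Migrator's own matching may differ, so treat the result as a preview only.

```python
import re


def hive_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Hive-style wildcard pattern into a compiled regex.

    Assumed rules: '*' matches any characters, '|' separates alternatives.
    """
    alternatives = []
    for part in pattern.split("|"):
        # Escape everything literally, then turn each escaped '*' into '.*'.
        alternatives.append(re.escape(part.strip()).replace(r"\*", ".*"))
    return re.compile("(?:" + "|".join(alternatives) + ")", re.IGNORECASE)


def matching_names(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the whole pattern matches, in input order."""
    regex = hive_pattern_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]
```

For example, matching_names("test*", ["test_db", "prod"]) keeps only test_db, mirroring the rule above.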

`hive rule configure`

Change the parameters of an existing Hive rule.

The parameters that can be changed are the same as those listed in the hive rule add, hive rule create section.

All parameters are optional except --name, which identifies the existing Hive rule that you want to configure.

Example

hive rule configure --name test_databases --database-pattern test_db*

`hive rule delete`

Delete selected Hive migration rule

hive rule delete [--name] string

Example

hive rule delete --name test_databases

`hive rule list`

Get a list of defined rules

hive rule list

`hive rule show`

Show rule details

hive rule show [--name] string

Example

hive rule show --name test_databases

Hive show commands

`hive show conf`

Returns a description of the specified Hive configuration property.

hive show conf  [--parameter] string  
                [--agent-name] string

Hive show configuration parameters

--agent-name The name of the agent.
--parameter The configuration parameter/property that you want to show the value of.

Example

Example when sourceAgent is an Apache Hive agent

hive show conf --agent-name sourceAgent --parameter hive.metastore.uris

`hive show database`

Show detailed information about a given database and agent (or sourceAgent if not set).

hive show database      [--database] string  
                        [--agent-name] string

Hive show database parameters

--database The database name. If not specified, the database named default is used.
--agent-name The name of the agent.

Example

hive show database --agent-name sourceAgent --database mydb01

`hive show databases`

Get a list of databases from a given agent (or sourceAgent if not set).

hive show databases      [--like] string  
                         [--agent-name] string

Hive show databases parameters

--like The Hive DDL pattern to use to match the database names (for example: testdb* will match any database name that begins with "testdb").
--agent-name The name of the agent.

Example

hive show databases --agent-name sourceAgent --like testdb*

`hive show indexes`

Get a list of indexes for a given database/table and agent (or sourceAgent if not set).
hive show indexes       [--database] string  
                        [--table] string  
                        [--agent-name] string

Hive show indexes parameters

--database The database name.
--table The table name.
--agent-name The name of the agent.

Example

hive show indexes --agent-name sourceAgent --database mydb01 --table mytbl01

`hive show partitions`

Get a list of partitions for a given database/table and agent (or sourceAgent if not set).
hive show partitions    [--database] string  
                        [--table] string  
                        [--agent-name] string

Hive show partitions parameters

--database The database name.
--table The table name.
--agent-name The name of the agent.

Example

hive show partitions --agent-name sourceAgent --database mydb01 --table mytbl01

`hive show table`

Show detailed information about a given table using the given agent (or sourceAgent if not set).
hive show table [--database] string  
                [--table] string  
                [--agent-name] string

Hive show table parameters

--database The database name where the table is located.
--table The table name.
--agent-name The name of the agent.

Example

hive show table --agent-name sourceAgent --database mydb01 --table mytbl01

`hive show tables`

Get a list of tables for a given database (default if not set) and agent (sourceAgent if not set).

hive show tables [[--like] string]  [[--database] string]  [[--agent-name] string]

Hive show tables parameters

--like The Hive DDL pattern to use to match the table names (for example: testtbl* will match any table name that begins with "testtbl").
--database Database name. Defaults to default if not set.
--agent-name The name of the agent.

Example

hive show tables --agent-name sourceAgent --database mydb01 --like testtbl*

License manipulation commands

`license show`

Show the details of the active license

license show [--full]

`license upload`

Upload a new license by submitting its location on the local filesystem

license upload [--path] string

Example

license upload --path /user/hdfs/license.key

Notification commands

`notification latest`

Get the latest notification

notification latest

`notification list`

Get notifications
notification list       [--count] integer
                        [--since] string
                        [--type] string
                        [--exclude-resolved]
                        [--level] string

Optional parameters

--count The number of notifications to return.
--since Return notifications created after this date/time.
--type The type of notification to return, for example LicenseExceptionNotification.
--exclude-resolved Exclude resolved notifications.
--level The level of notification to return.

`notification show`

Show notification details

notification show [--notification-id]  string

Mandatory parameters

--notification-id The ID of the notification to return.

Source commands

`source clear`

Clear all information that Data Migrator maintains about the source filesystem by issuing the source clear command. This lets you define an alternative to the source that was previously defined or detected automatically.

Delete all sources

source clear

`source delete`

Use source delete to delete information about a specific source by ID. You can obtain the ID for a source filesystem with the output of the source show command.

Delete a source

source delete [--file-system-id] string

Mandatory parameters

--file-system-id The ID of the source filesystem resource you want to delete.

Example

source delete --file-system-id auto-discovered-source-hdfs

`source show`

Get information about the source filesystem configuration.

Show the source filesystem configuration

source show [--detailed]

Optional parameters

--detailed Include all configuration properties for the source filesystem in the response.
