Version: 2.6

Command reference

System service commands

The service scripts are used to control operation of each individual service. In most supported Linux distributions, the following commands can be used to manage Data Migrator, Hive Migrator, and UI processes.

Data Migrator

systemd command | Use it to...
systemctl start livedata-migrator | Start a service that isn't currently running.
systemctl stop livedata-migrator | Stop a running service.
systemctl restart livedata-migrator | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command.
systemctl status livedata-migrator | Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to restart Data Migrator:
service livedata-migrator restart
info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/livedata-migrator again after running the restart command.
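
A wrapper script can pick between the two command formats described above. The following is a minimal sketch only: the function names and the "systemd" and "sysv" labels are invented for this example, and finding a systemctl binary is treated as a rough hint that systemd is in use, not as proof.

```shell
# Report which command format probably applies on this host.
# "systemd" and "sysv" are labels invented for this sketch.
service_flavour() {
    if command -v systemctl >/dev/null 2>&1; then
        echo systemd
    else
        echo sysv
    fi
}

# Print the command line for a given format, service name, and action.
service_cmdline() {
    flavour="$1"; name="$2"; action="$3"
    case "$flavour" in
        systemd) echo "systemctl $action $name" ;;
        *)       echo "service $name $action" ;;
    esac
}

# Usage (not run here): restart Data Migrator with whichever format applies.
#   $(service_cmdline "$(service_flavour)" livedata-migrator restart)
```

The sketch only builds the command line. On hosts without systemd, you still need to run start.sh again after a restart, as described above.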

Hive Migrator

Service script | Use it to...
systemctl start hivemigrator | Start a service that isn't currently running.
systemctl stop hivemigrator | Stop a running service.
systemctl restart hivemigrator | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command.
systemctl status hivemigrator | Get details of the running service's status.
info

Always start/restart Hive Migrator services in the following order:

  1. Remote agents
  2. Hive Migrator service.

Not starting services in this order may cause live migrations to fail.

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/hivemigrator again after running the restart command.
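
The restart order above can be sketched as a small shell helper. It assumes the remote agent on this host runs as the hivemigrator-remote-server service listed in this reference, and that the script runs with enough privileges to manage services. Both are assumptions, so adjust the unit names to match your deployment.

```shell
# Print the units in the order they should be started or restarted:
# remote agents first, then the Hive Migrator service.
# Assumption: the remote agent is the hivemigrator-remote-server unit.
hive_migrator_restart_order() {
    printf '%s\n' hivemigrator-remote-server hivemigrator
}

# Restart each unit in order and stop at the first failure.
restart_hive_migrator() {
    for unit in $(hive_migrator_restart_order); do
        systemctl restart "$unit" || return 1
    done
}

# Usage (not run here):
#   restart_hive_migrator
```

Stopping at the first failure avoids restarting the Hive Migrator service when its remote agent did not come back.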

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to view status of Hive Migrator:
service hivemigrator status

Hive Migrator remote server

Service script | Use it to...
systemctl start hivemigrator-remote-server | Start a service that isn't currently running.
systemctl stop hivemigrator-remote-server | Stop a running service.
systemctl restart hivemigrator-remote-server | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command.
systemctl status hivemigrator-remote-server | Get details of the running service's status.
info

Always start/restart Hive Migrator services in the following order:

  1. Remote agents
  2. Hive Migrator service.

Not starting services in this order may cause live migrations to fail.

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/hivemigrator again after running the restart command.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to view status of Hive Migrator remote server:
service hivemigrator-remote-server status

UI

Service script | Use it to...
systemctl start livedata-ui | Start a service that isn't currently running.
systemctl stop livedata-ui | Stop a running service.
systemctl restart livedata-ui | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command.
systemctl status livedata-ui | Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to see the status of the UI service:
service livedata-ui status
info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh script located in /opt/wandisco/livedata-ui again after running the restart command.

Data transfer agents

systemd command | Use it to...
systemctl start livedata-migrator-data-agent | Start a service that isn't currently running.
systemctl stop livedata-migrator-data-agent | Stop a running service.
systemctl restart livedata-migrator-data-agent | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command.
systemctl status livedata-migrator-data-agent | Get details of the running service's status.

Running without systemd

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to restart a data transfer agent
service livedata-migrator-data-agent restart
info

If you're working in an environment without systemd or a system and service manager, you need to run the start.sh scripts located in /opt/wandisco/livedata-migrator-data-agent again after running the restart commands.

Connect to the CLI

Open a terminal session on the Data Migrator host machine and enter the following command:

livedata-migrator

When the CLI connects to the Data Migrator and Hive Migrator services, you get the following command prompt:

Cirata LiveData Migrator >>

The CLI is now ready to accept commands.

Optional parameters

  • --host The IP or hostname of the Data Migrator API to connect to. Defaults to localhost when not specified.
  • --vm-port Data Migrator API port. Defaults to 18080 when not specified.
  • --hm-port Hive Migrator API port. Defaults to 6780 when not specified.
  • --lm-ssl Flag to use https. Defaults to http when not specified.

Version check

Check the current versions of included components by using the livedata-migrator command with the --version parameter. For example:

# livedata-migrator --version
tip

This doesn't start the CLI. You get a list of the current Data Migrator components, along with their version numbers.

CLI features

Feature | How to use it
Review available commands | Use the help command to get details of all commands available.
Command completion | Hit the <tab> key at any time to get assistance or to complete partially-entered commands.
Cancel input | Type <Ctrl-C> before entering a command to return to an empty action prompt.
Syntax indication | Invalid commands are highlighted as you type.
Clear the display | Type <Ctrl-L> at any time.
Previous commands | Navigate previous commands using the up and down arrows, and use standard emacs shortcuts.
Interactive or scripted operation | You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See script for more information and examples.
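
As a rough illustration of scripted operation, the sketch below writes CLI commands to standard output so they can be piped into the CLI. The function name is hypothetical, and the one-command-per-line input format is an assumption rather than documented behavior.

```shell
# Emit one CLI command per line (assumed input format for scripted use).
backup_script() {
    printf '%s\n' 'backup add' 'backup list'
}

# Usage (needs the CLI installed on this host; not run here):
#   backup_script | livedata-migrator
```

Keeping the command list in a function makes it easy to review what will be sent before piping it to the CLI.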

CLI commands

You can manage filesystems, migrations, and more in the CLI.

Backup commands

backup add

Immediately create a backup file
backup add

backup config show

Show the current backups configuration
backup config show

{
  "backupsLocation": "/opt/wandisco/livedata-migrator/db/backups",
  "lastSuccessfulTs": 0,
  "backupSchedule": {
    "enabled": true,
    "periodMinutes": 10
  },
  "storedFilePaths": [
    "/etc/wandisco/livedata-migrator/application.properties",
    "/etc/wandisco/livedata-migrator/vars.env",
    "/etc/wandisco/livedata-migrator/logback-spring.xml",
    "/etc/wandisco/livedata-migrator/application.properties",
    "/etc/wandisco/livedata-migrator/vars.env",
    "/etc/wandisco/livedata-migrator/logback-spring.xml"
  ]
}

backup list

List all existing backup files
backup list

backup restore

Restore from a specified backup file
backup restore --name string

backup schedule configure

Configure a backup schedule for Data Migrator
backup schedule configure --period-minutes 10 --enable

{
  "enabled": true,
  "periodMinutes": 10
}

backup schedule show

Show current backup schedule
backup schedule show

{
  "enabled": true,
  "periodMinutes": 10
}

backup show

Show a specified backup file
backup show --name string

Bandwidth policy commands

bandwidth policy delete

Allow the application to use unlimited bandwidth
bandwidth policy delete

bandwidth policy set

Set the application bandwidth limit, in bytes per second
bandwidth policy set    [--value] long  
[--unit] string
[--data-agent] string

Mandatory parameters

  • --value Define the number of byte units.
  • --unit Define the byte unit to be used.
    Decimal units: B, KB, MB, GB, TB, PB.
    Binary units: KiB, MiB, GiB, TiB, PiB.

Optional parameters

  • --data-agent Apply the limit to a specified data agent.

Example

Set a limit of 10 Megabytes per second
bandwidth policy set --value 10 --unit MB
Set a limit of 10 Megabytes per second for a specific data agent
bandwidth policy set --data-agent DTA1 --value 10 --unit MB
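
To compare limits expressed in decimal and binary units, it can help to convert them to bytes per second. The helper below is a sketch with a hypothetical function name. It assumes the units carry their standard meanings (powers of 1000 for decimal units, powers of 1024 for binary units) and a shell with 64-bit arithmetic.

```shell
# Convert a value and unit, as passed to --value and --unit, to bytes.
to_bytes_per_second() {
    value="$1"; unit="$2"
    case "$unit" in
        B)   factor=1 ;;
        KB)  factor=1000 ;;
        MB)  factor=1000000 ;;
        GB)  factor=1000000000 ;;
        TB)  factor=1000000000000 ;;
        PB)  factor=1000000000000000 ;;
        KiB) factor=1024 ;;
        MiB) factor=1048576 ;;
        GiB) factor=1073741824 ;;
        TiB) factor=1099511627776 ;;
        PiB) factor=1125899906842624 ;;
        *)   echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $(( value * factor ))
}

# Usage:
#   to_bytes_per_second 10 MB    # prints 10000000
#   to_bytes_per_second 10 MiB   # prints 10485760
```

The difference between 10 MB and 10 MiB is just under 5%, which is worth keeping in mind when matching a limit to a network link quoted in decimal units.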

bandwidth policy show

Get details of the application bandwidth limit, in bytes per second
bandwidth policy show

Data transfer agent commands

agent add

Add a new agent.

Mandatory parameters

  • --agent-name
    User-specified agent name.

You must enter a value for either the --agent-token or the --agent-token-file parameter:

  • --agent-token
    Connection token text provided by the token generator. You can use the content of /opt/wandisco/livedata-migrator-data-agent/connection_token in the node on which you're installing the agent.

  • --agent-token-file
    Path to a file that contains the connection token, for example /opt/wandisco/livedata-migrator-data-agent/connection_token. Ensure the token file is accessible on the Data Migrator host.

Example
agent add --agent-name dta1 --agent-token-file /opt/wandisco/livedata-migrator-data-agent/connection_token

To check the agent was added, run:

agent show --agent-name example_name

Register an agent

Curl example
curl -X POST -H "Content-Type: application/json" -d @/opt/wandisco/livedata-migrator-data-agent/reg_data_agent.json http://migrator-host:18080/scaling/dataagents/
Check the agent was added
curl -X GET http://migrator-host:18080/scaling/dataagents/example_name
note

migrator-host is the host where Data Migrator is installed.

Start an agent

Start an agent
service livedata-migrator-data-agent start

Remove an agent

Remove an agent
agent delete --agent-name example_name
Example: Remove an agent
agent delete --agent-name agent-example-vm.bdauto.wandisco.com

Mandatory parameters

  • --agent-name
    The name you gave the agent, for example agent-example-vm.bdauto.wandisco.com.

View an agent

View an agent
agent show --agent-name example_name
Example: View an agent
agent show --agent-name agent-example-vm.bdauto.wandisco.com
Example output
{
  "name": "agent-example-vm.bdauto.wandisco.com",
  "host": "example-vm.bdauto.wandisco.com",
  "port": 1433,
  "type": "GRPC",
  "version": "2.0.0",
  "healthy": true,
  "health": {
    "lastStatusUpdateTime": 1670924489556,
    "lastHealthMessage": "Agent agent-example-vm.bdauto.wandisco.com - health check became OK",
    "status": "CONNECTED"
  }
}

Mandatory parameters

  • --agent-name
    User-specified agent name.

agent list

List all agents.

Filesystem commands

filesystem add adls2 oauth

Add an Azure Data Lake Storage (ADLS) Gen2 container as a migration target using the filesystem add adls2 oauth command, which requires a service principal and OAuth 2 credentials.

note

The service principal that you want to use must have either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list with RWX permissions for the migration path and all parent paths. For more information, see the Microsoft documentation.

Add an ADLS Gen2 filesystem with OAuth
filesystem add adls2 oauth          [--container-name] string
[--file-system-id] string
[--insecure]
[--oauth2-client-endpoint] string
[--oauth2-client-id] string
[--oauth2-client-secret] string
[--properties] string
[--properties-files] list
[--scan-only]
[--source]
[--storage-account-name] string

Mandatory parameters

  • --container-name The name of the container in the storage account to which content will be migrated.
  • --file-system-id The ID to give the new filesystem resource.
  • --oauth2-client-endpoint The client endpoint for the Azure service principal.
    This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token where {tenant} is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
  • --oauth2-client-id The client ID (also known as application ID) for your Azure service principal.
  • --oauth2-client-secret The client secret (also known as application secret) for the Azure service principal.
  • --storage-account-name The name of the ADLS Gen2 storage account to target.

Optional parameters

  • --insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
  • --properties Enter properties to use in a comma-separated key/value list.
  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations. Requires --source.
  • --source Add this filesystem as the source for migrations.

Example

filesystem add adls2 oauth --file-system-id mytarget
--storage-account-name myadls2
--oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
--oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
--oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
--container-name lm2target
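
The usual form of the --oauth2-client-endpoint value can be built from the tenant (directory) ID. The helper below is a sketch with a hypothetical function name. It covers only the common login.microsoftonline.com form described above, not custom or proxy endpoints.

```shell
# Build the common token endpoint URL for a given tenant (directory) ID.
oauth2_token_endpoint() {
    tenant="$1"
    printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token\n' "$tenant"
}

# Usage:
#   oauth2_token_endpoint 78u098ex-ampl-e498-8bce-ndpoint5f2e5
```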

filesystem add adls2 sharedKey

Add an ADLS Gen2 container as a migration target using the filesystem add adls2 sharedKey command, which requires credentials in the form of an account key.

Add an ADLS Gen2 filesystem with Shared Key
filesystem add adls2 sharedKey      [--file-system-id] string  
[--storage-account-name] string
[--container-name] string
[--insecure]
[--shared-key] string
[--properties-files] list
[--properties] string
[--scan-only]
[--source]

Mandatory parameters

  • --file-system-id The ID to give the new filesystem resource.
  • --storage-account-name The name of the ADLS Gen2 storage account to target.
  • --shared-key The shared account key to use as credentials to write to the storage account.
  • --container-name The name of the container in the storage account to which content will be migrated.

Optional parameters

  • --insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --properties Enter properties to use in a comma-separated key/value list.
  • --scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations. Requires --source.
  • --source Add this filesystem as the source for migrations.

Example

filesystem add adls2 sharedKey  --file-system-id mytarget
--storage-account-name myadls2
--container-name lm2target
--shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

filesystem add gcs

Add a Google Cloud Storage bucket as a migration target using the filesystem add gcs command, which requires credentials in the form of an account key file.

Add a Google Cloud Storage filesystem
filesystem add gcs      [--bucket-name] string
[--file-system-id] string
[--properties] string
[--properties-files] list
[--service-account-json-key-file] string
[--service-account-json-key-file-server-location] string
[--service-account-json-vault-reference] string
[--source]

Mandatory parameters

  • --file-system-id The ID to give the new filesystem resource.
  • --bucket-name The bucket name of a Google Cloud Storage account.

Service account key parameters

info

Enter your service account key for the Google Cloud Storage bucket by choosing one of the parameters below.

  • --service-account-json-key-file-server-location
    The absolute filesystem path on the Data Migrator server of your service account key file in JSON format. You can either create a Google Cloud Storage service account key or use an existing one.
  • --service-account-json-key-file
    The absolute filesystem path on the host running the Data Migrator CLI of your service account key file in JSON format. Use this parameter if you're running the CLI on a different host to your Data Migrator server.
    info

    Data Migrator imports GCS credentials from your --service-account-json-key-file, stores them internally as configuration properties, then removes the file.

  • --service-account-json-vault-reference The HashiCorp Vault reference to the location of the content of the GCS Key File using the Reference format. Use this option if you have a secrets store configured.

Optional parameters

  • --properties Enter properties to use in a comma-separated key/value list.
  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --source Enter this parameter to use the filesystem resource created as a source.

Example

filesystem add gcs --bucket-name mygcsgbucket --file-system-id GCS1 --service-account-json-key-file-server-location /var/tmp/key-999999.json --source

filesystem add gpfs

Add an IBM Spectrum Scale (GPFS) filesystem as a migration source using the filesystem add gpfs CLI command. See the main IBM Spectrum Scale (GPFS) section for additional information.

Add an IBM Spectrum Scale filesystem
filesystem add gpfs      [--default-fs] string
[--file-system-id] string
[--gpfs-kerberos-keytab] string
[--gpfs-kerberos-principal] string
[--kafka-bootstrap-servers] string
[--kafka-group-id] string
[--kafka-kerberos-principal] string
[--kafka-kerberos-keytab] string
[--kafka-topic] string
[--mount-point] string
[--properties] string
[--properties-files] list
[--scan-only]
[--use-ssl]
[--user] string

Mandatory parameters

  • --default-fs The default filesystem URI for this filesystem. For example, hdfs://192.168.1.10:8020, hdfs://myhost.localdomain:8020 or hdfs://mynameservice.
  • --file-system-id The ID or name to give this new filesystem resource.
  • --mount-point The root of the GPFS mount on HDFS. For example, /gpfs/myfs/cluster-data/.

Optional parameters

  • --gpfs-kerberos-keytab The GPFS Kerberos keytab containing the principal defined for the --gpfs-kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
  • --gpfs-kerberos-principal The GPFS Kerberos principal to authenticate with and perform migrations as. This principal should map to the GPFS superuser using auth_to_local rules.
  • --kafka-bootstrap-servers The hostname and port of Kafka Bootstrap servers. Use comma-separated pairs for multiple servers. For example, hostname:9092,hostname2:9092.
  • --kafka-group-id The Kafka consumer group identifier, a unique ID for the Kafka consumer that you choose yourself. For example, my-group-id.
  • --kafka-kerberos-principal The Kafka Kerberos principal to authenticate with Kafka.
  • --kafka-kerberos-keytab The Kafka Kerberos keytab containing the principal defined for the --kafka-kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service.
  • --kafka-topic The Kafka topic name for event delivery. See the Apache Kafka documentation for more information on topic creation.
  • --properties Enter properties to use in a comma-separated key/value list.
  • --properties-files Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --scan-only Use this parameter to create a static source filesystem for use in one-time migrations.
  • --use-ssl Use this parameter if the Kafka server is using TLS. When you submit the command with this option, you'll then be prompted to supply:
    • Kafka SSL truststore location The truststore location. This must be accessible to the local system user running the Data Migrator service.
    • Kafka SSL truststore password The truststore password.
  • --user Name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS superuser, such as hdfs.

Example

Add a live IBM Spectrum Scale (GPFS) source
filesystem add gpfs --default-fs hdfs://SourceCluster:8020 --file-system-id GPFS-Source --gpfs-kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --gpfs-kerberos-principal hdfs04@REALM.HADOOP --kafka-bootstrap-servers bootstrapServer1:9093 --kafka-group-id kafGroup1 --kafka-kerberos-keytab /etc/security/keytabs/kafka.service.keytab --kafka-kerberos-principal kafka/gpfsapr@REALM.HADOOP --kafka-topic FS1-WATCH-EVENT --mount-point /gpfs/fs1/cluster-data --properties-files /etc/wandisco/livedata-migrator/conf/ --use-ssl
<ENTER>
Kafka SSL truststore location: /etc/cirata/livedata-migrator/conf/kafka-keystore.p12
<ENTER>
Kafka SSL truststore password: *********
<ENTER>

filesystem add hdfs

Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs command.

You normally only need to create an HDFS resource with this command when migrating to a target HDFS filesystem (rather than another storage service such as ADLS Gen2 or S3a). Data Migrator attempts to auto-discover the source HDFS when started from the command line, unless Kerberos is enabled on your source environment.

If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs command to enter Kerberos credentials and auto-discover your source HDFS configuration.

Add a Hadoop Distributed File System
filesystem add hdfs     [--file-system-id] string  
[--default-fs] string
[--user] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--source]
[--scan-only]
[--success-file] string
[--properties-files] list
[--properties] string

Mandatory parameters

  • --file-system-id The ID to give the new filesystem resource.
  • --default-fs A string that defines how Data Migrator accesses HDFS.
    It can be specified in a number of forms:
    1. As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
    2. As a HDFS URI that references a nameservice if the NameNodes have high availability, for example, hdfs://mynameservice. For more information, see HDFS High Availability.
  • --properties-files Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.

Optional parameters

Kerberos: Cross-realm authentication required between source and target HDFS

Cross-realm authentication is required in the following scenarios:

  • Migration will occur between a source and target HDFS.
  • Kerberos is enabled on both clusters.

For guidance on configuring cross-realm authentication, see the documentation for your Hadoop distribution.

  • --user The name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
  • --kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
  • --kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
  • --source Enter this parameter to use the filesystem resource created as a source.
  • --scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
  • --properties Enter properties to use in a comma-separated key/value list.
  • --success-file Specify a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the directory they're in has finished migrating.
Properties files are required for NameNode HA

If your Hadoop cluster has NameNode HA enabled, enter the local filesystem path to the properties files that define the configuration for the nameservice ID.

Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster.

Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your Data Migrator host's local filesystem.

Example for path containing source cluster configuration
/etc/hadoop/conf
Example for path containing target cluster configuration
/etc/targetClusterConfig

Alternatively, define the absolute filesystem paths to these files:

Example for absolute paths to source cluster configuration files
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml
Example for absolute paths to target cluster configuration files
/etc/targetClusterConfig/core-site.xml
/etc/targetClusterConfig/hdfs-site.xml
  • For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
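
When both configuration files live in one directory, the comma-separated value for --properties-files can be derived from that directory. This is a sketch only: the function name is hypothetical, and it assumes the files use the standard names core-site.xml and hdfs-site.xml.

```shell
# Build the --properties-files value from a cluster configuration directory.
properties_files_arg() {
    conf_dir="${1%/}"   # drop a trailing slash, if any
    printf '%s,%s\n' "$conf_dir/core-site.xml" "$conf_dir/hdfs-site.xml"
}

# Usage:
#   properties_files_arg /etc/targetClusterConfig
```

The helper does not check that the files exist, so verify the paths on the Data Migrator host before using the value.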

Examples

HDFS as source
Example for source NameNode HA cluster
filesystem add hdfs     --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Example for source NameNode HA cluster with Kerberos enabled
filesystem add hdfs     --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
--kerberos-principal hdfs@SOURCEREALM.COM
HDFS as target
note

If you enter a HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.

Example for target NameNode HA cluster with Kerberos enabled
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
Example for target single NameNode cluster
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs

filesystem add local

Add a local filesystem as either a migration target or source using the filesystem add local command.

Add a Local Filesystem
filesystem add local    [--file-system-id] string
[--fs-root] string
[--source]
[--scan-only]
[--properties-files] list
[--properties] string

Mandatory parameters

  • --file-system-id The ID to give the new filesystem resource.

Optional parameters

  • --fs-root The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Supply the full directory path from the root.
  • --source Enter this parameter to use the filesystem resource created as a source.
  • --scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
  • --properties-files Reference a list of existing properties files.
  • --properties Enter properties to use in a comma-separated key/value list.
note

If you don't specify --fs-root, the file path defaults to the root of your system.

Examples

Local filesystem as source
filesystem add local --file-system-id mysource --fs-root /tmp --source
Local filesystem as target
filesystem add local --file-system-id mytarget --fs-root /Users/username/destinationfolder/

filesystem add s3a

Add an S3-compatible filesystem as a source or target for migration.

For details on which platforms support S3, see Supported sources and targets.

info

As of Data Migrator 2.1.1, hcfs.ssl.channel.mode replaces fs.s3a.ssl.channel.mode and fs.azure.ssl.channel.mode, which are no longer valid. See SSL implementation for information on the property and values used.

Use the filesystem add s3a command with the following parameters:

Add an S3 filesystem
filesystem add s3a          [--access-key] string
[--aws-config-file] string
[--aws-profile] string
[--bootstrap.servers] string
[--bucket-name] string
[--credentials-provider] string
[--endpoint] string
[--file-system-id] string
[--properties] string
[--properties-files] list
[--s3type] string
[--scan-only]
[--secret-key] string
[--source]
[--sqs-endpoint] string
[--sqs-queue] string
[--topic] string

For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.

S3A mandatory parameters

  • --file-system-id The ID for the new filesystem resource.

  • --bucket-name The name of your S3 bucket.

  • --credentials-provider The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint.
    The available provider options include:

    • org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider

      Use this provider to supply credentials as an access key and secret access key with the --access-key and --secret-key parameters.

    • com.amazonaws.auth.InstanceProfileCredentialsProvider

      Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.

    • com.amazonaws.auth.DefaultAWSCredentialsProviderChain

      A commonly-used credentials provider chain that looks for credentials in this order:

      • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
      • Java System Properties - aws.accessKeyId and aws.secretKey.
      • Web Identity Token credentials from the environment or container.
      • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
      • Credentials delivered through the Amazon EC2 container service if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and security manager has permission to access the variable.
      • Instance profile credentials delivered through the Amazon EC2 metadata service.
    • com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider

      This provider supports the use of multiple AWS credentials, which are stored in a credentials file.

      When adding a source filesystem, use the following properties:

      • awsProfile - Name for the AWS profile.

      • awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.

        For example:

        filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>

        In the CLI, you can also use --aws-profile and --aws-config-file.

        For example:

        filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --aws-profile <profile-name>
        --aws-config-file </path/to/the/aws/credentials/file>

        Learn more about using AWS profiles: Configuration and credential file settings.
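
Before passing a profile name with awsProfile or --aws-profile, it can help to confirm that the profile exists in the credentials file. The following is a minimal Python sketch, not part of Data Migrator. It assumes the file uses the standard INI-style layout of AWS credentials files, with one [profile-name] section per profile. The function names parse_profiles and list_aws_profiles are hypothetical.

```python
import configparser
from pathlib import Path


def parse_profiles(text):
    """Return the profile names found in the text of a credentials file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Each [section] header is one profile. "[default]" is an ordinary
    # section here because configparser only treats "DEFAULT" specially.
    return parser.sections()


def list_aws_profiles(credentials_path="~/.aws/credentials"):
    """Return the profile names defined in the given credentials file."""
    path = Path(credentials_path).expanduser()
    # read_text raises an error if the file is missing, which is more
    # useful here than silently returning no profiles.
    return parse_profiles(path.read_text())
```

Calling list_aws_profiles() with no argument reads the default path, and passing a path checks the file you intend to supply as awsCredentialsConfigFile or --aws-config-file.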

S3A optional parameters

  • --access-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.

  • --secret-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key using this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.

  • --endpoint Enter a specific endpoint to access the S3-compatible bucket, such as an AWS PrivateLink endpoint or an IBM COS public regional endpoint. If you don't enter a value, the filesystem defaults to AWS.

    note

    Using --endpoint supersedes fs.s3a.endpoint if that property is also set as an additional custom property. Don't use both at the same time.

  • --sqs-queue [Amazon S3 as a source only] Enter an SQS queue name. This field is required if you enter an SQS endpoint.

  • --sqs-endpoint [Amazon S3 as a source only] Enter an SQS endpoint.

  • --source Enter this parameter to add the filesystem as a source. See which platforms are supported as a source.

  • --scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.

  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.

  • --properties Enter properties to use in a comma-separated key/value list.

  • --s3type Specifies what parameters are required, based on the requirements of your selected s3a-compatible filesystem. Leave it blank for s3-compatible storage or select from the following:

    • aws
    • oracle
    • ibmcos
IBM COS as a source only
  • --bootstrap.servers The Kafka server address.
  • --topic The Kafka topic where S3 object change notifications are provided.

S3a default properties

These properties are defined by default when adding an S3a filesystem.

info

You don't need to define or adjust many of these properties. Use caution when making any changes. If you're unsure, get in touch with Support for more information.

Enter additional properties for S3 filesystems by adding them as key-value pairs in the UI or as a comma-separated key-value pair list with the --properties parameter in the CLI. You can overwrite default property values or add new properties.

  • fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a Filesystem.
  • fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
  • fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/(ldm version)): Sets a custom value that is prepended to the User-Agent header sent in HTTP requests to the S3 back-end by S3aFileSystem.
  • fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
  • hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
  • fs.s3a.connection.maximum (default 225): Defines the maximum number of simultaneous connections to the S3 filesystem.
  • fs.s3a.threads.max (default pull.threads + 10): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation. Default is the current LDM pull.threads value plus 10.
  • fs.s3a.max.total.tasks (default 75): Defines the maximum number of tasks allowed for parallel operations.
  • fs.s3a.sqs.init.dir (default /sqs-init-path): SQS initialization path.
  • fs.s3a.empty.polls.max.count (default 10): Maximum number of empty listing responses accepted before considering a directory listing operation as finished.
  • fs.s3a.sqs.messages.max.number (default 10): Maximum number of messages to pull from an SQS queue in a single request.
  • fs.s3a.sqs.wait.time.sec (default 20): Duration in seconds to wait for messages in the SQS queue when polling for notifications.
  • fs.s3a.path.events.cache.size (default 0): Number of entries or paths that can be cached.
  • fs.s3a.path.events.cache.expiration.time.min (default 60): Time-to-live for entries stored in the events cache.
  • s3a.events.poll.max.retries (default 10): Maximum number of retries the connector attempts for polling events.
  • fs.s3a.healthcheck (default true): Allows the S3A filesystem health check to be turned off by changing true to false. This option is useful for setting up Data Migrator while cloud services are offline. However, when disabled, errors in S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.
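
The comma-separated key-value list that --properties expects can be assembled from a mapping. The following is a minimal Python sketch. The helper name build_properties_arg is hypothetical, the override values are illustrative only, and rejecting keys or values that contain commas or equals signs in the key is a cautious assumption, since this reference doesn't describe any escaping rules.

```python
def build_properties_arg(properties):
    """Join a dict of property names and values into 'k1=v1,k2=v2'."""
    pairs = []
    for key, value in properties.items():
        key, value = str(key), str(value)
        # Assumption: no escaping is available, so refuse ambiguous input.
        if "," in key or "=" in key or "," in value:
            raise ValueError(f"cannot encode property {key!r}={value!r}")
        pairs.append(f"{key}={value}")
    return ",".join(pairs)


# Illustrative overrides of two of the default properties listed above.
overrides = {
    "fs.s3a.connection.maximum": 300,
    "fs.s3a.healthcheck": "false",
}
print("--properties " + build_properties_arg(overrides))
```

The printed text can be appended to a filesystem add s3a or filesystem update s3a command.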

S3a custom properties

These are some of the additional properties that can be added when creating an S3a filesystem.

  • fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
  • fs.s3a.fast.upload.active.blocks (default 4): Defines how many blocks a single output stream can have uploading or queued at a given time.
  • fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in Kilobytes, Megabytes, Gigabytes, Terabytes or Petabytes respectively.
  • fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
  • fs.s3a.endpoint.region (default Current region): Explicitly sets the bucket region.
note

To configure an Oracle Cloud Storage bucket that isn't in your default region, specify fs.s3a.endpoint.region=<region> with the --properties flag when adding the filesystem with the CLI.

See Oracle Cloud Storage additional properties example.

Find an additional list of S3a properties in the S3a documentation.
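
To see what a suffixed value such as the fs.s3a.block.size default of 32M means in bytes, a small conversion helper can be used. This is a minimal Python sketch. The function name size_to_bytes is hypothetical, and it assumes the suffixes are binary multiples (1K = 1024 bytes), which this reference doesn't state explicitly.

```python
# Assumption: suffixes are binary multiples (1K = 1024 bytes).
_UNITS = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
}


def size_to_bytes(value):
    """Convert a size such as '32M' (or a plain number) to a byte count."""
    text = str(value).strip().upper()
    if not text:
        raise ValueError("empty size value")
    if text[-1] in _UNITS:
        # Split the numeric part from the single-letter suffix.
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


print(size_to_bytes("32M"))
```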

Upload buffering

Migrations using an S3A target destination will buffer all uploads. By default, the buffering will occur on the local disk of the system that Data Migrator is running on, in the /tmp directory.

Data Migrator will automatically delete the temporary buffering files once they are no longer needed.

If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. The following values can be supplied:

Buffering Option | Details | Property Value
Array Buffer | Buffers the uploaded data in memory instead of on disk, using the Java heap. | array
Byte Buffer | Buffers the uploaded data in memory instead of on disk, but doesn't use the Java heap. | bytebuffer
Disk Buffering | The default option. Buffers the upload to disk. | disk

Both the array and bytebuffer options may lead to the consumption of large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.

note

If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp by default) has enough remaining space to facilitate the transfer.
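
The free space available for disk buffering can be checked before starting a migration. The following is a minimal Python sketch using the standard library. The function names are hypothetical, and required_bytes is a figure you choose yourself, because this reference doesn't state how much space a given migration needs.

```python
import shutil


def free_bytes(path="/tmp"):
    """Return the free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free


def has_buffer_space(path, required_bytes):
    """Return True if at least required_bytes are free where path lives."""
    return free_bytes(path) >= required_bytes
```

For example, has_buffer_space("/tmp", 10 * 1024 ** 3) reports whether about 10 GiB is free in the default buffering directory. If you changed fs.s3a.buffer.dir, pass that directory instead.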

S3a Example

filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D

IBM Cloud Object Storage Examples

  • Add source IBM Cloud Object Storage filesystem. Note that this doesn't work if SSL is used on the endpoint address.

    filesystem add s3a --source --file-system-id cos_s3_source2
    --bucket-name container2
    --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
    --access-key pkExampleAccessKeyiz
    --secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
    --s3type ibmcos
    --bootstrap.servers=10.0.0.123:9092
    --topic newcos-events
    --endpoint http://10.0.0.124
  • Add path mapping.

    path mapping add --path-mapping-id testPath
    --description description-string
    --source-path /
    --target targetHdfs2
    --target-path /repl_test1
    {
    "id": "testPath",
    "description": "description-string",
    "sourceFileSystem": "cos_s3_source2",
    "sourcePath": "/",
    "targetFileSystem": "targetHdfs2",
    "targetPath": "/repl_test1"
    }

filesystem auto-discover-source hdfs

Discover your local HDFS filesystem by entering the Kerberos credentials for your source environment.

You can also manually configure the source HDFS filesystem using the filesystem add hdfs command.

Auto-discover-source Hadoop Distributed File System (HDFS)
filesystem auto-discover-source hdfs    [--kerberos-principal] string
[--kerberos-keytab] string
[--scan-only]

Kerberos parameters

  • --kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
  • --kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).

Optional

  • --scan-only Supply this parameter to create a static source filesystem for use in one-time, non-live migrations.

Example

filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM

filesystem clear

Delete all target filesystem references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target filesystems.

Delete all targets
filesystem clear

filesystem delete

Delete a specific filesystem resource by ID. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that filesystem.

Delete a target
filesystem delete [--file-system-id] string

Mandatory parameters

  • --file-system-id The ID of the filesystem resource to delete.

Example

filesystem delete --file-system-id mytarget

filesystem list

List defined filesystem resources.

List targets
filesystem list [--detailed]

Optional parameters

  • --detailed Include all properties for each filesystem in the JSON result.

filesystem show

View details for a filesystem resource.

Get target details
filesystem show [--file-system-id] string  
[--detailed]

Mandatory parameters

  • --file-system-id The ID of the filesystem resource to show.

Example

filesystem show --file-system-id mytarget

filesystem types

View information about the filesystem types available for use with Data Migrator.

List the types of target filesystems available
filesystem types

filesystem update adls2 oauth

Update an existing ADLS Gen2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth command. You will be prompted to optionally update the service principal and OAuth 2 credentials.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target

filesystem update adls2 sharedKey

Update an existing ADLS Gen2 container migration target using the filesystem update adls2 sharedKey command. You will be prompted to optionally update the secret key.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 sharedKey section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

filesystem update gcs

Update a Google Cloud Storage migration target using the filesystem update gcs command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add gcs section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-json-key-file /path/to/json/file.json

filesystem update gpfs

Update an IBM Spectrum Scale (GPFS) source filesystem using the filesystem update gpfs command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add gpfs section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update gpfs --file-system-id GPFS-Source --default-fs hdfs://SourceCluster:8020 --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

filesystem update hdfs

Update either a source or target Hadoop Distributed File System (HDFS) using the filesystem update hdfs command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add hdfs section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Examples

Example for source NameNode HA cluster
filesystem update hdfs  --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Example for source NameNode HA cluster with Kerberos enabled
filesystem update hdfs  --file-system-id mytarget
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
--kerberos-principal hdfs@SOURCEREALM.COM

filesystem update local

Update a target or source local filesystem using the filesystem update local command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add local section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update local --file-system-id mytarget --fs-root ./tmp

filesystem update s3a

Update an S3 bucket target filesystem using the filesystem update s3a command. This method also supports IBM Cloud Object Storage buckets.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add s3a section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example

filesystem update s3a   --file-system-id mytarget
--bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key pkExampleAccessKeyiz --secret-key eSeCreTkeYd8uEDnDHRHuV9IF3n9

Hive agent configuration commands

info

It's not possible to adjust some TLS parameters for remote metastore agents after creation. Find more information in the following Knowledge base article.

hive agent add azure

Add a local or remote Hive agent to connect to an Azure SQL database using the hive agent add azure command.

If your Data Migrator host can communicate directly with the Azure SQL database, then a local Hive agent is sufficient. Otherwise, consider using a remote Hive agent.

remote deployments

For a remote Hive agent connection, enter a remote host (Azure VM, HDI cluster node) to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service is deployed on this remote host so that the data can transfer between the Hive agent and the remote metastore.

Add Azure SQL agent
hive agent add azure    [--name] string
[--db-server-name] string
[--database-name] string
[--database-user] string
[--database-password] string
[--auth-method] azure-sqlauthentication-method
[--client-id] string
[--storage-account] string
[--container-name] string
[--insecure] boolean
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy] boolean
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--file-system-id] string
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--default-fs-override] string
[--certificate-storage-type] string

Mandatory parameters

info

The Azure Hive agent requires an ADLS Gen2 storage account and container name to generate the correct location for the metadata. The agent doesn't access the container and data isn't written to it.

  • --name The ID for the new Hive agent.
  • --db-server-name The Azure SQL database server name.
  • --database-name The Azure SQL database name.
    note

    Hive Migrator doesn't support Azure SQL database names containing blank spaces ( ), hyphens (-), semicolons (;), open curly braces ({) or close curly braces (}). Additionally, see Microsoft's documentation for a list of special characters that can't be used.

  • --storage-account The name of the ADLS Gen2 storage account.
  • --container-name The name of the container in the ADLS Gen2 storage account.
  • --auth-method The Azure SQL database connection authentication method (SQL_PASSWORD, AD_MSI, AD_INTEGRATED, AD_PASSWORD, ACCESS_TOKEN).

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myadls2storage). This will ensure any path mappings are correctly linked between the filesystem and the agent.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: abfss://mycontainer@mystorageaccount.dfs.core.windows.net).
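
The restriction on Azure SQL database names noted for --database-name can be checked before running the command. This is a minimal Python sketch covering only the characters listed in this reference (blank space, hyphen, semicolon, and curly braces). It doesn't cover the additional characters in Microsoft's documentation, and the function name is hypothetical.

```python
# Characters this reference lists as unsupported in Azure SQL database names.
UNSUPPORTED_CHARS = set(" -;{}")


def unsupported_chars_in(database_name):
    """Return the unsupported characters found in database_name, sorted."""
    return sorted(set(database_name) & UNSUPPORTED_CHARS)


for name in ("mydb1", "my-db 1"):
    found = unsupported_chars_in(name)
    status = "ok" if not found else f"unsupported characters: {found}"
    print(f"{name!r}: {status}")
```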

Optional parameters

  • --client-id The Azure resource's client ID.
  • --insecure Define an insecure connection (TLS disabled) to the Azure SQL database server (default is false).

Authentication parameters

Select one of the authentication methods listed and include the additional parameters required for the chosen method.

  • --auth-method The authentication method to connect to the Azure SQL server.
    The following methods can be used:
    • SQL_PASSWORD - Enter a username and password to access the database.
    • AD_MSI - Use a system-assigned or user-assigned managed identity.
Required parameters for SQL_PASSWORD
  • --database-user The username to access the database.
  • --database-password The user password to access the database.
Required parameters for AD_MSI

To use this method, complete the following prerequisites:

  • Data Migrator or the remote Azure Hive agent must be installed on an Azure resource with the managed identity assigned to it. The host must also have Azure AD authentication enabled.

  • Your Azure SQL server must be enabled for Azure AD authentication.

  • You have created a contained user in the Azure SQL database that is mapped to the Azure AD resource (where Data Migrator or the remote Azure Hive agent is installed).

    • The username of the contained user depends on whether you're using a system-assigned or user-assigned identity.

      Azure SQL database command for a system-assigned managed identity
      CREATE USER "<azure_resource_name>" FROM EXTERNAL PROVIDER;
      ALTER ROLE db_owner ADD MEMBER "<azure_resource_name>";

      The <azure_resource_name> is the name of the Azure resource where Data Migrator or the remote Azure Hive agent is installed (for example, myAzureVM).

      Azure SQL database command for a user-assigned managed identity
      CREATE USER <managed_identity_name> FROM EXTERNAL PROVIDER;
      ALTER ROLE db_owner ADD MEMBER <managed_identity_name>;

      The <managed_identity_name> is the name of the user-assigned managed identity. For example, myManagedIdentity.

After you complete the prerequisites, see the system-assigned identity or user-assigned identity parameters.

System-assigned identity

No other parameters are required for a system-assigned identity.

User-assigned identity

Specify the --client-id parameter:

  • --client-id The client ID of your Azure managed identity.

Parameters for remote Hive agents only

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
  • --certificate-storage-type The certificate storage type, either FILE or KEYSTORE.
  • --keystore-certificate-alias The alias of the certificate stored in the keystore.
  • --keystore-password The password assigned to the target keystore.
  • --keystore-path The path to the target-side keystore file.
  • --keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
  • --keystore-type The type of keystore specified, either JKS or PKCS12.
Parameters for automated deployment
  • --autodeploy The remote agent is automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
  • --ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
  • --use-sudo All commands performed by the SSH user use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
  • --ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Azure SQL manually:

  1. Transfer the remote server installer to your remote host (Azure VM, HDI cluster node):

    Example of secure transfer from local to remote host
    scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, make the installer script executable:

    chmod +x hivemigrator-remote-server-installer.sh
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

    ./hivemigrator-remote-server-installer.sh -- --silent --config <example config string>

    Find the --config string in the output of the hive agent add azure command.

  4. On your remote host, start the remote server service:

    service hivemigrator-remote-server start
  5. On your local host, run the hive agent add azure command without using --autodeploy and its related parameters to configure your remote Hive agent.

    See the Example for remote Azure SQL deployment - manual example below for further guidance.

Examples

Example for local Azure SQL deployment with SQL username/password
hive agent add azure --name azureAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage
Example for remote Azure SQL deployment with System-assigned managed identity - automated
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052
Example for remote Azure SQL deployment with User-assigned managed identity - manual
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --client-id b67f67ex-ampl-e2eb-bd6d-client9385id --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --host myRemoteHost.example.com --port 5052

hive agent add filesystem

Add a filesystem Hive agent to migrate your metadata to a specified target filesystem location using the hive agent add filesystem command.

Add filesystem agent
hive agent add filesystem    [--file-system-id] string
[--root-folder] string
[--name] string
  • --file-system-id The filesystem ID to be used.
  • --root-folder The path to use as the root directory for the filesystem agent.
  • --name The ID to give to the new Hive agent.

Example

hive agent add filesystem --file-system-id myfilesystem --root-folder /var/lib/mysql --name fsAgent

hive agent add glue

Add an AWS Glue Hive agent to connect to an AWS Glue data catalog using the hive agent add glue command.

If your Data Migrator host can communicate directly with the AWS Glue Data Catalog, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.

remote deployments

For a remote Hive agent connection, enter a remote host (EC2 instance) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add AWS Glue agent
hive agent add glue     [--name] string
[--access-key] string
[--secret-key] string
[--glue-endpoint] string
[--aws-region] string
[--glue-catalog-id] string
[--credentials-provider] string
[--glue-max-retries] integer
[--glue-max-connections] integer
[--glue-max-socket-timeout] integer
[--glue-connection-timeout] integer
[--file-system-id] string
[--default-fs-override] string
[--host] string
[--port] integer
[--no-ssl]
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--certificate-storage-type] string

Glue parameters

  • --name The ID to give to the new Hive agent.
  • --glue-endpoint The AWS Glue service endpoint for connections to the data catalog. VPC endpoint types are also supported.
  • --aws-region The AWS region that your data catalog is located in (default is us-east-1). If --glue-endpoint is specified, this parameter will be ignored.

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: s3a://mybucket/).

Glue credential parameters

Glue optional parameters

  • --glue-catalog-id The AWS Account ID to access the Data Catalog. This is used if the Data Catalog is owned by a different account to the one provided by the credentials provider and cross-account access has been granted.
  • --glue-max-retries The maximum number of retries the Glue client will perform after an error.
  • --glue-max-connections The maximum number of parallel connections the Glue client will allocate.
  • --glue-max-socket-timeout The maximum time the Glue client will allow for an established connection to time out.
  • --glue-connection-timeout The maximum time the Glue client will allow to establish a connection.

Parameters for remote Hive agents only

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
  • --certificate-storage-type The certificate storage type, either FILE or KEYSTORE.
  • --keystore-certificate-alias The alias of the certificate stored in the keystore.
  • --keystore-password The password assigned to the target keystore.
  • --keystore-path The path to the target-side keystore file.
  • --keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
  • --keystore-type The type of keystore specified, either JKS or PKCS12.
Steps for remote agent deployment

Follow these steps to deploy a remote Hive agent for AWS Glue:

  1. Transfer the remote server installer to your remote host (Amazon EC2 instance):

    Example of secure transfer from local to remote host
    scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, run the installer as root (or sudo) user in silent mode:

    ./hivemigrator-remote-server-installer.sh -- --silent
  3. On your remote host, start the remote server service:

    service hivemigrator-remote-server start
  4. On your local host, run the hive agent add glue command to configure your remote Hive agent.

    See the Example for remote AWS Glue agent example below for further guidance.

Examples

Example for local AWS Glue agent
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket
Example for remote AWS Glue agent
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --host myRemoteHost.example.com --port 5052

hive agent add hive

Add a Hive agent to connect to a local or remote Apache Hive Metastore using the hive agent add hive command.

remote deployments

When connecting to a remote Apache Hive Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add local or remote Hive agent
hive agent add hive 
[--autodeploy]
[--certificate-storage-type] string
[--config-files] string
[--config-path] string
[--default-fs-override] string
[--file-system-id] string
[--force-scanning-mode]
[--host] string
[--ignore-host-checking]
[--jdbc-driver-name] string
[--jdbc-password] string
[--jdbc-url] string
[--jdbc-username] string
[--kerberos-keytab] string
[--kerberos-principal] string
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--name] string
[--no-ssl]
[--port] integer
[--ssh-key] file
[--ssh-port] int
[--ssh-user] string
[--use-sudo]

Mandatory parameters

  • --name The ID to give to the new Hive agent.

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01).

Optional parameters

  • --kerberos-principal Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM).
  • --kerberos-keytab Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab).
  • --config-path For a local agent for a target metastore or when Hive config is not located in /etc/hive/conf, supply a path containing the hive-site.xml, core-site.xml, and hdfs-site.xml.
  • --config-files If the configuration files are not located on the same path, use this parameter to enter all the paths as a comma-delimited list. For example, /path1/core-site.xml,/path2/hive-site.xml,/path3/hdfs-site.xml.

When configuring a CDP target

  • --jdbc-url The JDBC URL for the database.
  • --jdbc-driver-name Full class name of JDBC driver.
  • --jdbc-username Username for connecting to the database.
  • --jdbc-password Password for connecting to database.
info

Don't use the optional parameters --config-path and --config-files in the same add command.
Use --config-path when configuration files are on the same path, or --config-files when the configuration files are on separate paths.
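
The parameter rules above can be checked before a command is submitted. The following is a minimal sketch, not part of the product: the helper name build_hive_agent_command is hypothetical, it only formats a command string, and it assumes that no value contains spaces or characters that need quoting.

```python
def build_hive_agent_command(name, file_system_id=None, default_fs_override=None,
                             config_path=None, config_files=None):
    """Assemble a `hive agent add hive` command line as a string.

    Hypothetical helper for illustration; it only checks and formats arguments.
    """
    # Use only one of --file-system-id and --default-fs-override.
    if (file_system_id is None) == (default_fs_override is None):
        raise ValueError("use exactly one of --file-system-id or --default-fs-override")
    # --config-path and --config-files must not appear in the same command.
    if config_path is not None and config_files:
        raise ValueError("use --config-path or --config-files, not both")

    args = ["hive", "agent", "add", "hive", "--name", name]
    if file_system_id is not None:
        args += ["--file-system-id", file_system_id]
    else:
        args += ["--default-fs-override", default_fs_override]
    if config_path is not None:
        args += ["--config-path", config_path]
    if config_files:
        # Files on separate paths are passed as one comma-delimited list.
        args += ["--config-files", ",".join(config_files)]
    return " ".join(args)


print(build_hive_agent_command(
    "sourceAgent",
    file_system_id="mysourcehdfs",
    config_files=["/path1/core-site.xml", "/path2/hive-site.xml", "/path3/hdfs-site.xml"],
))
```

Raising an error early keeps an invalid combination from reaching the CLI, where the failure may be harder to interpret.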

Parameters for remote Hive agents only

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
  • --certificate-storage-type The certificate storage type, can be specified as either FILE or KEYSTORE.
  • --keystore-certificate-alias The alias of the certificate stored in the keystore.
  • --keystore-password The password assigned to the target keystore.
  • --keystore-path The path to the target-side keystore file.
  • --keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
  • --keystore-type The type of keystore specified, either JKS or PKCS12.
Parameters for automated deployment

Use the following parameters when deploying a remote agent automatically with the --autodeploy flag.

  • --autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
  • --ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
  • --use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
  • --ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
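
As a rough illustration of how the automated deployment parameters relate to each other, the sketch below builds the argument list for --autodeploy. The function autodeploy_args is a hypothetical helper, not part of the CLI; it enforces only the two constraints stated above (an SSH key is required, and its path must be absolute) and assumes POSIX-style paths.

```python
import posixpath


def autodeploy_args(ssh_key, ssh_user=None, ssh_port=None, use_sudo=False,
                    ignore_host_checking=False):
    """Return the --autodeploy arguments as a list (hypothetical helper)."""
    # --autodeploy requires --ssh-key.
    if not ssh_key:
        raise ValueError("--autodeploy requires --ssh-key")
    # The key must be given as an absolute path.
    if not posixpath.isabs(ssh_key):
        raise ValueError("--ssh-key must be an absolute path")

    args = ["--autodeploy", "--ssh-key", ssh_key]
    if ssh_user is not None:
        args += ["--ssh-user", ssh_user]
    if ssh_port is not None:
        # When omitted, the documented default is port 22.
        args += ["--ssh-port", str(ssh_port)]
    if use_sudo:
        args.append("--use-sudo")
    if ignore_host_checking:
        args.append("--ignore-host-checking")
    return args


print(" ".join(autodeploy_args("/root/.ssh/id_rsa", ssh_user="root", ssh_port=22)))
```
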
Steps for manual remote agent deployment

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Apache Hive manually:

  1. Transfer the remote server installer to your remote host:

    Example of secure transfer from local to remote host
    scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, make the installer script executable:

    chmod +x hivemigrator-remote-server-installer.sh
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

    ./hivemigrator-remote-server-installer.sh -- --silent
  4. On your remote host, start the remote server service:

    service hivemigrator-remote-server start
  5. On your local host, run the hive agent add hive command without using --autodeploy and its related parameters to configure your remote Hive agent.

    See the Example for remote Apache Hive deployment - manual example below for further guidance.
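
The first four manual steps can be summarized as an ordered plan. This is only a sketch: manual_deploy_plan is a hypothetical helper that prints the commands from the steps above together with the host they run on, and it doesn't execute anything.

```python
INSTALLER = "hivemigrator-remote-server-installer.sh"


def manual_deploy_plan(remote_host):
    """Return (where, command) pairs for manual deployment steps 1 to 4."""
    return [
        # Step 1: copy the installer from the local host to the remote host.
        ("local", f"scp /opt/wandisco/hivemigrator/{INSTALLER} {remote_host}:~"),
        # Step 2: make the installer executable.
        ("remote", f"chmod +x {INSTALLER}"),
        # Step 3: run the installer as root (or with sudo) in silent mode.
        ("remote", f"./{INSTALLER} -- --silent"),
        # Step 4: start the remote server service.
        ("remote", "service hivemigrator-remote-server start"),
    ]


for where, command in manual_deploy_plan("myRemoteHost"):
    print(f"[{where}] {command}")
```

Step 5, the hive agent add hive command itself, is run afterwards on the local host.
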

Example for local Apache Hive deployment
hive agent add hive --name sourceAgent --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@LOCALREALM.COM --file-system-id mysourcehdfs
Example for remote Apache Hive deployment - automated
hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

Example for remote Apache Hive deployment - manual
hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

info

If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

info

When deploying remote agents with JDBC overrides, install the additional JDBC driver (for example, MySQL or PostgreSQL) within /opt/wandisco/hivemigrator-remote-server/agent/hive/.

info

When deploying remote agents with keystore details, your keystore password will need to be manually entered within /etc/wandisco/hivemigrator-remote-server/agent.yaml.

See the troubleshooting guide for more information.

hive agent add iceberg

Add an Apache Iceberg agent. See the Iceberg section for additional information on this agent type.

Add an Iceberg agent
hive agent add iceberg  
[--catalog-name] string
[--catalog-type] string
[--config-files] string
[--config-path] string
[--default-fs-override] string
[--file-system-id] string
[--metastore-uri] string
[--name] string
[--username] string
[--warehouse-dir] string

Mandatory parameters

  • --catalog-name Specify the Iceberg catalog name.
  • --catalog-type Specify the type of Iceberg catalog. The only value currently accepted is HIVE, for the Hive metastore catalog type.
  • --file-system-id The name of the existing filesystem that will be associated with this agent.
  • --metastore-uri The thrift endpoint for the Hive Metastore, for example thrift://<host>:<port>.
  • --username Username for connection to the Hive Metastore.
  • --warehouse-dir Root path of the Iceberg warehouse storage on your filesystem.

Optional parameters

  • --name The ID/name to give this agent. If you don't supply a name it will be automatically generated.
  • --config-files Specify paths to specific files, used for additional Iceberg Hive configuration (hive-site.xml), as a comma-delimited list. For example, /path2/hive-site.xml.
  • --config-path Specify a local path containing files (hive-site.xml) used for additional Iceberg Hive configuration. For example, /path2/hive-site.xml. Ensure the user running Data Migrator can access this path.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: s3a://thisbucket/).

hive agent add databricks legacy

Add a Databricks Workspace Hive Metastore target to connect to a Databricks Delta Lake metastore (AWS, Azure, or Google Cloud Platform (GCP)) using the hive agent add databricks legacy command.

info

Review and complete the prerequisites section linked here before attempting to add a Databricks Metastore Agent.

Add Databricks agent
hive agent add databricks legacy 
[--access-token] string
[--convert-to-delta] boolean
[--default-fs-override] string
[--delete-after-conversion] boolean
[--file-system-id] string
[--fs-mount-point] string
[--jdbc-http-path] string
[--jdbc-port] string
[--jdbc-server-hostname] string
[--name] string

Mandatory parameters

  • --name The ID to give to the new Hive agent.
  • --file-system-id The name of the filesystem that will be associated with this agent.
  • --jdbc-server-hostname The server hostname for the Databricks cluster (AWS, Azure or GCP).
  • --jdbc-port The port used for JDBC connections to the Databricks cluster (AWS, Azure or GCP).
  • --jdbc-http-path The HTTP path for the Databricks cluster (AWS, Azure or GCP).
  • --access-token The personal access token to be used for the Databricks cluster (AWS, Azure or GCP).
  • --fs-mount-point Enter the mount point path of your cloud storage on your DBFS (Databricks File System) for example: /mnt/mybucketname. This mount point value is required for the migration process.
    info

    The filesystem must already be mounted on DBFS. Learn more on mounting storage on Databricks for ADLS/S3/GCP filesystems.

  • --convert-to-delta Convert tables to Delta Lake format. Enter either TRUE or FALSE.

Optional parameters

  • --default-fs-override Enter the DBFS table location value in the format dbfs:<location>. If you intend to Convert to Delta format, enter the location on DBFS to store tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point and the path on the cloud storage.

The following parameter can only be used if --convert-to-delta has been specified as TRUE:

  • --delete-after-conversion Delete the raw data after it has been converted to Delta format and migrated to Databricks. When selected, live migration is not possible. Enter either TRUE or FALSE.
info

If the --convert-to-delta option is set to TRUE, the --default-fs-override parameter must also be provided, with the value set to dbfs: or a path inside the Databricks filesystem. For example, dbfs:/mount/externalStorage. If the --convert-to-delta option is set to FALSE, some migrated data may not be visible from the Databricks side. To avoid this issue, set --default-fs-override to dbfs: followed by the value of --fs-mount-point.

Example:

--default-fs-override dbfs:/mnt/mybucketname    
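
The relationship between --convert-to-delta, --fs-mount-point, and --default-fs-override can be sketched as a small function. The name legacy_default_fs_override is hypothetical and the logic is only an interpretation of the note above: with conversion enabled, a dbfs: value must be supplied; with conversion disabled, the recommended value is dbfs: followed by the mount point.

```python
def legacy_default_fs_override(convert_to_delta, fs_mount_point, default_fs_override=None):
    """Return the --default-fs-override value to use (hypothetical helper)."""
    if convert_to_delta:
        # With conversion enabled, the override is required and must be
        # dbfs: or a path inside the Databricks filesystem.
        if not default_fs_override or not default_fs_override.startswith("dbfs:"):
            raise ValueError("--default-fs-override must be dbfs: or a dbfs:<location> path")
        return default_fs_override
    # With conversion disabled, point the override at the mount point so
    # migrated data stays visible from the Databricks side.
    return "dbfs:" + fs_mount_point


print(legacy_default_fs_override(False, "/mnt/mybucketname"))
print(legacy_default_fs_override(True, "/mnt/mybucketname", "dbfs:/mnt/mybucketname/conv"))
```
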

Example

Example for Workspace Hive Metastore (Legacy) Databricks agent
hive agent add databricks legacy --name LegacyExample --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token *** --fs-mount-point /mnt/mybucket --convert-to-delta TRUE --default-fs-override dbfs:/mnt/mybucketname/conv --file-system-id aws-target

hive agent add databricks unity

Add a Databricks Unity Catalog agent to connect to a Databricks Delta Lake metastore (AWS, Azure, or Google Cloud Platform (GCP)) using the hive agent add databricks unity command.

info

Review and complete the prerequisites section linked here before attempting to add a Databricks Metastore Agent.

Add Databricks agent
hive agent add databricks unity 
[--access-token] string
[--catalog] string
[--convert-to-delta] boolean
[--converted-data-location] string
[--delete-after-conversion] boolean
[--external-location] string
[--file-system-id] string
[--jdbc-http-path] string
[--jdbc-port] string
[--jdbc-server-hostname] string
[--name] string
[--table-type] string

Mandatory parameters

  • --name The ID to give to the new agent.
  • --file-system-id The name of the filesystem that will be associated with this agent.
  • --jdbc-server-hostname The server hostname for the Databricks cluster (AWS, Azure or GCP).
  • --jdbc-port The port used for JDBC connections to the Databricks cluster (AWS, Azure or GCP).
  • --jdbc-http-path The HTTP path for the Databricks cluster (AWS, Azure or GCP).
  • --access-token The personal access token to be used for the Databricks cluster (AWS, Azure or GCP).
  • --catalog The name of your Databricks Unity Catalog.
    note

    You can't update an agent's Unity Catalog while it's in an active migration.

  • --external-location The full URI of your storage path from the external location configured in Databricks. For example, abfss://container@account.dfs.core.windows.net/
    info

    Ensure the external location you specify has been created in Databricks. Learn more from Azure, AWS and GCP.

  • --convert-to-delta Convert tables to Delta Lake format. Enter either FALSE or TRUE.

Optional parameters

The following parameters can only be used if --convert-to-delta has been set to TRUE:

  • --delete-after-conversion Delete the raw data after it has been converted to Delta format and migrated to Databricks. When selected, live migration is not possible. Enter either FALSE or TRUE.
  • --table-type Specify how converted tables are migrated. Enter either MANAGED to convert Hive source tables to managed delta or EXTERNAL to convert Hive source tables to external delta.
  • --converted-data-location If your --table-type is EXTERNAL, specify the full URI of the external location to store the tables converted to Delta Lake. For example, abfss://container@account.dfs.core.windows.net/converted
note

When using a Unity Catalog agent, Delta tables are created as external tables in Databricks when migrated, regardless of the --table-type specified. Other source formats are created as managed Delta tables, and data is converted and copied into the table.
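
The dependencies between the conversion options can be checked with a few lines of code. The sketch below is illustrative only: unity_option_problems is a hypothetical helper that reports combinations the parameter descriptions above rule out, and it doesn't reflect any validation the CLI itself performs.

```python
def unity_option_problems(convert_to_delta, delete_after_conversion=None,
                          table_type=None, converted_data_location=None):
    """Return a list of problems found in the conversion options."""
    problems = []
    dependent = {
        "--delete-after-conversion": delete_after_conversion,
        "--table-type": table_type,
        "--converted-data-location": converted_data_location,
    }
    # These options can only be used when --convert-to-delta is TRUE.
    if not convert_to_delta:
        for flag, value in dependent.items():
            if value is not None:
                problems.append(f"{flag} requires --convert-to-delta TRUE")
    # --table-type accepts MANAGED or EXTERNAL.
    if table_type is not None and table_type not in ("MANAGED", "EXTERNAL"):
        problems.append("--table-type must be MANAGED or EXTERNAL")
    # --converted-data-location is described for EXTERNAL tables only.
    if converted_data_location is not None and table_type != "EXTERNAL":
        problems.append("--converted-data-location applies when --table-type is EXTERNAL")
    return problems


print(unity_option_problems(
    True,
    table_type="EXTERNAL",
    converted_data_location="abfss://container@account.dfs.core.windows.net/converted",
))
```

An empty list means none of the checked rules were violated.
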

Example

Example Unity Catalog Databricks agent
hive agent add databricks unity --name UnityExample --file-system-id FStarget1  --jdbc-server-hostname 123.azuredatabricks.net --jdbc-port 443 --jdbc-http-path sql/pro/o/2517/0417-19-example --access-token actoken123 --catalog cat1 --external-location abfss://container@account.dfs.core.windows.net --convert-to-delta TRUE --table-type EXTERNAL --converted-data-location abfss://container@account.dfs.core.windows.net/converted

hive agent add dataproc

Add a Hive agent to connect to a local or remote Google Dataproc Metastore using the hive agent add dataproc command.

remote deployments

When connecting to a remote Dataproc Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add local or remote Dataproc agent
hive agent add dataproc [--config-path] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--name] string
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy]
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--file-system-id] string
[--default-fs-override] string
[--certificate-storage-type] string

Mandatory parameters

  • --kerberos-principal Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM).
  • --kerberos-keytab Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab).
  • --name The ID to give to the new Hive agent.

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent.

Optional parameters

  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01).
  • --config-path The path to the directory containing the Hive configuration files core-site.xml, hive-site.xml and hdfs-site.xml. If not specified, Data Migrator will use the default location for the cluster distribution.

Parameters for remote Hive agents only

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
  • --certificate-storage-type The certificate storage type, can be specified as either FILE or KEYSTORE.
  • --keystore-certificate-alias The alias of the certificate stored in the keystore.
  • --keystore-password The password assigned to the target keystore.
  • --keystore-path The path to the target-side keystore file.
  • --keystore-trusted-certificate-alias The alias of the trusted certificate chain stored in the keystore.
  • --keystore-type The type of keystore specified, either JKS or PKCS12.
Parameters for automated deployment
  • --autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
  • --ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
  • --use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
  • --ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Google Dataproc manually:

  1. Transfer the remote server installer to your remote host:

    Example of secure transfer from local to remote host
    scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, make the installer script executable:

    chmod +x hivemigrator-remote-server-installer.sh
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

    ./hivemigrator-remote-server-installer.sh -- --silent
  4. On your remote host, start the remote server service:

    service hivemigrator-remote-server start
  5. On your local host, run the hive agent add dataproc command without using --autodeploy and its related parameters to configure your remote Hive agent.

    See the Example for remote Apache Hive deployment - manual example below for further guidance.

Examples

Example for local Apache Hive deployment
hive agent add dataproc --name sourceAgent --file-system-id mysourcehdfs
Example for remote Apache Hive deployment - automated
hive agent add dataproc --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

Example for remote Apache Hive deployment - manual
hive agent add dataproc --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

note

If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

hive agent add snowflake basic

Add an agent using basic authentication.

Add a Snowflake agent using basic authentication
hive agent add snowflake basic [--account-identifier] string
[--file-system-id] string
[--name] string
[--password] string
[--stage] string
[--stage-schema] string
[--warehouse] string
[--default-fs-override] string
[--schema] string
[--stage-database] string
[--user] string
[--network-timeout] int
[--query-timeout] int
[--role] string

Mandatory parameters

  • --account-identifier is the unique ID for your Snowflake account.
  • --name is a name that will be used to reference the remote agent.
  • --warehouse is the Snowflake-based cluster of compute resources.
  • --stage is the storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake), or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
  • --user is your Snowflake username.

Additionally, use only one of the following parameters:

  • --file-system-id is the ID of the target filesystem.
  • --default-fs-override is an override for the default filesystem URI instead of a filesystem name.

Optional parameters

  • --stage-database is an optional parameter for a Snowflake stage database, with the default value "WANDISCO".
  • --stage-schema is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
  • --schema is an optional parameter for a Snowflake schema, with the default value "PUBLIC".
  • --role is an optional custom role for the JDBC connection used by Hive Migrator.
Timeout parameters
  • --network-timeout is the number of milliseconds to wait for a response when interacting with the Snowflake service before returning an error.
  • --query-timeout is the number of seconds to wait for a query to complete before returning an error.
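
The two timeout parameters use different units (milliseconds and seconds), which is easy to mix up. As a minimal sketch, the hypothetical helper below takes both timeouts as durations and converts each to the unit its parameter expects.

```python
from datetime import timedelta


def snowflake_timeout_args(network_timeout, query_timeout):
    """Convert durations to the units each timeout parameter expects."""
    # --network-timeout is given in milliseconds.
    network_ms = int(network_timeout / timedelta(milliseconds=1))
    # --query-timeout is given in seconds.
    query_s = int(query_timeout / timedelta(seconds=1))
    return ["--network-timeout", str(network_ms), "--query-timeout", str(query_s)]


# A 30-second network timeout and a 5-minute query timeout.
print(" ".join(snowflake_timeout_args(timedelta(seconds=30), timedelta(minutes=5))))
```
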

Examples

Example of adding a Snowflake agent with basic authentication
hive agent add snowflake basic --account-identifier test_adls2 --name snowflakeAgent --stage myAzure --user exampleUser --password examplePassword --warehouse DemoWH2

hive agent add snowflake privatekey

Add a Snowflake agent using a private key
hive agent add snowflake privatekey [--account-identifier] string
[--file-system-id] string
[--private-key-file] string
[--private-key-file-pwd] string
[--schema] string
[--stage-database] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--stage] string
[--stage-schema] string
[--user] string

Mandatory parameters

  • --account-identifier is the unique ID for your Snowflake account.
  • --private-key-file is the path to your private key file.
  • --private-key-file-pwd is the password that corresponds with the above private-key-file.
  • --name is a name that will be used to reference the remote agent.
  • --warehouse is the Snowflake-based cluster of compute resources.
  • --stage is the storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake), or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
  • --user is your Snowflake username.

Additionally, use only one of the following parameters:

  • --file-system-id is the ID of the target filesystem.
  • --default-fs-override is an override for the default filesystem URI instead of a filesystem name.

Optional parameters

  • --stage-database is an optional parameter for a Snowflake stage database, with the default value "WANDISCO".
  • --stage-schema is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
  • --schema is an optional parameter for a Snowflake schema, with the default value "PUBLIC".

hive agent check

Check the configuration of an existing Hive agent using hive agent check.

Check if agent configuration is valid & connectable
hive agent check [--name] string

Example

hive agent check --name azureAgent

hive agent configure azure

Change the configuration of an existing Azure Hive agent using hive agent configure azure.

The parameters that can be changed are the same as the ones listed in the hive agent add azure section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example

hive agent configure azure --name azureAgent --database-password CorrectPassword

hive agent configure filesystem

Change the configuration of an existing filesystem Hive agent using hive agent configure filesystem.

The parameters that can be changed are the same as the ones listed in the hive agent add filesystem section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example

hive agent configure filesystem --name fsAgent --root-folder /user/dbuser/databases

hive agent configure glue

Change the configuration of an existing AWS Glue Hive agent using hive agent configure glue.

The parameters that can be changed are the same as the ones listed in the hive agent add glue section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example

hive agent configure glue --name glueAgent --aws-region us-east-2

hive agent configure hive

Change the configuration of an existing Apache Hive agent using hive agent configure hive.

The parameters that can be changed are the same as the ones listed in the hive agent add hive section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example

hive agent configure hive --name sourceAgent --kerberos-keytab /opt/keytabs/hive.keytab --kerberos-principal hive/myhostname.example.com@REALM.COM

hive agent configure databricks legacy

Adjust the configuration of an existing Databricks Workspace Hive Metastore target using hive agent configure databricks legacy.

The parameters that can be changed are the same as listed in the hive agent add databricks legacy section.

All parameters are optional except --name, which is required to select the existing agent you need to reconfigure.

Example

hive agent configure databricks legacy --name databricksAgent --access-token myexamplefg123456789t6fnew7dfdtoken4

hive agent configure databricks unity

Adjust the configuration of an existing Databricks Unity Catalog agent using hive agent configure databricks unity.

The parameters that can be changed are the same as listed in the hive agent add databricks unity section.

All parameters are optional except --name, which is required to select the existing agent that you need to reconfigure.

note

You can't update an agent's Unity Catalog while it's in an active migration. See the 'hive migration add databricks unity' and 'hive migration add databricks legacy' commands for override options when creating a migration.

Example

hive agent configure databricks unity --name databricksUnityAgent --access-token myexamplefg123456789t6fnew7dfdtoken4

hive agent configure dataproc

Change the configuration of an existing Dataproc agent using hive agent configure dataproc.

The parameters that can be changed are the same as the ones listed in the hive agent add dataproc section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example

hive agent configure dataproc --name dataprocAgent --port 9099

hive agent configure iceberg

Change the configuration of an existing Iceberg agent.

You can change config-path, metastore-uri, warehouse-dir, username, catalog-name, and file-system-id.

You can find these parameters with descriptions listed in the hive agent add iceberg section.

All parameters are optional except --name, which is required to specify the agent you want to change.

Example

hive agent configure iceberg --name ice1 --username admin2

hive agent configure snowflake

Configure an existing Snowflake remote agent by using the hive agent configure snowflake command.

Configure a remote Snowflake agent using basic authentication
hive agent configure snowflake basic [--account-identifier] string
[--file-system-id] string
[--user] string
[--password] string
[--stage] string
[--stage-schema] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--schema] string
[--stage-database] string

Example Snowflake remote agent configuration

hive agent configure snowflake basic --user snowflakeAgent --password <password-here> --stage internal
Configure a remote Snowflake agent using privatekey authentication
hive agent configure snowflake privatekey [--account-identifier] string
[--file-system-id] string
[--private-key-file] string
[--private-key-file-pwd] string
[--schema] string
[--stage-database] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--stage] string
[--stage-schema] string

Example Snowflake remote agent configuration

hive agent configure snowflake privatekey --private-key-file-pwd <password> --private-key-file /path/to/keyfiles/ --user snowflakeAgent --schema star-schema

hive agent delete

Delete the specified Hive agent with hive agent delete.

Delete agent
hive agent delete [--name] string

Example

hive agent delete --name azureAgent

hive agent list

List configured Hive agents with hive agent list.

List already added agents
hive agent list [--detailed]

Example

hive agent list --detailed

hive agent show

Show the configuration of a Hive agent with hive agent show.

Show agent configuration
hive agent show [--name] string

Example

hive agent show --name azureAgent

hive agent types

Print a list of supported Hive agent types with hive agent types.

Print list of supported agent types
hive agent types

Example

hive agent types

Exclusion commands

exclusion add date

Create a date-based exclusion that checks the 'modified date' of any directory or file that Data Migrator encounters during a migration to which the exclusion has been applied. If the path or file being examined has a 'modified date' earlier than the specified date, it will be excluded from the migration.

Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new date-based rule
exclusion add date [--exclusion-id] string
[--description] string
[--before-date] string

Mandatory parameters

  • --exclusion-id The ID for the exclusion policy.
  • --description A user-friendly description for the policy.
  • --before-date An ISO formatted date and time, which can include an offset for a particular time zone.

Example

exclusion add date --exclusion-id beforeDate --description "Files earlier than 2020-10-01T10:00:00PDT" --before-date 2020-10-01T10:00:00-07:00
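
The date comparison described above can be sketched in Python. This is only an illustration of the rule "modified date earlier than --before-date", not Data Migrator's implementation; the function name is_excluded_by_date is hypothetical.

```python
from datetime import datetime, timezone


def is_excluded_by_date(modified: datetime, before_date: str) -> bool:
    """Return True if the modified time is earlier than the cutoff."""
    # fromisoformat keeps the UTC offset, so the comparison is offset-aware.
    cutoff = datetime.fromisoformat(before_date)
    return modified < cutoff


# 10:00:00 at offset -07:00 is the same instant as 17:00:00 UTC.
cutoff = "2020-10-01T10:00:00-07:00"
older = datetime(2020, 10, 1, 16, 59, tzinfo=timezone.utc)
newer = datetime(2020, 10, 1, 17, 1, tzinfo=timezone.utc)

print(is_excluded_by_date(older, cutoff))  # prints True
print(is_excluded_by_date(newer, cutoff))  # prints False
```

Including the offset in --before-date avoids ambiguity when the source cluster and the operator are in different time zones.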

exclusion add file-size

Create an exclusion that can be applied to migrations to constrain the files transferred by a policy based on file size. Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new exclusion by file size policy
exclusion add file-size [--exclusion-id] string
[--description] string
[--value] long
[--unit] string

Mandatory parameters

  • --exclusion-id The ID for the exclusion policy.
  • --description A user-friendly description for the policy.
  • --value The numerical value for the file size, in a unit defined by the --unit parameter.
  • --unit A string to define the unit used. You can use B for bytes, GB for gigabytes, KB for kilobytes, MB for megabytes, PB for petabytes, TB for terabytes, GiB for gibibytes, KiB for kibibytes, MiB for mebibytes, PiB for pebibytes, or TiB for tebibytes when creating exclusions with the CLI.

Example

exclusion add file-size --exclusion-id 100mbfiles --description "Files greater than 100 MB" --value 100 --unit MB
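
To see how the --value and --unit pair translates into a byte count, the following Python sketch may help. It assumes the standard definitions (decimal multiples for KB through PB, binary multiples for KiB through PiB); whether Data Migrator applies exactly these multipliers is an assumption, and the names UNIT_BYTES and to_bytes are hypothetical.

```python
# Assumed multipliers: SI (decimal) units and IEC (binary) units.
UNIT_BYTES = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50,
}


def to_bytes(value: int, unit: str) -> int:
    """Convert a --value/--unit pair into a number of bytes."""
    return value * UNIT_BYTES[unit]


print(to_bytes(100, "MB"))   # prints 100000000
print(to_bytes(100, "MiB"))  # prints 104857600
```

Under these assumptions, 100 MB and 100 MiB differ by almost 5%, so choose the unit that matches how your file sizes are reported.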

exclusion add regex

Create an exclusion using a regular expression to prevent certain files and directories being transferred based on matching file or directory names. Once associated with a migration using migration exclusion add, files and directories that match the regular expression will not be migrated.

Create a new exclusion by regular expression policy
exclusion add regex     [--exclusion-id] string
[--description] string
[--regex] string
[--type] string

Mandatory parameters

  • --exclusion-id The ID for the exclusion policy.
  • --description A user-friendly description for the policy.
  • --regex A regular expression in Java PCRE, Automata, or GLOB syntax.

Optional parameters

  • --type Choose the regular expression syntax type. There are three options available:

    1. JAVA_PCRE (default)
    2. AUTOMATA
    3. GLOB

Examples

Example glob pattern
exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --type GLOB --regex test*
Example Java PCRE pattern
exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --regex ^test\.*

Using backslash characters within --regex parameter

If you wish to use a \ character as part of your regex value, you must escape this character with an additional backslash.

Example
exclusion add regex --description "No paths that start with a backslash followed by test"  --exclusion-id exclusion2 --regex ^\\test\.*

The response displayed in the CLI still shows the additional backslash. However, the internal representation within Data Migrator is as expected (it reads as ^\test.*).

This workaround isn't required for API inputs, as it only affects the Spring Shell implementation used for the CLI.
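
If you generate CLI commands from a script, the escaping rule above can be applied mechanically. The following Python sketch doubles every backslash in a pattern before it is passed to --regex. The helper name escape_for_cli is hypothetical, and the sketch covers only the backslash rule described here, not any other quoting the shell may need.

```python
def escape_for_cli(regex: str) -> str:
    """Double each backslash so the CLI receives a literal backslash."""
    return regex.replace("\\", "\\\\")


# The pattern as it should be stored internally: ^\test.*
intended = r"^\test.*"
print(escape_for_cli(intended))  # prints ^\\test.*
```

Patterns sent through the API don't need this step.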

exclusion add age

Files less than this age at the time of scanning are excluded. Files need to be this age or older to be migrated.

Create an age-based exclusion that checks the 'modified date' of any file that Data Migrator encounters during a migration to which the exclusion has been applied. At scan time, the age of a file is the difference between the current scan time and the file's modification time. If the file examined has an age less than the age specified, it is excluded from the migration.

Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new age-based rule
exclusion add age       [--exclusion-id] string
[--description] string
[--unit] string
[--value] long

Mandatory parameters

  • --exclusion-id The ID for the exclusion policy.
  • --description A user-friendly description for the policy.
  • --unit The time unit of the value supplied, use DAYS, HOURS or MINUTES.
  • --value The number of units.

Example

exclusion add age --exclusion-id ExcludeLessThan10d --description "Exclude files changed in the last 10 days" --unit DAYS --value 10
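
The age rule can be sketched in Python as follows. It illustrates the description above (age = scan time minus modification time; files younger than the threshold are excluded) and is not Data Migrator's code. The names UNIT_TO_DELTA and is_excluded_by_age are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# One entry per value accepted by --unit.
UNIT_TO_DELTA = {
    "DAYS": timedelta(days=1),
    "HOURS": timedelta(hours=1),
    "MINUTES": timedelta(minutes=1),
}


def is_excluded_by_age(modified: datetime, scan_time: datetime,
                       value: int, unit: str) -> bool:
    """Return True if the file is younger than the configured age."""
    minimum_age = value * UNIT_TO_DELTA[unit]
    return (scan_time - modified) < minimum_age


scan = datetime(2024, 1, 20, tzinfo=timezone.utc)
print(is_excluded_by_age(scan - timedelta(days=5), scan, 10, "DAYS"))   # prints True
print(is_excluded_by_age(scan - timedelta(days=10), scan, 10, "DAYS"))  # prints False
```

A file that is exactly the configured age is not excluded, matching the statement that files need to be this age or older to be migrated.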

exclusion check regex

Check if a given GLOB, JAVA_PCRE or AUTOMATA regex pattern will match a given path.

Mandatory parameters

  • --regex The regex pattern to be checked.
  • --type The regex pattern type: GLOB, JAVA_PCRE, or AUTOMATA.
  • --path The path being checked.

Example

exclusion check regex --path /data/1 --regex [1-4] --type JAVA_PCRE

exclusion delete

Delete an exclusion policy so that it is no longer available for migrations.

exclusion delete [--exclusion-id] string

Mandatory parameters

  • --exclusion-id The ID for the exclusion policy to delete.

Example

exclusion delete --exclusion-id exclusion1

exclusion list

List all exclusion policies defined.

List all exclusion policies
exclusion list

exclusion show

Get details for an individual exclusion policy by ID.

Get details for a specific exclusion rule
exclusion show [--exclusion-id] string

Mandatory parameters

  • --exclusion-id The ID for the exclusion policy to show.

Example

exclusion show --exclusion-id 100mbfiles

exclusion source list

See the restrictions applied automatically on the source file system.

Mandatory parameters

  • --fs-type The file system type, choose either adls2, gcs, hdfs, local, or s3a.

Example

exclusion source list --fs-type hdfs

exclusion target list

See the restrictions applied automatically on the target file system.

Mandatory parameters

  • --fs-type The file system type, choose either adls2, gcs, hdfs, local, or s3a.

Example

exclusion target list --fs-type adls2

exclusion user-defined list

See a list of all user defined restrictions.

Example

exclusion user-defined list

Migration commands

migration add

Create a new migration to initiate data migration from your source filesystem.
migration add   [--action-policy] string
[--auto-start]
[--detailed]
[--exclusions] string
[--management-group] string
[--migration-id] string
[--name] string
[--path] string
[--priority] string
[--recurring-migration]
[--recurring-period]
[--scan-only]
[--source] string
[--target] string
[--target-match]
[--verbose]
caution

Do not write to target filesystem paths when a migration is underway. This could interfere with Data Migrator functionality and lead to undetermined behavior.

Use different filesystem paths when writing to the target filesystem directly (and not through Data Migrator).

Mandatory parameters

  • --path Defines the source filesystem directory that is the scope of the migration. All content (other than that excluded) will be migrated to the target.
note

ADLS Gen2 has a filesystem restriction of 60 segments. Make sure your path has fewer than 60 segments when defining the path string parameter.

  • --target Specifies the name of the target filesystem resource to which migration will occur.

Optional parameters

  • --name or --migration-id Enter a name or ID for the new migration. An ID is auto-generated if you don't enter one.
  • --exclusions A comma-separated list of exclusions by name.
  • --auto-start Enter this parameter if you want the migration to start immediately. If you don't enter it, the migration only takes effect once you start it.
  • --priority Enter this parameter with a value of high, normal, or low to assign a priority to your migration. Higher-priority migrations are processed first. If not specified, migration priority defaults to normal.
  • --action-policy This parameter determines what happens if the migration encounters content in the target path with the same name and size.
    There are two options available:
    1. com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
      Every file is replaced, even if the file size is identical on the target storage. This option is incompatible with the --recurring-migration option. Use the SkipIfSizeMatchActionPolicy parameter instead.
    2. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
      If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced.
  • --source Specifies the name of the source filesystem.
  • --scan-only Enter this parameter to create a one-time migration.
  • --target-match Enable Target Match on this migration to scan the source and target and remove extra files from the target. If not present, Target Match is disabled except for live migrations on ADLS Gen2 and IBM Spectrum Scale sources. See Target Match for more info.
  • --verbose Enter this parameter to add additional information to the output for the migration.
  • --detailed Alternative name for --verbose.
  • --recurring-migration Add this parameter to enable periodic rescanning of the migration. See Recurring-migration.
  • --recurring-period Enter a period to schedule the time between migration scan iterations. For example, 12H (hours) or 30M (minutes).
  • --management-group If you use Migration Management Delegation and are an admin user, supply the group name to assign this new migration to an existing group.

Example

migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles
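
Before running migration add against an ADLS Gen2 filesystem, you may want to check the path against the 60-segment restriction noted under --path. The Python sketch below counts segments as the non-empty parts between slashes. That counting rule is an assumption, and the names MAX_SEGMENTS, count_segments, and fits_adls_gen2 are hypothetical.

```python
MAX_SEGMENTS = 60  # restriction quoted in the note for --path


def count_segments(path: str) -> int:
    """Count non-empty path components (assumed definition of a segment)."""
    return len([part for part in path.split("/") if part])


def fits_adls_gen2(path: str) -> bool:
    """Return True if the path has fewer than 60 segments."""
    return count_segments(path) < MAX_SEGMENTS


print(count_segments("/repl1"))        # prints 1
print(fits_adls_gen2("/repl1/a/b/c"))  # prints True
```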

migration delete

Delete a stopped migration.

Delete a migration
migration delete [--name or --migration-id] string

Mandatory parameters

  • --name or --migration-id The name or ID of the migration to delete.

Example

migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e 

Optional parameters

  • --without-assets Leave associated assets in place.

Example

migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --without-assets

migration exclusion add

Associate an exclusion resource with a migration so that the exclusion policy applies to items processed for the migration. Exclusions must be associated with a migration before they take effect.

Add an exclusion to a migration
migration exclusion add [--name or --migration-id] string
[--exclusion-id] string

Mandatory parameters

  • --name or --migration-id The migration name or ID with which to associate the exclusion.
  • --exclusion-id The ID of the exclusion to associate with the migration.

Example

migration exclusion add --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

migration exclusion delete

Remove an exclusion from association with a migration so that its policy no longer applies to items processed for the migration.

Remove an exclusion from a migration
migration exclusion delete      [--name or --migration-id] string
[--exclusion-id] string

Mandatory parameters

  • --name or --migration-id The migration name or ID from which to remove the exclusion.
  • --exclusion-id The ID of the exclusion to remove from the migration.

Example

migration exclusion delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

migration list

Present the list of all migrations defined.

List running and active migrations
migration list [--detailed or --verbose]

Optional parameters

  • --detailed or --verbose Returns additional information about each migration.

migration path status

View all actions scheduled on a source filesystem in the specified path.

Show information on the migration status of a path on the source filesystem
migration path status   [--source-path] string
[--source] string

Mandatory parameters

  • --source-path The path on the filesystem to review actions for. Supply a full directory.
  • --source The filesystem ID of the source system the path is in.

Example

migration path status --source-path /root/mypath/ --source mySource

migration pending-region add

Add a path for rescanning to a migration.

Add a path for rescanning to a migration
migration pending-region add    [--name or --migration-id] string
[--path] string
[--action-policy] string

Mandatory parameters

  • --name or --migration-id The migration name or ID.
  • --path The path string of the region to add for rescan.

Optional parameters

  • --action-policy This parameter determines what happens if the migration encounters content in the target path with the same name and size.
    There are two options available:
    1. com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
      Every file is replaced, even if file size is identical on the target storage.
    2. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
      If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced.

Example

migration pending-region add --name myMigration --path etc/files --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
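
The difference between the two action policies can be summarized with a short Python sketch. It only restates the behavior described in the list above and is not Data Migrator's implementation; the function should_transfer is hypothetical, and treating a missing target file as "transfer" is an assumption.

```python
from typing import Optional

OVERWRITE = "com.wandisco.livemigrator2.migration.OverwriteActionPolicy"
SKIP_IF_SIZE_MATCH = "com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy"


def should_transfer(policy: str, source_size: int,
                    target_size: Optional[int]) -> bool:
    """Decide whether a file is transferred. target_size is None if absent."""
    if policy == OVERWRITE:
        # Every file is replaced, even when the sizes are identical.
        return True
    if policy == SKIP_IF_SIZE_MATCH:
        # Skip only when the target exists and has the same size.
        return target_size is None or source_size != target_size
    raise ValueError(f"Unknown action policy: {policy}")


print(should_transfer(OVERWRITE, 842, 842))           # prints True
print(should_transfer(SKIP_IF_SIZE_MATCH, 842, 842))  # prints False
print(should_transfer(SKIP_IF_SIZE_MATCH, 842, 900))  # prints True
```

Because SkipIfSizeMatchActionPolicy compares sizes only, a file whose content changed without changing size is skipped under this policy.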

migration reset

Reset a stopped migration to the state it was in before it was started. This deletes and replaces it with a new migration that has the same settings as the old one.

Reset a migration
migration reset  [--name or --migration-id] string
[--action-policy] string
[--reload-mappings]
[--target-match] ENABLE|DISABLE
[--detailed or --verbose]

Mandatory parameters

  • --name or --migration-id The name or ID of the migration you want to reset.

Optional parameters

  • --action-policy Accepts two string values: com.wandisco.livemigrator2.migration.OverwriteActionPolicy causes the new migration to re-migrate all files from scratch, including those already migrated to the target filesystem, regardless of file size. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy skips migrating files that exist on both the target and source, if the file size is consistent between them. Use tab auto-completion with this parameter to view both options and a short description of each.
  • --reload-mappings Resets the migration's path mapping configuration, using the newest default path mapping configuration for Data Migrator.
  • --target-match Enable or disable the Target Match option when resetting the migration. Use either 'ENABLE' or 'DISABLE'.
  • --detailed or --verbose Returns additional information about the reset migration, similarly to migration show.

Example

migration reset --name mymigration

migration resume

Resume a migration that you've stopped from transferring content to its target.

Resume a migration
migration resume        [--name or --migration-id] string
[--detailed or --verbose]

Mandatory parameters

  • --name or --migration-id The migration name or ID to resume.

Example

migration resume --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

migration run,migration start

Start a migration that was created without the --auto-start parameter.

Start a migration.
migration run   [--name or --migration-id] string
[--detailed or --verbose]

Mandatory parameters

  • --name or --migration-id The migration name or ID to run.

Optional parameters

  • --detailed or --verbose Outputs additional information about the migration.

Example

migration run --migration-id myNewMigration

migration show

Get a JSON description of a specific migration.

Get migration details
migration show  [--name or --migration-id] string
[--detailed or --verbose]

Mandatory parameters

  • --name or --migration-id The migration name or ID to show.

Optional parameters

  • --detailed or --verbose Outputs additional information about the migration.

Example

migration show --name myNewMigration

migration stats

Show migration statistics for an individual migration, including the number of files, directories, and bytes migrated, scanned, and excluded, as well as the file and directory removal actions taken by Target Match.

Show migration statistics
migration stats [--name or --migration-id] string

migration stop

Stop a migration from transferring content to its target, placing it into the STOPPED state. Stopped migrations can be resumed.

Stop a migration
migration stop [--name or --migration-id] string

Mandatory parameters

  • --name or --migration-id The migration name or ID to stop.

Example

migration stop --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

migration stop all

Stop all migrations from transferring content to their targets, placing them into the STOPPED state. Stopped migrations can be resumed.

Stop all migrations
migration stop all

migration update configuration

Update a migration's recurring scan period.

Update migration configuration
migration update configuration  [--name or --migration-id] string                                  
[--recurring-period] string
[--detailed or --verbose] string

Mandatory parameters

  • --name or --migration-id Enter the migration name or ID.
  • --recurring-period Enter a period to schedule the time between migration scan iterations. For example, 12H (12 hours).
  • --detailed or --verbose Include all configuration properties for the source filesystem in the response.

migration verification start

Trigger a new verification for a migration.

Trigger a new verification
migration verification start    [--name or --migration-id] string  
[--depth] integer
[--date] string
[--paths] string

Mandatory parameters

  • --name or --migration-id Enter the migration name or ID.
  • --depth Enter a number to specify how deep in the directory you want to run the verification check. This number must be equal to or less than the total number of levels in the directory structure of your migration. The default value is zero. Zero means there's no limit to the verification depth.
  • --date Enter a date and time as a verification cutoff point in YYYY-MM-DDTHH:MM format.
  • --paths Enter a comma-separated list of paths to verify.

Example

Trigger a new verification for a migration
migration verification start --migration-id myNewMigration --depth 0 --date 2022-11-15T16:24 --paths /MigrationPath
note

The verification status will show the number of missing paths and files on the target filesystem and the number of file size mismatches between the source and target. You can view the verification status using migration verification show for individual verification jobs or migration verification list for all verification jobs.
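
If you script verification runs, the --date value can be produced from a timestamp. The Python sketch below formats a datetime so that it matches the example value 2022-11-15T16:24; the helper name format_verification_date is hypothetical.

```python
from datetime import datetime


def format_verification_date(moment: datetime) -> str:
    """Format a datetime as year-month-day, a literal T, then hours:minutes."""
    return moment.strftime("%Y-%m-%dT%H:%M")


print(format_verification_date(datetime(2022, 11, 15, 16, 24)))  # prints 2022-11-15T16:24
```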

migration verification list

List summaries for all or specified verifications.

List summaries for all or specified verifications
migration verification list    [--name or --migration-id] string  
[--states] string

Optional parameters

  • --name or --migration-id Enter the migration name or ID. If not specified, the default is to display summaries for all verifications. You can enter multiple migration names or IDs in a comma-separated list.
  • --states Enter the verification state(s) (IN_PROGRESS, QUEUED, COMPLETED, or CANCELLED) for which you want to list summaries. Separate multiple states with commas.

Examples

List summaries for all verifications.
migration verification list
List in-progress and queued verification summaries for a specific migration
migration verification list --migration-id myNewMigration --states IN_PROGRESS,QUEUED

migration verification show

Show the status of a specific migration verification.

Show a verification job for a migration
migration verification show [--verification-id] string

Mandatory parameters

  • --verification-id Show the status of the verification job for this verification ID (only one verification job can be running per migration).

Example

Example status of a completed verification
Cirata LiveData Migrator >> migration verification show --verification-id 91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465
{
"id": "91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465",
"migrationId": "ver1",
"migrationInternalId": "91c79b1b-c61f-4c39-be61-18072ac3a086",
"status": "COMPLETE",
"createdTimestamp": 1676979356467,
"startedTimestamp": 1676979356518,
"finishedTimestamp": 1676979356598,
"createdAt": "2023-02-21T11:35:56.467Z",
"startedAt": "2023-02-21T11:35:56.518Z",
"finishedAt": "2023-02-21T11:35:56.598Z",
"paths": [
"/DATA/d1"
],
"ignoreAfterTimestamp": 1676978431233,
"originalPaths": [
"/DATA/d1"
],
"verificationDepth": 0,
"filesOnSource": 1,
"directoriesOnSource": 0,
"bytesOnSource": 842,
"filesExcluded": 0,
"filesExcludedExistsOnTarget": 0,
"filesExcludedNotExistsOnTarget": 0,
"dataExcluded": 0,
"bytesExcluded": 0,
"bytesExcludedExistsOnTarget": 0,
"bytesExcludedNotExistsOnTarget": 0,
"directoriesExcluded": 0,
"directoriesExcludedExistsOnTarget": 0,
"directoriesExcludedNotExistsOnTarget": 0,
"filesOnTarget": 1,
"directoriesOnTarget": 0,
"bytesOnTarget": 842,
"filesMissingOnTarget": 0,
"directoriesMissingOnTarget": 0,
"filesMissingOnSource": 0,
"directoriesMissingOnSource": 0,
"fileSizeMismatches": 0,
"totalDiscrepancies": 0
}
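
The JSON output can be post-processed by a script. The following Python sketch reads a few of the fields shown in the example above and derives a short summary. The field names come from the example output; the summarize function and the shape of its result are hypothetical, and only a subset of the fields is reproduced in the sample.

```python
import json
from datetime import datetime, timedelta, timezone

# Subset of the example output above.
sample = """{
  "status": "COMPLETE",
  "startedTimestamp": 1676979356518,
  "finishedTimestamp": 1676979356598,
  "filesMissingOnTarget": 0,
  "fileSizeMismatches": 0,
  "totalDiscrepancies": 0
}"""


def summarize(raw: str) -> dict:
    """Extract status, start time (UTC), duration, and a clean/not-clean flag."""
    report = json.loads(raw)
    started_ms = report["startedTimestamp"]
    # Timestamps are epoch milliseconds; split them to avoid float rounding.
    started = (datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc)
               + timedelta(milliseconds=started_ms % 1000))
    return {
        "status": report["status"],
        "started_utc": started.isoformat(timespec="milliseconds"),
        "duration_ms": report["finishedTimestamp"] - started_ms,
        "clean": report["totalDiscrepancies"] == 0,
    }


print(summarize(sample))
```

For the example output, the derived start time corresponds to the startedAt value 2023-02-21T11:35:56.518Z and the duration is 80 ms.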

migration verification stop

Stop a queued or in-progress migration verification.

Stop a migration verification
migration verification stop  [--verification-id] string  

Mandatory parameters

  • --verification-id Enter the ID of the verification that has been started, for example db257c03-697b-48a5-93cc-abc23838d37d-1668593022565. You can find the verification ID in the output of the migration verification list command.

Example

Stop a migration verification
migration verification stop --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565

migration verification report

Download a full verification report.

Download a verification report
migration verification report   [--verification-id] string  
[--out-dir] string

Mandatory parameters

  • --verification-id Enter the ID of the verification for which you want to download a report. You can find the verification ID in the output of the migration verification list command.
  • --out-dir Enter your chosen folder for the report download.

Examples

Download a verification report
migration verification report --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565 --out-dir /user/exampleVerificationDirectory

status

Get a text description of the overall status of migrations. Information is provided on the following:

  • Total number of migrations defined.
  • Average bandwidth being used over 10s, 60s, and 300s intervals.
  • Peak bandwidth observed over 300s interval.
  • Average file transfer rate per second over 10s, 60s, and 300s intervals.
  • Peak file transfer rate per second over a 300s interval.
  • List of migrations, including one-time migrations, with source path and migration id, and with current progress broken down by migration state: completed, live, stopped, running and ready.
Get migration status
status  [--diagnostics]  
[--migrations]
[--network]
[--transfers]
[--watch]
[--refresh-delay] int
[--full-screen]

Optional parameters

  • --diagnostics Returns additional information about your Data Migrator instance and its migrations, useful for troubleshooting.
  • --migrations Displays information about each running migration.
  • --network Displays file transfer throughput in Gib/s during the last 10 seconds, 1 minute and 30 minutes.
  • --transfers Displays overall performance information about data transfers across the last 10 seconds, 1 minute and 30 minute intervals.
  • --watch Auto-refresh.
  • --refresh-delay Auto-refresh interval (in seconds).
  • --full-screen Auto-refresh fullscreen

Examples

Status
Cirata LiveMigrator >> status

Network (10s) (1m) (30m)

Average Throughput: 10.4 Gib/s 9.7 Gib/s 10.1 Gib/s
Average Files/s: 425 412 403

11 Migrations dd:hh:mm dd:hh:mm

Complete: 1 Transferred Excluded Duration
/static1 5a93d5 67.1 GiB 2.3 GiB 00:12:34

Live: 3 Transferred Excluded Duration
/repl1 9088aa 143.2 GiB 17.3 GiB 00:00:34
/repl_psm1 a4a7e6 423.6 TiB 9.6 GiB 02:05:29
/repl5 ab140d 118.9 GiB 1.2 GiB 00:00:34

Running: 5 Transferred Excluded Duration Remaining
/repl123 e3727c 30.3/45.2 GiB 67% 9.8 GiB 00:00:34 00:00:17
/repl2 88e4e7 26.2/32.4 GiB 81% 0.2 GiB 00:01:27 00:00:12
/repl3 372056 4.1/12.5 GiB 33% 1.1 GiB 00:00:25 00:01:05
/repl4 6bc813 10.6/81.7 TiB 8% 12.4 GiB 00:04:21 01:02:43
/replxyz dc33cb 2.5/41.1 GiB 6% 6.5 GiB 01:00:12 07:34:23

Ready: 2
/repl7 070910 543.2 GiB
/repltest d05ca0 7.3 GiB

Cirata LiveMigrator >> status
Status with --transfers
Cirata LiveMigrator >> status --transfers

Files (10s) (1m) (30m)

Average Migrated/s: 362 158 4781
< 1 KB 14 27 3761
< 1 MB 151 82 0
< 1 GB 27 1 2
< 1 PB 0 0 0
< 1 EB 0 0 0

Peak Migrated/s: 505 161 8712
< 1 KB 125 48 7761
< 1 MB 251 95 4
< 1 GB 29 7 3
< 1 PB 0 0 0
< 1 EB 0 0 0

Average Scanned/s: 550 561 467
Average Rescanned/s: 24 45 56
Average Excluded/s: 7 7 6
Status with --diagnostics
Cirata LiveMigrator >> status --diagnostics

Uptime: 0 Days 1 Hours 23 Minutes 24 Seconds
SystemCpuLoad: 0.1433 ProcessCpuLoad: 0.0081
JVM GcCount: 192 GcPauseTime: 36 s (36328 ms)
OS Connections: 1, Tx: 0 B, Rx: 0 B, Retransmit: 0
Transfer Bytes (10/30/300s): 0.00 Gib/s, 0.00 Gib/s, 0.00 Gib/s
Transfer Files (10/30/300s): 0.00/s 0.00/s 0.00/s
Active Transfers/pull.threads: 0/100
Migrations: 0 RUNNING, 4 LIVE, 0 STOPPED
Actions Total: 0, Largest: "testmigration" 0, Peak: "MyMigration" 1
PendingRegions Total: 0 Avg: 0, Largest: "MyMigration" 0
FailedPaths Total: 0, Largest: "MyMigration" 0
File Transfer Retries Total: 0, Largest: "MyMigration" 0
Total Excluded Scan files/dirs/bytes: 26, 0, 8.1 MB
Total Iterated Scan files/dirs/bytes: 20082, 9876, 2.7 GB
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/8
EventsQueued: 0, Total Events Added: 504
Transferred File Size Percentiles:
2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Transferred File Transfer Rates Percentiles per Second:
2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Active File Size Percentiles:
0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Active File Transfer Rates Percentiles per Second:
0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B

Hive migration commands

hive migration add

Create a new Hive migration to initiate metadata migration from your source Metastore.

info

Create Hive rules before initiating a Hive migration to define which databases and tables are migrated.

Create new migration
hive migration add      [--source] string
[--target] string
[--name] string
[--auto-start]
[--once]
[--rule-names] list

Mandatory parameters

  • --source The name of the Hive agent for the source of migration.
  • --target The name of the Hive agent for the target of migration.
  • --rule-names The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example: rule1,rule2,rule3).
tip

Metadata rules determine the scope of a migration. Add rules before creating your metadata migration.

Optional parameters

  • --name The name to identify the migration with.
  • --auto-start Enter this parameter to start the migration immediately after creation.
  • --once Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Example

hive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs,user_dbs --name hive_migration --auto-start
note

Auto-completion of the --rule-names parameter will not work correctly if it is added at the end of the Hive migration parameters. See the troubleshooting guide for workarounds.
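
When Hive migrations are created from a script, the command line can be assembled from its parts. The Python sketch below builds a hive migration add command using only the parameters documented above and joins multiple rule names with commas. The function hive_migration_add is hypothetical, and the sketch does no quoting of values that contain spaces.

```python
from typing import List, Optional


def hive_migration_add(source: str, target: str, rule_names: List[str],
                       name: Optional[str] = None,
                       auto_start: bool = False, once: bool = False) -> str:
    """Assemble a 'hive migration add' command string."""
    parts = [
        "hive migration add",
        "--source", source,
        "--target", target,
        # Multiple rules are passed as one comma-separated value.
        "--rule-names", ",".join(rule_names),
    ]
    if name:
        parts += ["--name", name]
    if auto_start:
        parts.append("--auto-start")
    if once:
        parts.append("--once")
    return " ".join(parts)


print(hive_migration_add("sourceAgent", "remoteAgent",
                         ["test_dbs", "user_dbs"],
                         name="hive_migration", auto_start=True))
```

The printed command matches the example above.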

hive migration add databricks unity

Create a new metadata migration with a Databricks Unity Catalog target with parameters that override the Unity Catalog target configuration.

Create new Databricks migration with override
hive migration add databricks unity        [--auto-start]
[--catalog] string
[--convert-to-delta] boolean
[--converted-data-location] string
[--delete-after-conversion] boolean
[--external-location] string
[--name] string
[--once]
[--rule-names] list
[--source] string
[--target] string
[--table-type] string

Mandatory parameters

  • --source The name of the Hive agent for the source of migration.
  • --target The name of the Hive agent for the target of migration.
  • --rule-names The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example: rule1,rule2,rule3).
  • --name The name to identify the migration with.
tip

Metadata rules determine the scope of a migration. Add rules before creating your metadata migration.

Optional parameters

  • --auto-start Enter this parameter to start the migration immediately after creation.
  • --once Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Databricks Unity Catalog target override properties

info

If you set --convert-to-delta to TRUE, provide a value for --default-fs-override.

  • --catalog Enter the name of your Databricks Unity Catalog.
  • --external-location The external location path already created and defined in Databricks.
  • --convert-to-delta Convert tables to Delta Lake format. Enter either FALSE or TRUE.
  • --delete-after-conversion Delete the raw data after it has been converted to Delta format and migrated to Databricks. When selected, live migration is not possible. Enter either FALSE or TRUE.
    info

    Only use this option if you're performing one-time migrations for the underlying table data. The Databricks agent doesn't support continuous (live) updates of table data when transferring to Delta Lake on Databricks.

  • --table-type Specify how converted tables are migrated. Enter either MANAGED to convert Hive source tables to managed delta or EXTERNAL to convert Hive source tables to external delta.
  • --converted-data-location If your --table-type is EXTERNAL, specify the full URI of the external location in which to store the tables converted to Delta Lake. For example, abfss://container@account.dfs.core.windows.net/converted

Example

hive migration add databricks unity --source sourceAgent --target remoteAgent --rule-names user_dbs --name hive_migration --auto-start --catalog CatNew 

hive migration add databricks legacy

Create a new metadata migration with a Databricks Workspace Hive Metastore legacy target and override the target's configuration.

Create new Databricks migration with override
hive migration add databricks legacy       
[--auto-start]
[--convert-to-delta] boolean
[--default-fs-override] string
[--delete-after-conversion] boolean
[--fs-mount-point] string
[--name] string
[--once]
[--rule-names] list
[--source] string
[--target] string

Mandatory parameters

  • --source The name of the Hive agent for the source of migration.
  • --target The name of the Hive agent for the target of migration.
  • --rule-names The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example: rule1,rule2,rule3).
  • --name The name to identify the migration with.
tip

Metadata rules determine the scope of a migration. Add rules before creating your metadata migration.

Optional parameters

  • --auto-start Enter this parameter to start the migration immediately after creation.
  • --once Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Databricks Workspace Hive Metastore legacy target override properties

  • --fs-mount-point Enter the mount point path of your cloud storage on your DBFS (Databricks File System) for example: /mnt/mybucketname. This mount point value is required for the migration process.
    info

    The filesystem must already be mounted on DBFS. Learn more on mounting storage on Databricks for ADLS/S3/GCP filesystems.

  • --convert-to-delta Convert tables to Delta Lake format. Enter either TRUE or FALSE.
  • --delete-after-conversion Use this option to delete the underlying table data and metadata from the filesystem location defined by --fs-mount-point after it's converted to Delta Lake on Databricks. Enter either TRUE or FALSE.
    info

    Only use this option if you're performing one-time migrations for the underlying table data. The Databricks agent doesn't support continuous (live) updates of table data when transferring to Delta Lake on Databricks.

  • --default-fs-override Enter the DBFS table location value in the format dbfs:<location>. If you intend to convert to Delta format with --convert-to-delta set to TRUE, enter the location on DBFS to store tables converted to Delta Lake. To store Delta Lake tables on cloud storage, enter the path to the mount point and the path on the cloud storage. (For example, dbfs:/mnt/adls2).

Example

hive migration add databricks legacy --source sourceAgent --target remoteAgent --rule-names test_dbs --name hive_migration --auto-start --default-fs-override dbfs:/mnt/adls2
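
If you generate this command from another tool, the parameter rules above can be checked before the command reaches the CLI. The following Python sketch is illustrative only: the function build_databricks_legacy_command, the rule name prod_dbs, and the /mnt/adls2 value used for --fs-mount-point are assumptions for this example and aren't part of Data Migrator. The sketch joins rule names with commas and checks that --default-fs-override uses the dbfs:<location> format, using only parameters listed in this section.

```python
def build_databricks_legacy_command(source, target, rule_names, name,
                                    fs_mount_point=None,
                                    default_fs_override=None,
                                    auto_start=False, once=False):
    """Return a 'hive migration add databricks legacy' command string.

    Hypothetical helper for illustration. It only assembles text and
    doesn't contact Data Migrator.
    """
    if not rule_names:
        raise ValueError("at least one rule name is required")
    if default_fs_override is not None and not default_fs_override.startswith("dbfs:"):
        raise ValueError("--default-fs-override must use the format dbfs:<location>")

    parts = [
        "hive migration add databricks legacy",
        f"--source {source}",
        f"--target {target}",
        # Multiple rules are passed as a single comma-separated value.
        f"--rule-names {','.join(rule_names)}",
        f"--name {name}",
    ]
    if fs_mount_point:
        parts.append(f"--fs-mount-point {fs_mount_point}")
    if default_fs_override:
        parts.append(f"--default-fs-override {default_fs_override}")
    if auto_start:
        parts.append("--auto-start")
    if once:
        parts.append("--once")
    return " ".join(parts)


command = build_databricks_legacy_command(
    source="sourceAgent",
    target="remoteAgent",
    rule_names=["test_dbs", "prod_dbs"],  # prod_dbs is a made-up rule name
    name="hive_migration",
    fs_mount_point="/mnt/adls2",          # assumed mount point for this example
    default_fs_override="dbfs:/mnt/adls2",
    auto_start=True,
)
print(command)
```

The printed text can be pasted at the action prompt or written to a script file.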

hive migration delete

Delete a Hive migration.

note

A Hive migration must be stopped before it can be deleted. This can be achieved by using the --force-stop parameter with this command.

Delete migration from the list, migration should be stopped
hive migration delete [--name] string  [--force-stop]

Example

hive migration delete --name hive_migration --force-stop

hive migration list

List all Hive migrations.

Print a list of all migrations
hive migration list

hive migration pause

Pause a Hive migration. Use the --names flag with a comma-separated list of migration names to pause multiple Hive migrations.

Pause a Hive migration
hive migration pause --names hmig1,hmig2

hive migration pause all

Pause all Hive migrations.

Pause all Hive migrations
hive migration pause all

hive migration reset

Reset a stopped Hive migration. This returns the migration to a CREATED state.

Reset a Hive migration
hive migration reset    [--names] string
[--force-stop]
note

A Hive migration must be stopped before it can be reset. This can be achieved by using the --force-stop parameter with this command.

info

The reset migration will use the latest agent settings.

For example, if the target agent’s Default Filesystem Override setting was updated after the original migration started, the reset migration will use the latest Default Filesystem Override value.

To reset multiple Hive migrations, use a comma-separated list of migration names with the --names parameter.

Example

Reset a Hive migration
hive migration reset --names hive_migration1
Stop and reset a list of migrations
hive migration reset --force-stop --names hive_migration1,hive_migration2

hive migration reset all

Reset all Hive migrations. See the hive migration reset command for details on how a reset works.

Reset all Hive migrations
hive migration reset all

hive migration resume

Resume STOPPED, PAUSED or FAILED Hive migrations. Use the --names flag with a comma-separated list of migration names to resume multiple Hive migrations.

Resume a Hive migration
hive migration resume --names Hmig1

hive migration resume all

Resume all STOPPED, PAUSED or FAILED Hive migrations.

Resume all Hive migrations
hive migration resume all

hive migration show

Display information about a Hive migration.

Show info about specific migration
hive migration show

hive migration start

Start a Hive migration or a list of Hive migrations (comma-separated).

note

Enter the --once parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Start migration
hive migration start [--names] list  [--once]

Example

hive migration start --names hive_migration1,hive_migration2

hive migration start all

Start all Hive migrations.

note

Enter the --once parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Start migration
hive migration start all [--once]

Example

hive migration start all --once

hive migration status

Show the status of a Hive migration or a list of Hive migrations (comma-separated). See the following KB article for an example output.

Show migration status
hive migration status [--names] list

Example

hive migration status --names hive_migration1,hive_migration2

hive migration status all

Show the status of all Hive migrations.

Show status of all migrations
hive migration status all

Example

hive migration status all

hive migration stop

Stop a running Hive migration or a list of running Hive migrations (comma-separated).

Stop running migration
hive migration stop [--names] list

Example

hive migration stop --names hive_migration1,hive_migration2

hive migration stop all

Stop all running Hive migrations.

Stop all running migrations
hive migration stop all

Example

hive migration stop all

Path mapping commands

path mapping add

Create a path mapping to define an alternative target path for a specific target filesystem. Path mappings are automatically applied to new migrations.

When path mapping isn't used, the source path is created on the target filesystem.

note

Path mappings can't be applied to existing migrations. Delete and recreate a migration if you want a path mapping to apply.

Create a new path mapping
path mapping add        [--path-mapping-id] string
[--source-path] string
[--target] string
[--target-path] string
[--description] string

Mandatory parameters

  • --source-path The path on the source filesystem.
  • --target The target filesystem id (value defined for the --file-system-id parameter).
  • --target-path The path for the target filesystem.
  • --description Description of the path mapping enclosed in quotes ("text").

Optional parameters

  • --path-mapping-id An ID for this path mapping. An ID will be auto-generated if you don't enter one.

Example

Example for HDP to HDI - default Hive warehouse directory
path mapping add --path-mapping-id hdp-hdi --source-path /apps/hive/warehouse --target mytarget --target-path /hive/warehouse --description "HDP to HDI - Hive warehouse directory"
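
To reason about where data lands on the target, it can help to model a path mapping as a prefix substitution. The Python sketch below is an assumption about the general idea, not a description of Data Migrator's internal implementation. The function apply_path_mapping and the sample table path are hypothetical.

```python
def apply_path_mapping(path, source_path, target_path):
    """Return the target-side path for `path` under one path mapping.

    Assumption for illustration: the mapping replaces the source_path
    prefix with target_path. Paths outside source_path are returned
    unchanged, matching the case where no path mapping is used.
    """
    source_prefix = source_path.rstrip("/")
    if path == source_prefix:
        return target_path
    if path.startswith(source_prefix + "/"):
        return target_path.rstrip("/") + path[len(source_prefix):]
    return path


# Values taken from the HDP to HDI example; the table path is made up.
print(apply_path_mapping("/apps/hive/warehouse/mydb01.db/mytbl01",
                         "/apps/hive/warehouse", "/hive/warehouse"))
print(apply_path_mapping("/data/other", "/apps/hive/warehouse", "/hive/warehouse"))
```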

path mapping delete

Delete a path mapping.

note

Deleting a path mapping will not affect any existing migrations that have the path mapping applied. Delete and recreate a migration if you no longer want a previous path mapping to apply.

Delete a path mapping
path mapping delete [--path-mapping-id] string

Mandatory parameters

  • --path-mapping-id The ID of the path mapping.

Example

path mapping delete --path-mapping-id hdp-hdi

path mapping list

List all path mappings.

List all path mappings
path mapping list [--target] string

Optional parameters

  • --target List path mappings for the specified target filesystem id.

Examples

Example for listing all path mappings
path mapping list
Example for listing path mappings for a specific target
path mapping list --target mytarget

path mapping show

Show details of a specified path mapping.

Get path mapping details
path mapping show [--path-mapping-id] string

Mandatory parameters

  • --path-mapping-id The ID of the path mapping.

Example

path mapping show --path-mapping-id hdp-hdi

Built-in commands

clear

Clear the shell screen. You can also press Ctrl+L.
clear

echo

Prints the text that you enter to the console. Use this to sanity-check a command before running it (for example: echo migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles).

Print message
echo [--message] string

exit, quit

Entering either exit or quit will stop operation of Data Migrator when it is run from the command line. All processing will cease, and you will be returned to your system shell.

If your Data Migrator command line is connected to a Data Migrator system service, this command will end your interactive session with that service, which will remain in operation to continue processing migrations.

If this command is encountered during non-interactive processing of input (such as when you pipe input to an instance as part of another shell script) no further commands contained in that input will be processed.

Exit the shell
exit

ALSO KNOWN AS

quit
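
The non-interactive behavior described above can be pictured with a short Python sketch. It's a model of the stated rule only (processing stops at the first exit or quit). The function name commands_run_before_exit is hypothetical, and skipping blank lines is an assumption made for this example.

```python
def commands_run_before_exit(lines):
    """Return the commands that would be processed from piped input.

    Sketch of the rule above: once exit or quit is encountered,
    no further commands from the input are processed.
    """
    processed = []
    for line in lines:
        command = line.strip()
        if not command:
            continue  # assumption: blank lines are ignored
        if command in ("exit", "quit"):
            break
        processed.append(command)
    return processed


piped_input = [
    "hive migration list",
    "exit",
    "hive migration status all",  # never reached
]
print(commands_run_before_exit(piped_input))
```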

help

Use the help command to get details of all commands available from the action prompt.

Display help about available commands
help [-C] string

For longer commands, you can use backslashes (\) to indicate continuation, or use quotation marks (") to enclose the full command. When using quotation marks, you can press Tab on your keyboard to make Data Migrator automatically suggest the remainder of your typed command.

See the examples below for reference.

Example

help connect

connect - Connect to Data Migrator and Hive Migrator.

connect [--host] string [--ssl] [--lm2port] int [--hvm-port] int [--timeout] integer [--user] string
Use of backslashes
help hive\ migration\ add

hive migration add - Create new migration.

hive migration add [--source] string [--target] string [--name] string [--auto-start] [--once] [--rule-names] list
Use of quotation marks
help "filesystem add local"

filesystem add local - Add a local filesystem.

filesystem add local [--file-system-id] string [--fs-root] string [--source] [--scan-only] [--properties-files] list [--properties] string
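
If you build help commands programmatically, both forms shown above are simple string transformations. This is a minimal Python sketch with hypothetical helper names. It only formats text and doesn't call the CLI.

```python
def help_with_backslashes(command):
    """Escape each space in a multi-word command with a backslash."""
    return "help " + command.replace(" ", "\\ ")


def help_with_quotes(command):
    """Enclose a multi-word command in quotation marks."""
    return f'help "{command}"'


print(help_with_backslashes("hive migration add"))
print(help_with_quotes("filesystem add local"))
```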

history

Enter history at the action prompt to list all previously entered commands.

Entering history --file <filename> saves up to the 500 most recently entered commands as text in the specified file. Use this to record commands that you have executed.

Display or save the history of previously run commands
history [--file] file

Optional parameters

  • --file The name of the file in which to save the history of commands.

script

Load and execute commands from a text file using the script --file <filename> command. The file should contain one command per line. Each command is executed in sequence, as though it were entered directly at the action prompt.

Use scripts outside of the CLI by referencing the script when running the livedata-migrator command (see examples).

Read and execute commands from a file
script [--file] file

Mandatory parameters

  • --file The name of the file containing script commands.
Example contents of a script file
hive agent check --name sourceAgent
hive agent check --name azureAgent

Examples

info

These examples assume that myScript is inside the working directory.

Example inside CLI
script --file myScript
Example outside of CLI (non-interactive)
livedata-migrator --script=./myScript
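
A script file can also be generated by another program. The Python sketch below writes one command per line and assembles the argument list for the non-interactive form shown above. The helper names write_cli_script and non_interactive_argv are hypothetical, and the sketch writes to a temporary directory instead of running anything.

```python
import tempfile
from pathlib import Path


def write_cli_script(path, commands):
    """Write one CLI command per line to a script file and return its path."""
    script_path = Path(path)
    script_path.write_text("\n".join(commands) + "\n", encoding="utf-8")
    return script_path


def non_interactive_argv(script_path):
    """Build the argument list for: livedata-migrator --script=<file>."""
    return ["livedata-migrator", f"--script={script_path}"]


with tempfile.TemporaryDirectory() as workdir:
    script = write_cli_script(
        Path(workdir) / "myScript",
        ["hive agent check --name sourceAgent",
         "hive agent check --name azureAgent"],
    )
    print(script.read_text(encoding="utf-8"), end="")
    print(non_interactive_argv(script))
```

To launch the script, the argument list could be passed to subprocess.run, assuming the livedata-migrator command is available on the PATH of the host.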

Change log level commands

log debug

Enable debug level logging
log debug

log info

Enable info level logging
log info

log off

Disable logging
log off

log trace

Enable trace level logging
log trace

Connect commands

connect

Use the connect command to connect to both Data Migrator and Hive Migrator on the same host with a single command.

Connect Data Migrator
connect                 [--host] string
[--hvm-port] integer
[--ldm-port] integer
[--ssl]
[--timeout] integer
[--user] string
hivemigrator
livemigrator

Mandatory parameters

  • --host The hostname or IP address for the Data Migrator and Hive Migrator host.

Optional parameters

  • --hvm-port Specify the Hive Migrator port. If not specified, the default port value of 6780 will be used to connect.
  • --ldm-port Specify the Data Migrator port. If not specified, the default port value of 18080 will be used to connect.
  • --ssl Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
  • --timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
  • --user The username to use for authenticating to both services. Used when instances have basic or LDAP authentication enabled. You will be prompted to enter the user password.

Connect to the Data Migrator and Hive Migrator services on the host with this command.

connect --host localhost --hvm-port 6780 --ldm-port 18080 --user admin
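
Because --timeout is given in milliseconds, it's easy to pass a value in the wrong unit. This Python sketch converts minutes to milliseconds and assembles a connect command from the parameters listed above. The helper names minutes_to_ms and build_connect_command are hypothetical, and the sketch only builds text.

```python
def minutes_to_ms(minutes):
    """Convert minutes to whole milliseconds."""
    return int(minutes * 60 * 1000)


def build_connect_command(host, timeout_minutes=None, hvm_port=None,
                          ldm_port=None, user=None, ssl=False):
    """Return a connect command string (illustrative helper only)."""
    parts = ["connect", f"--host {host}"]
    if hvm_port is not None:
        parts.append(f"--hvm-port {hvm_port}")
    if ldm_port is not None:
        parts.append(f"--ldm-port {ldm_port}")
    if ssl:
        parts.append("--ssl")
    if timeout_minutes is not None:
        # The CLI expects milliseconds, not minutes.
        parts.append(f"--timeout {minutes_to_ms(timeout_minutes)}")
    if user:
        parts.append(f"--user {user}")
    return " ".join(parts)


print(minutes_to_ms(5))
print(build_connect_command("localhost", timeout_minutes=10, user="admin"))
```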

connect livemigrator

Connect to the Data Migrator service on your Data Migrator host with this command.

note

This is a manual method of connecting to the Data Migrator service as the livedata-migrator command (shown in CLI - Sign in) will attempt to establish this connection automatically.

Connect Data Migrator
connect livemigrator    [--host] string
[--ssl]
[--port] int
[--timeout] integer
[--user] string

Mandatory parameters

  • --host The hostname or IP address for the Data Migrator host.

Optional parameters

  • --ssl Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
  • --port The Data Migrator port to connect on (default is 18080).
  • --timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
  • --user The username to use for authenticating to the Data Migrator service. Used when the Data Migrator instance has basic or LDAP authentication enabled. You will be prompted to enter the user password.

Example

connect livemigrator --host localhost --port 18080

connect hivemigrator

Connect to the Hive Migrator service on your Data Migrator host with this command.

note

This is a manual method of connecting to the Hive Migrator service as the livedata-migrator command (shown in CLI - Log in section) will attempt to establish this connection automatically.

Connect Hive Migrator
connect hivemigrator    [--host] string
[--ssl]
[--port] int
[--timeout] long
[--user] string

Mandatory parameters

  • --host The hostname or IP address for the Data Migrator host that contains the Hive Migrator service.

Optional parameters

  • --ssl Enter this parameter if you want to establish a TLS connection to Hive Migrator.
  • --port The Hive Migrator service port to connect on (default is 6780).
  • --timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
  • --user The username to use for authenticating to the Hive Migrator service. Used when Hive Migrator has basic authentication enabled. You will still be prompted to enter the user password.

Example

connect hivemigrator --host localhost --port 6780

Email notifications subscription commands

notification email addresses add

Add email addresses to the subscription list for email notifications.

Subscribe email address to notifications.
notification email addresses add [--addresses]

Mandatory parameters

  • --addresses A comma-separated list of email addresses to be added.

Example

notification email addresses add --addresses myemail@company.org,personalemail@gmail.com

notification email addresses remove

Remove email addresses from the subscription list for email notifications.

Unsubscribe email address from notifications.
notification email addresses remove [--addresses]  

Mandatory parameters

  • --addresses A comma-separated list of email addresses to be removed. Use auto-completion to quickly select from subscribed emails.

Example

notification email addresses remove --addresses myemail@company.org,personalemail@gmail.com

notification email smtp set

Configure the details of an SMTP server for Data Migrator to connect to.

Configure the SMTP adapter.
notification email smtp set     [--host] string  
[--port] integer
[--security] security-enum
[--email] string
[--login] string
[--password] string
[--subject-prefix] string

Mandatory parameters

  • --host The host address of the SMTP server.
  • --port The port to connect to the SMTP server. Many SMTP servers use port 25.
  • --security The type of security the server uses. Available options: NONE, SSL, STARTTLS_ENABLED, STARTTLS_REQUIRED, or TLS.
  • --email The email address for Data Migrator to use with emails sent through the SMTP server. This address will be the sender of all configured email notifications.

Optional parameters

  • --login The username to authenticate with the SMTP server.
  • --password The password to authenticate with the SMTP server. Required if you enter a value for --login.
  • --subject-prefix Set an email subject prefix to help identify and filter Data Migrator notifications.

Example

notification email smtp set --host my.internal.host --port 587 --security TLS --email livedatamigrator@wandisco.com --login myusername --password mypassword

notification email smtp show

Display the details of the SMTP server Data Migrator is configured to use.

Show the current configuration of SMTP adapter.
notification email smtp show

notification email subscriptions show

Show a list of currently subscribed emails and notifications.

Show email notification subscriptions.
notification email subscriptions show

notification email types add

Add notification types to the email notification subscription list.

See the output from the command notification email types show for a list of all currently available notification types.

Subscribe to notification types.
notification email types add [--types]  

Mandatory parameters

  • --types A comma-separated list of notification types to subscribe to.

Example

notification email types add --types MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED

notification email types remove

Remove notification types from the email notification subscription list.

Unsubscribe from notification types.
notification email types remove [--types]  

Mandatory parameters

  • --types A comma-separated list of notification types to unsubscribe from.

Example

notification email types remove --types MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED

notification email types show

Return a list of all available notification types to subscribe to.

Example

Show email notification types.
notification email types show

See the email notification type reference for more information on the types available.

Hive backup commands

hive backup add

Immediately create a metadata backup file.

Create new backup
hive backup add

hive backup config show

Show the current metadata backup configuration.

Show configuration of backups.
hive backup config show

hive backup list

List all existing metadata backup files.

List all backups
hive backup list

hive backup restore

Restore from a specified metadata backup file.

Restore backup by name
hive backup restore --name string

hive backup schedule configure

Configure a backup schedule for metadata migrations.

Configure backup schedule
hive backup schedule configure --period-minutes 10 --enable

{
"enabled": true,
"periodMinutes": 10
}

hive backup schedule show

Show the current metadata backup schedule.

Show current backup schedule
hive backup schedule show

{
"enabled": true,
"periodMinutes": 10
}
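
The schedule is returned as JSON, so it can be checked from a script. This is a minimal Python sketch that parses the sample output shown above. The function name describe_backup_schedule is hypothetical, and reading periodMinutes as the interval between backups is an assumption based on the --period-minutes parameter name.

```python
import json


def describe_backup_schedule(output):
    """Summarize the JSON printed by 'hive backup schedule show'."""
    schedule = json.loads(output)
    if not schedule["enabled"]:
        return "Scheduled backups are disabled."
    # Assumption: periodMinutes is the interval between backups.
    return f"Scheduled backups run every {schedule['periodMinutes']} minutes."


sample_output = """
{
"enabled": true,
"periodMinutes": 10
}
"""
print(describe_backup_schedule(sample_output))
```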

hive backup show

Show a specified metadata backup file.

hive backup show
hive backup show --name string

Hive configuration commands

hive config certificate generate

Generate system certificates
hive config certificate generate

hive config certificate upload

Upload the private key and certificates used for TLS connections to the remote agent
hive config certificate upload  [--path-mapping-id] string
[--private-key] file
[--certificate] file
[--trusted-certificate] file

Mandatory parameters

  • --private-key Client private key used to establish a TLS connection to the remote agent.
  • --certificate Client certificate used to establish a TLS connection to the remote agent.
  • --trusted-certificate Trusted certificate used to establish a TLS connection to the remote agent.

Hive rule configuration commands

hive rule add,hive rule create

Create a Hive migration rule that is used to define which databases and tables are migrated.

info

Enter these rules when starting a new migration to control which databases and tables are migrated.

Add new Hive migration rule
hive rule add   [--database-pattern] string
[--table-pattern] string
[--name] string

ALSO KNOWN AS

hive rule create

Mandatory parameters

  • --database-pattern Enter a Hive DDL pattern that will match the database names you want to migrate.
  • --table-pattern Enter a Hive DDL pattern that will match the table names you want to migrate.
tip

You can use a single asterisk (*) if you want to match all databases and/or all tables within the Metastore/database.

Optional parameters

  • --name The name for the Hive rule.

Example

Match all database names that start with test and all tables inside of them
hive rule add --name test_databases --database-pattern test* --table-pattern *
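
Before adding a rule, you may want to preview which names a pattern would select. The Python sketch below models only the single-asterisk wildcard described in the tip above. Any other Hive DDL pattern syntax isn't handled, and the function names and sample database list are hypothetical.

```python
import re


def matches_pattern(pattern, name):
    """Return True if name matches a pattern where * matches any characters.

    Approximation for illustration: only the * wildcard is modeled.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def preview_rule(database_pattern, databases):
    """List the database names that the pattern would select."""
    return [db for db in databases if matches_pattern(database_pattern, db)]


databases = ["test", "test_db1", "testing", "prod_test", "sales"]  # made-up names
print(preview_rule("test*", databases))
print(preview_rule("*", databases))
```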

hive rule configure

Change the parameters of an existing Hive rule.

The parameters that can be changed are the same as the ones listed in the hive rule add,hive rule create section.

All parameters are optional except --name, which is required to enter the existing Hive rule that you wish to configure.

Example

hive rule configure --name test_databases --database-pattern test_db*

hive rule delete

Delete selected Hive migration rule
hive rule delete [--name] string

Example

hive rule delete --name test_databases

hive rule list

Get a list of defined rules
hive rule list

hive rule show

Show rule details
hive rule show [--name] string

Example

hive rule show --name test_databases

Hive show commands

hive show conf

Returns a description of the specified Hive configuration property.
hive show conf  [--parameter] string  
[--agent-name] string

Hive show configuration parameters

  • --agent-name The name of the agent.
  • --parameter The configuration parameter/property that you want to show the value of.

Example

Example when sourceAgent is an Apache Hive agent
hive show conf --agent-name sourceAgent --parameter hive.metastore.uris

hive show database

Show detailed information about a given database and agent (or sourceAgent if not set).
hive show database      [--database] string  
[--agent-name] string

Hive show database parameters

  • --database The database name. If not specified, the default will be default.
  • --agent-name The name of the agent.

Example

hive show database --agent-name sourceAgent --database mydb01

hive show databases

Get the list of databases from a given agent (or sourceAgent if not set).
hive show databases      [--like] string  
[--agent-name] string

Hive show databases parameters

  • --like The Hive DDL pattern to use to match the database names (for example: testdb* will match any database name that begins with "testdb").
  • --agent-name The name of the agent.

Example

hive show databases --agent-name sourceAgent --like testdb*

hive show indexes

Get indexes list for a given database/table and agent (or sourceAgent if not set).
hive show indexes       [--database] string  
[--table] string
[--agent-name] string

Hive show indexes parameters

  • --database The database name.
  • --table The table name.
  • --agent-name The name of the agent.

Example

hive show indexes --agent-name sourceAgent --database mydb01 --table mytbl01

hive show partitions

Get partitions list for a given database/table and agent (or sourceAgent if not set).
hive show partitions    [--database] string  
[--table] string
[--agent-name] string

Hive show partitions parameters

  • --database The database name.
  • --table The table name.
  • --agent-name The name of the agent.

Example

hive show partitions --agent-name sourceAgent --database mydb01 --table mytbl01

hive show table

Show detailed information about a given table using the given agent (or sourceAgent if not set).
hive show table [--database] string  
[--table] string
[--agent-name] string

Hive show table parameters

  • --database The database name where the table is located.
  • --table The table name.
  • --agent-name The name of the agent.

Example

hive show table --agent-name sourceAgent --database mydb01 --table mytbl01

hive show tables

Get the list of tables for a given database (default if not set) and agent (sourceAgent if not set).
hive show tables [[--like] string]  [[--database] string]  [[--agent-name] string]

Hive show tables parameters

  • --like The Hive DDL pattern to use to match the table names (for example: testtbl* will match any table name that begins with "testtbl").
  • --database Database name. Defaults to default if not set.
  • --agent-name The name of the agent.

Example

hive show tables --agent-name sourceAgent --database mydb01 --like testtbl*

License manipulation commands

license show

Show the details of the active license
license show [--full]

license upload

Upload a new license by submitting its location on the local filesystem
license upload [--path] string

Example

license upload --path /user/hdfs/license.key

license usage

View total current usage and the instance ID for this instance. Use the --download-csv option to download the license usage information in CSV format. The file location (/tmp) is shown in the command output. For multiple Data Migrator instances, run the command on each instance. Usage statistics are current as of the time of execution.

Optional parameters

  • --download-csv Download usage information to a local file. The download location will be shown on screen when the command returns.

Example

Get current license usage and download
license usage --download-csv

Notification commands

notification latest

Get the latest notification
notification latest

notification list

Get notifications
notification list       [--count] integer
[--since] string
[--type] string
[--exclude-resolved]
[--level] string

Optional parameters

  • --count The number of notifications to return.
  • --since Return notifications created after this date/time.
  • --type The type of notification to return e.g. LicenseExceptionNotification.
  • --exclude-resolved Exclude resolved notifications.
  • --level The level of notification to return.

notification show

Show notification details
notification show [--notification-id]  string

Mandatory parameters

  • --notification-id The id of the notification to return.

Source commands

source clear

Clear all information that Data Migrator maintains about the source filesystem by issuing the source clear command. This will allow you to define an alternative source to one previously defined or detected automatically.

Delete all sources
source clear

source delete

Use source delete to delete information about a specific source by ID. You can obtain the ID for a source filesystem with the output of the source show command.

Delete a source
source delete [--file-system-id] string

Mandatory parameters

  • --file-system-id The ID of the source filesystem resource you want to delete.

Example

source delete --file-system-id auto-discovered-source-hdfs

source show

Get information about the source filesystem configuration.

Show the source filesystem configuration
source show [--detailed]

Optional parameters

  • --detailed Include all configuration properties for the source filesystem in the response.