Version: 1.22.0 (latest)

Command reference

System service commands#

The service scripts are used to control operation of each individual service. In most supported Linux distributions, the following commands can be used to manage Data Migrator, Hive Migrator, and WANdisco UI processes.

Data Migrator#

| systemd command | Use it to... |
| --- | --- |
| systemctl start livedata-migrator | Start a service that isn't currently running. |
| systemctl stop livedata-migrator | Stop a running service. |
| systemctl restart livedata-migrator | Stop and then start the service. If the service isn't running, this works the same as a start command. |
| systemctl status livedata-migrator | Get details of the running service's status. |

Running without systemd#

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to restart Data Migrator:
service livedata-migrator restart
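The two command forms differ only in argument order, so a wrapper can pick the right one for the host. A minimal sketch (svc_cmd is an illustrative helper, not part of the product; it prints the command rather than running it):

```shell
# Sketch: choose the right service-control command for this host's init system.
# svc_cmd is a hypothetical helper; it echoes the command instead of executing it.
svc_cmd() {
  name="$1"; action="$2"
  if command -v systemctl >/dev/null 2>&1; then
    echo "systemctl $action $name"    # systemd hosts
  else
    echo "service $name $action"      # older init systems
  fi
}

svc_cmd livedata-migrator restart
```

Replace the echo with the real call once you've confirmed the service names on your host.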

Hive Migrator#

| Service script | Use it to... |
| --- | --- |
| systemctl start hivemigrator | Start a service that isn't currently running. |
| systemctl stop hivemigrator | Stop a running service. |
| systemctl restart hivemigrator | Stop and then start the service. If the service isn't running, this works the same as a start command. |
| systemctl status hivemigrator | Get details of the running service's status. |
important

Always start/restart Hive Migrator services in the following order:

  1. Remote agents
  2. Hive Migrator service

Not starting services in this order may cause live migrations to fail.
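The required ordering can be captured in a one-line loop. In this sketch, "hivemigrator-remote-server" is a placeholder for your remote agent's service name, and echo stands in for the real systemctl call:

```shell
# Sketch: restart Hive Migrator components in the required order.
# "hivemigrator-remote-server" is a placeholder agent service name;
# echo stands in for the real systemctl invocation.
for svc in hivemigrator-remote-server hivemigrator; do
  echo "systemctl restart $svc"
done
```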

Running without systemd#

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to view status of Hive Migrator:
service hivemigrator status

WANdisco UI#

| Service script | Use it to... |
| --- | --- |
| systemctl start livedata-ui | Start a service that isn't currently running. |
| systemctl stop livedata-ui | Stop a running service. |
| systemctl restart livedata-ui | Stop and then start the service. If the service isn't running, this works the same as a start command. |
| systemctl status livedata-ui | Get details of the running service's status. |

Running without systemd#

For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:

service <service name> <command>
Example command to stop a WANdisco UI service:
service livedata-ui stop

Connect to the WANdisco CLI#

Open a terminal session on the Data Migrator host machine and enter the following command:

livedata-migrator

Example output#

Connect to the WANdisco CLI (external command)
# livedata-migrator
(ASCII art "LiveData Migrator" banner)

WANdisco LiveData Migrator Command Line Interface

Welcome to the WANdisco LiveData Migrator Command Line Interface. Type "help" for assistance, <TAB> for completion.

When the CLI connects to the Data Migrator and Hive Migrator services, you get the following command prompt:

WANdisco LiveData Migrator >>

The CLI is now ready to accept commands.

Version check#

Check the current versions of included components by using the livedata-migrator command with the --version parameter. For example:

# livedata-migrator --version
tip

This doesn't start the CLI. You get a list of the current Data Migrator components, along with their version numbers.

Example output#

Check the component versions (Note: The exact output may change in later versions of Data Migrator)
# livedata-migrator --version
livedata-migrator <version-number>
livedata-ui <version-number>
livedata-ui-partner <version-number>
livedata-migrator-cli <version-number>
hivemigrator <version-number>
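Because the output is a simple two-column "component version" listing, it's easy to pull one component's version in a script. A sketch, using sample values rather than live output:

```shell
# Sketch: extract one component's version from the --version listing.
# The sample text mimics the two-column output shown above; versions are examples.
sample='livedata-migrator 1.22.0
hivemigrator 1.22.0'
echo "$sample" | awk '$1 == "hivemigrator" { print $2 }'
```

In practice you would pipe `livedata-migrator --version` into the awk filter instead of the sample text.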

WANdisco CLI features#

| Feature | How to use it |
| --- | --- |
| Review available commands | Use the help command to get details of all commands available. |
| Command completion | Hit the <tab> key at any time to get assistance or to complete partially-entered commands. |
| Cancel input | Type <Ctrl-C> before entering a command to return to an empty action prompt. |
| Syntax indication | Invalid commands are highlighted as you type. |
| Clear the display | Type <Ctrl-L> at any time. |
| Previous commands | Navigate previous commands using the up and down arrows, and use standard emacs shortcuts. |
| Interactive or scripted operation | You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See script for more information and examples. |
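Scripted operation works by feeding CLI commands on standard input. A sketch, with an example command list written to a temporary file; the final line shows the invocation rather than executing it:

```shell
# Sketch: scripted (non-interactive) use of the CLI.
# The file path and command list are examples; the last line prints the
# invocation you'd run, rather than requiring the CLI to be installed here.
printf '%s\n' 'backup list' 'exit' > /tmp/ldm-commands.txt
echo "livedata-migrator < /tmp/ldm-commands.txt"
```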

WANdisco CLI commands#

You can manage filesystems, migrations, and more in the WANdisco CLI.

Backup commands#

backup add#

Immediately create a backup file
        backup add

backup config show#

Show the current backups configuration
        backup config show
        {
          "backupsLocation": "/opt/wandisco/livedata-migrator/db/backups",
          "lastSuccessfulTs": 0,
          "backupSchedule": {
            "enabled": true,
            "periodMinutes": 10
          },
          "storedFilePaths": [
            "/etc/wandisco/livedata-migrator/application.properties",
            "/etc/wandisco/livedata-migrator/vars.env",
            "/etc/wandisco/livedata-migrator/logback-spring.xml",
            "/etc/wandisco/livedata-migrator/application.properties",
            "/etc/wandisco/livedata-migrator/vars.env",
            "/etc/wandisco/livedata-migrator/logback-spring.xml"
          ]
        }

backup list#

List all existing backup files
        backup list

backup restore#

Restore from a specified backup file
        backup restore --name string

backup schedule configure#

Configure a backup schedule for Data Migrator
        backup schedule configure --period-minutes 10 --enable
        {
          "enabled": true,
          "periodMinutes": 10
        }

backup schedule show#

Show current backup schedule
        backup schedule show
        {
          "enabled": true,
          "periodMinutes": 10
        }

backup show#

Show a specified backup file
        backup show --name string

Bandwidth policy commands#

bandwidth policy delete#

Allow the application to use unlimited bandwidth
        bandwidth policy delete

bandwidth policy set#

Set the application bandwidth limit, in bytes per second
        bandwidth policy set    [--value] long
                                [--unit] string

Mandatory parameters#

  • --value Define the number of byte units.
  • --unit Define the byte unit to be used.
    Decimal units: B, KB, MB, GB, TB, PB.
    Binary units: KiB, MiB, GiB, TiB, PiB.

Example#

Set a limit of 10 Megabytes per second
        bandwidth policy set --value 10 --unit MB
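The decimal and binary units above resolve to different byte counts, which matters when sizing a limit. A sketch of the conversion (to_bps is a hypothetical helper, not part of the CLI):

```shell
# Sketch: what a --value/--unit pair means in bytes per second.
# to_bps is a hypothetical helper illustrating decimal (MB) vs binary (MiB) units.
to_bps() {
  value="$1"; unit="$2"
  case "$unit" in
    B)   echo "$value" ;;
    KB)  echo $(( value * 1000 )) ;;
    MB)  echo $(( value * 1000000 )) ;;
    KiB) echo $(( value * 1024 )) ;;
    MiB) echo $(( value * 1048576 )) ;;
    *)   echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

to_bps 10 MB    # 10000000 bytes/s, the limit set by the example above
to_bps 10 MiB   # 10485760 bytes/s
```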

bandwidth policy show#

Get details of the application bandwidth limit, in bytes per second
        bandwidth policy show

Filesystem commands#

filesystem add adls2 oauth#

Add an Azure Data Lake Storage (ADLS) Gen2 container as a migration target using the filesystem add adls2 oauth command, which requires a service principal and OAuth 2 credentials.

note

The service principal that you want to use must have either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list with RWX permissions for the migration path and all parent paths. For more information, see the Microsoft documentation.

Add an ADLS Gen2 filesystem with OAuth
    filesystem add adls2 oauth          [--file-system-id] string
                                        [--storage-account-name] string
                                        [--oauth2-client-id] string
                                        [--oauth2-client-endpoint] string
                                        [--container-name] string
                                        [--insecure]
                                        [--properties-files] list
                                        [--oauth2-client-secret] string
                                        [--properties] string
                                        [--source]

Mandatory parameters#

  • --file-system-id The ID to give the new filesystem resource. In the UI, this is called Display Name.
  • --storage-account-name The name of the ADLS Gen2 storage account to target. In the UI, this is called Account Name.
  • --oauth2-client-id The client ID (also known as application ID) for your Azure service principal. In the UI, this is called Client ID.
  • --oauth2-client-endpoint The client endpoint for the Azure service principal. In the UI, this is called Endpoint.
    This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token where {tenant} is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
  • --container-name The name of the container in the storage account to which content will be migrated. In the UI, this is called Container Name.
  • --oauth2-client-secret The client secret (also known as application secret) for the Azure service principal. In the UI, this is called Secret.

Optional parameters#

  • --insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication. This is referenced in the UI when Use Secure Protocol is unchecked.
  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --properties Enter properties to use in a comma-separated key/value list.
  • --source Add this filesystem as the source for migrations.

Example#

        filesystem add adls2 oauth --file-system-id mytarget
                                   --storage-account-name myadls2
                                   --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
                                   --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
                                   --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
                                   --container-name lm2target
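Since the endpoint usually follows the fixed https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token pattern, the --oauth2-client-endpoint value can be derived from the tenant (directory) ID alone. A sketch, reusing the example tenant value above:

```shell
# Sketch: build the --oauth2-client-endpoint value from a tenant (directory) ID.
# The tenant value is the placeholder from the example above, not a real ID.
tenant="78u098ex-ampl-e498-8bce-ndpoint5f2e5"
endpoint="https://login.microsoftonline.com/${tenant}/oauth2/v2.0/token"
echo "$endpoint"
```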

filesystem add adls2 sharedKey#

Add an ADLS Gen2 container as a migration target using the filesystem add adls2 sharedKey command, which requires credentials in the form of an account key.

Add an ADLS Gen2 filesystem with Shared Key
    filesystem add adls2 sharedKey      [--file-system-id] string
                                        [--storage-account-name] string
                                        [--container-name] string
                                        [--insecure]
                                        [--shared-key] string
                                        [--properties-files] list
                                        [--properties] string
                                        [--source]

Mandatory parameters#

  • --file-system-id The ID to give the new filesystem resource. In the UI, this is called Display Name.
  • --storage-account-name The name of the ADLS Gen2 storage account to target. In the UI, this is called Account Name.
  • --shared-key The shared account key to use as credentials to write to the storage account. In the UI, this is called Access Key.
  • --container-name The name of the container in the storage account to which content will be migrated. In the UI, this is called Container Name.

Optional parameters#

  • --insecure If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication. You can only see this in the UI when Use Secure Protocol is unchecked.
  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --properties Enter properties to use in a comma-separated key/value list.
  • --source Add this filesystem as the source for migrations.

Example#

        filesystem add adls2 sharedKey --file-system-id mytarget
                                       --storage-account-name myadls2
                                       --container-name lm2target
                                       --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

filesystem add gcs#

Add a Google Cloud Storage as a migration target using the filesystem add gcs command, which requires credentials in the form of an account key file.

Add a Google Cloud Storage filesystem
    filesystem add gcs      [--file-system-id] string
                            [--service-account-json-key-file] string
                            [--service-account-p12-key-file] string
                            [--service-account-json-key-file-server-location] string
                            [--service-account-p12-key-file-server-location] string
                            [--service-account-email] string
                            [--bucket-name] string
                            [--properties-files] list
                            [--properties] string

Mandatory parameters#

  • --file-system-id The ID to give the new filesystem resource. In the UI, this is called Display Name.

  • --bucket-name The bucket name of a Google Cloud Storage account. In the UI, this is called Bucket Name.

  • Service account key parameters

    info

    Enter your service account key for the Google Cloud Storage bucket by choosing one of the parameters below.

    You can also upload the service account key directly when using the UI (this isn't supported through the CLI).

  • --service-account-json-key-file-server-location
    The absolute filesystem path on the Data Migrator server of your service account key file in JSON format. You can either create a Google Cloud Storage service account key or use an existing one.
    In the UI, this is called Key File and becomes visible when you select Key File Options -> Provide a Path.
  • --service-account-p12-key-file-server-location
    The absolute filesystem path on the Data Migrator server of your service account key file in P12 format. You can either create a Google Cloud Storage service account key or use an existing one.
    In the UI, this is called Key File and becomes visible when you select Key File Options -> Provide a Path.
  • --service-account-json-key-file
    The absolute filesystem path on the host running the WANdisco CLI of your service account key file in JSON format. Use this parameter if you are running the WANdisco CLI on a different host to your Data Migrator server.
  • --service-account-p12-key-file
    The absolute filesystem path on the host running the WANdisco CLI of your service account key file in P12 format. Use this parameter if you are running the WANdisco CLI on a different host to your Data Migrator server.

Optional parameters#

  • --service-account-email The email address linked to your Google Cloud Storage service account. In the UI, this is called Email address and is required when selecting the Upload P12 Key File option.
  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
  • --properties Enter properties to use in a comma-separated key/value list.

Example#

        filesystem add gcs --file-system-id gcsAgent
                           --bucket-name myGcsBucket
                           --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12
                           --service-account-email user@mydomain.com

filesystem add hdfs#

Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs command.

You'll normally only create an HDFS resource with this command when migrating to a target HDFS filesystem (rather than to another storage service like ADLS Gen2 or S3a). Data Migrator attempts to auto-discover the source HDFS when started from the command line unless Kerberos is enabled on your source environment.

If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs command to enter Kerberos credentials and auto-discover your source HDFS configuration.

Add a Hadoop Distributed File System
    filesystem add hdfs     [--file-system-id] string
                            [--default-fs] string
                            [--user] string
                            [--kerberos-principal] string
                            [--kerberos-keytab] string
                            [--source]
                            [--scan-only]
                            [--success-file] string
                            [--properties-files] list
                            [--properties] string

Mandatory parameters#

  • --file-system-id The ID to give the new filesystem resource. In the UI, this is called Display Name.
  • --default-fs A string that defines how Data Migrator accesses HDFS. In the UI, this is called Default FS.
    It can be specified in a number of forms:
    1. As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
    2. As an HDFS URI that references a nameservice if the NameNodes have high availability, for example, hdfs://mynameservice. For more information, see HDFS High Availability.

Optional parameters#

Kerberos: Cross-realm authentication required between source and target HDFS

Cross-realm authentication is required in the following scenarios:

  • Migration will occur between a source and target HDFS.
  • Kerberos is enabled on both clusters.


  • --user The name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
  • --kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
  • --kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
  • --source Enter this parameter to use the filesystem resource created as a source. This is referenced in the UI when configuring the Unknown source.
  • --scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
  • --properties-files Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml. In the UI, this is called Provide a path to files under Additional Configuration.
  • --properties Enter properties to use in a comma-separated key/value list. In the UI, this is called Additional Configuration under the Additional Configuration option.
  • --success-file Specify a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the directory they're in has fully finished migrating.
Properties files are required for NameNode HA#

If your Hadoop cluster has NameNode HA enabled, enter the local filesystem path to the properties files that define the configuration for the nameservice ID.

Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster.

Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your Data Migrator host's local filesystem.

  • For the UI, use Provide a path to files under the Additional Configuration option and define the directory containing the core-site.xml and hdfs-site.xml files.

    Example for path containing source cluster configuration
          /etc/hadoop/conf
    Example for path containing target cluster configuration
          /etc/targetClusterConfig

    Alternatively, define the absolute filesystem paths to these files:

    Example for absolute paths to source cluster configuration files
          /etc/hadoop/conf/core-site.xml
          /etc/hadoop/conf/hdfs-site.xml
    Example for absolute paths to target cluster configuration files
          /etc/targetClusterConfig/core-site.xml
          /etc/targetClusterConfig/hdfs-site.xml
  • For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
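The --properties-files value is just the two absolute paths joined by a comma, so it can be assembled from the configuration directory. A sketch (the /etc/hadoop/conf path matches the source example used elsewhere in this section; adjust it for your cluster):

```shell
# Sketch: assemble the comma-separated --properties-files value from a
# configuration directory. The directory path is an example.
conf=/etc/hadoop/conf
files="$conf/core-site.xml,$conf/hdfs-site.xml"
echo "$files"
```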

Examples#

HDFS as source#
Example for source NameNode HA cluster
filesystem add hdfs --file-system-id mysource
                    --default-fs hdfs://sourcenameservice
                    --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Example for source NameNode HA cluster with Kerberos enabled
filesystem add hdfs --file-system-id mysource
                    --default-fs hdfs://sourcenameservice
                    --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
                    --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
                    --kerberos-principal hdfs@SOURCEREALM.COM
HDFS as target#
note

If you enter a HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.

Example for target NameNode HA cluster with Kerberos enabled
        filesystem add hdfs --file-system-id mytarget
                            --default-fs hdfs://targetnameservice
                            --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml
                            --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
                            --kerberos-principal hdfs@SOURCEREALM.COM
Example for target single NameNode cluster
        filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs

filesystem add local#

Add a local filesystem as either a migration target or source using the filesystem add local command.

Add a Local Filesystem
        filesystem add local    [--file-system-id] string
                                [--fs-root] string
                                [--source]
                                [--scan-only]
                                [--properties-files] list
                                [--properties] string

Mandatory parameters#

  • --file-system-id The ID to give the new filesystem resource. In the UI, this is called Display Name.

Optional parameters#

  • --fs-root The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Should be supplied using the full directory path from the root.
  • --source Enter this parameter to use the filesystem resource created as a source. This is referenced in the UI when configuring the Unknown source.
  • --scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
  • --properties-files Reference a list of existing properties files.
  • --properties Enter properties to use in a comma-separated key/value list.
note

If no fs-root is specified, the file path will default to the root of your system.

Examples#

Local filesystem as source#
        filesystem add local --file-system-id mysource --fs-root /tmp --source
Local filesystem as target#
        filesystem add local --file-system-id mytarget --fs-root /Users/username/destinationfolder/

filesystem add s3a#

Add an Amazon Simple Storage Service (Amazon S3) bucket as a target filesystem using the filesystem add s3a command. This method also supports IBM Cloud Object Storage buckets.

Add an S3 filesystem
    filesystem add s3a          [--file-system-id] string
                                [--bucket-name] string
                                [--endpoint] string
                                [--access-key] string
                                [--secret-key] string
                                [--sqs-queue] string
                                [--sqs-endpoint] string
                                [--credentials-provider] string
                                [--source]
                                [--scan-only]
                                [--properties-files] list
                                [--properties] string
                                [--s3type] string
                                [--bootstrap.servers] string
                                [--topic] string

For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.

S3A mandatory parameters#

  • --file-system-id The ID for the new filesystem resource. In the UI, this is called Display Name.

  • --bucket-name The name of your Amazon S3 bucket. In the UI, this is called Bucket Name.

  • --credentials-provider The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint. In the UI, this is called Credentials Provider. This isn't a required parameter when adding an IBM Cloud Object Storage bucket through the UI.
    The Provider options available include:

    • org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider

      Use this provider to offer credentials as an access key and secret access key with the --access-key and --secret-key parameters.

    • com.amazonaws.auth.InstanceProfileCredentialsProvider

      Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.

    • com.amazonaws.auth.DefaultAWSCredentialsProviderChain

      A commonly-used credentials provider chain that looks for credentials in this order:

      • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
      • Java System Properties - aws.accessKeyId and aws.secretKey.
      • Web Identity Token credentials from the environment or container.
      • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
      • Credentials delivered through the Amazon EC2 container service if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and security manager has permission to access the variable.
      • Instance profile credentials delivered through the Amazon EC2 metadata service.
    • com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider

      This provider supports the use of multiple AWS credentials, which are stored in a credentials file.

      When adding a source filesystem, use the following properties:

      • awsProfile - Name for the AWS profile.

      • awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.

        For example:

        filesystem add s3a --file-system-id testProfile1Fs
                           --bucket-name profile1-bucket
                           --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider
                           --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>

        In the CLI, you can also use --aws-profile and --aws-config-file.

        For example:

        filesystem add s3a --file-system-id testProfile1Fs
                           --bucket-name profile1-bucket
                           --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider
                           --aws-profile <profile-name>
                           --aws-config-file </path/to/the/aws/credentials/file>

        Learn more about using AWS profiles: Configuration and credential file settings.

  • Endpoint (UI & IBM Cloud Object Storage only): This is required when adding an IBM Cloud Object Storage bucket. IBM provides a list of available endpoints in its public documentation.
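The credentials file read by ExtendedProfileCredentialsProvider uses the standard AWS shared-credentials layout. A sketch of that layout (the profile name and key values are placeholders, and the real file normally lives at ~/.aws/credentials rather than the temporary path used here):

```shell
# Sketch: the shared-credentials layout that ExtendedProfileCredentialsProvider
# reads (standard AWS format). Profile name and keys are placeholders; the real
# file normally lives at ~/.aws/credentials.
cat > /tmp/aws-credentials.example <<'EOF'
[migration-profile]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRETKEY
EOF
grep '^\[' /tmp/aws-credentials.example   # list the profile section headers
```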

S3A optional parameters#

  • --access-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter. In the UI, this is called Access Key. This is a required parameter when adding an IBM Cloud Object Storage bucket.

  • --secret-key When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key using this parameter. In the UI, this is called Secret Key. This is a required parameter when adding an IBM Cloud Object Storage bucket.

  • --endpoint (S3 as a target only) Enter a specific endpoint to access the S3 bucket such as an AWS PrivateLink endpoint (for example: https://bucket.vpce-0e25b8cdd720f900e-argc85vg.s3.us-east-1.vpce.amazonaws.com). When using this parameter, do not use the fs.s3a.endpoint property as an additional custom property as this supersedes it.

  • --sqs-queue Enter an SQS queue name. This field is required if you enter an SQS endpoint.

  • --sqs-endpoint Enter an SQS endpoint.

  • --source (Preview) Enter this parameter to use the filesystem resource created as a source. This is referenced in the UI when configuring the Unknown source.

  • --scan-only Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.

  • --properties-files Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.

  • --properties Enter properties to use in a comma-separated key/value list. In the UI, this is called S3A Properties (see S3a Default Properties and S3a Custom Properties for more information).

  • --s3type Indicates an S3a-compatible filesystem type. You can set the parameter value to one of the following or leave it blank:

    • aws
    • oracle
    • ibmcos
  • --bootstrap.servers Kafka server address.

  • --topic The Kafka topic where S3 object change notifications are provided.

    note

    Amazon S3a as a source is currently a preview feature.

S3a default properties#

These properties are defined by default when adding an S3a filesystem.

  • fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a Filesystem.
  • fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
  • fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that will be pre-pended to the User-Agent header sent in HTTP requests to the S3 back-end by S3aFileSystem.
  • fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
  • hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
  • fs.s3a.connection.maximum (default 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
  • fs.s3a.threads.max (default 150): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation.
  • fs.s3a.max.total.tasks (default 60): Defines the number of operations which can be queued for execution at a time.
  • fs.s3a.healthcheck (default true): Allows the S3A filesystem health check to be turned off by changing true to false. This option is useful for setting up Data Migrator while cloud services are offline. However, when disabled, errors in S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.

S3a custom properties#

These are some of the additional properties that can be added when creating an S3a filesystem.

  • fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
  • fs.s3a.fast.upload.active.blocks (default 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
  • fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in Kilobytes, Megabytes, Gigabytes, Terabytes or Petabytes respectively.
  • fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.

Find an additional list of S3a properties in the S3a documentation.
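For instance, the custom properties above could be supplied when adding a filesystem. This is a sketch only; the ID, bucket, keys, and paths are placeholders:

        filesystem add s3a --file-system-id mytarget
                           --bucket-name mybucket1
                           --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
                           --access-key pkExampleAccessKeyiz
                           --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
                           --properties fs.s3a.block.size=64M,fs.s3a.buffer.dir=/data/s3a-buffer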

Upload buffering#

Migrations using an S3a target will buffer all uploads. By default, buffering occurs on the local disk of the system Data Migrator is running on, in the /tmp directory.

Data Migrator will automatically delete the temporary buffering files once they are no longer needed.

If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. The following values can be supplied:

| Buffering option | Details | Property value |
| --- | --- | --- |
| Array buffer | Buffers the uploaded data in memory instead of on disk, using the Java heap. | array |
| Byte buffer | Buffers the uploaded data in memory instead of on disk, but does not use the Java heap. | bytebuffer |
| Disk buffering | The default option. Buffers the upload to disk. | disk |

Both the array and bytebuffer options may lead to the consumption of large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.
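For example, an existing S3a filesystem could be switched to off-heap buffering with a reduced number of active blocks per stream. The values here are illustrative; filesystem update s3a accepts the same parameters as filesystem add s3a:

        filesystem update s3a --file-system-id mytarget
                              --properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=4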

note

If the disk used for buffering runs out of space, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp by default) has enough free space to facilitate the transfer.

S3a Example#

        filesystem add s3a --file-system-id mytarget
                           --bucket-name mybucket1
                           --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
                           --access-key pkExampleAccessKeyiz
                           --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D

IBM Cloud Object Storage Examples#

  • Add source IBM Cloud Object Storage filesystem. Note that this does not work if SSL is used on the endpoint address.

          filesystem add s3a --source --file-system-id cos_s3_source2
                             --bucket-name container2
                             --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
                             --access-key pkExampleAccessKeyiz
                             --secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
                             --s3type ibmcos
                             --bootstrap.servers=10.0.0.123:9092
                             --topic newcos-events
                             --endpoint http://10.0.0.124
  • Add path mapping.

          path mapping add --path-mapping-id testPath
                           --description description-string
                           --source-path /
                           --target targetHdfs2
                           --target-path /repl_test1

    Example output:

          {
            "id": "testPath",
            "description": "description-string",
            "sourceFileSystem": "cos_s3_source2",
            "sourcePath": "/",
            "targetFileSystem": "targetHdfs2",
            "targetPath": "/repl_test1"
          }
  • Add a file to a container.

          ./directory cp ~/Downloads/wq4.pptx cos/container2/
  • Remove a file from a container.

          ./directory rm cos/container2/wq4.pptx
  • List objects in a container.

          ./directory ls cos/container2/
  • List objects via the S3 API.

          aws s3api list-objects --endpoint-url=http://10.0.0.201 --bucket container2

filesystem auto-discover-source hdfs#

Discover your local HDFS filesystem by entering the Kerberos credentials for your source environment.

You can also manually configure the source HDFS filesystem using the filesystem add hdfs command.

Auto-discover-source Hadoop Distributed File System (HDFS)
        filesystem auto-discover-source hdfs [--kerberos-principal] string
                                             [--kerberos-keytab] string

Kerberos parameters#

  • --kerberos-principal The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
  • --kerberos-keytab The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).

Example#

        filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM

filesystem clear#

Delete all target filesystem references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target filesystems.

Delete all targets
        filesystem clear

filesystem delete#

Delete a specific filesystem resource by ID. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that filesystem.

Delete a target
        filesystem delete [--file-system-id] string

Mandatory parameters#

  • --file-system-id The ID of the filesystem resource to delete. In the UI, this is called Display Name.

Example#

        filesystem delete --file-system-id mytarget

filesystem list#

List defined filesystem resources.

List targets
        filesystem list [--detailed]

Optional parameters#

  • --detailed Include all properties for each filesystem in the JSON result.
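For example, to list all defined filesystems together with their full property sets:

        filesystem list --detailed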

filesystem show#

View details for a filesystem resource.

Get target details
        filesystem show [--file-system-id] string
                        [--detailed]

Mandatory parameters#

  • --file-system-id The ID of the filesystem resource to show. In the UI, this is called Display Name.

Optional parameters#

  • --detailed Include all properties for the filesystem in the JSON result.

Example#

        filesystem show --file-system-id mytarget

filesystem types#

View information about the filesystem types available for use with Data Migrator.

List the types of target filesystems available
        filesystem types

filesystem update adls2 oauth#

Update an existing ADLS Gen2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth command. You will be prompted to optionally update the service principal and OAuth 2 credentials.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example#

        filesystem update adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target

filesystem update adls2 sharedKey#

Update an existing ADLS Gen2 container migration target using the filesystem update adls2 sharedKey command. You will be prompted to optionally update the secret key.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 sharedKey section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example#

        filesystem update adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==

filesystem update gcs#

Update a Google Cloud Storage migration target using the filesystem update gcs command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add gcs section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example#

        filesystem update gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12 --service-account-email user@mydomain.com

filesystem update hdfs#

Update either a source or target Hadoop Distributed filesystem using the filesystem update hdfs command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add hdfs section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Examples#

Example for source NameNode HA cluster
        filesystem update hdfs --file-system-id mysource
                               --default-fs hdfs://sourcenameservice
                               --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Example for source NameNode HA cluster with Kerberos enabled
        filesystem update hdfs --file-system-id mysource
                               --default-fs hdfs://sourcenameservice
                               --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
                               --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
                               --kerberos-principal hdfs@SOURCEREALM.COM

filesystem update local#

Update a target or source local filesystem using the filesystem update local command.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add local section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example#

        filesystem update local --file-system-id mytarget --fs-root ./tmp

filesystem update s3a#

Update an S3 bucket target filesystem using the filesystem update s3a command. This method also supports IBM Cloud Object Storage buckets.

Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add s3a section.

All parameters are optional except --file-system-id, which specifies the filesystem you want to update.

Example#

        filesystem update s3a --file-system-id mytarget
                              --bucket-name mybucket1
                              --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
                              --access-key pkExampleAccessKeyiz
                              --secret-key eSeCreTkeYd8uEDnDHRHuV9IF3n9

Hive agent configuration commands#

hive agent add azure#

Add a local or remote Hive agent to connect to an Azure SQL database using the hive agent add azure command.

If your Data Migrator host can communicate directly with the Azure SQL database, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.

remote deployments

For a remote Hive agent connection, enter a remote host (Azure VM, HDI cluster node) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote metastore.

Add Azure SQL agent
        hive agent add azure [--name] string
                             [--db-server-name] string
                             [--database-name] string
                             [--database-user] string
                             [--database-password] string
                             [--auth-method] azure-sql-authentication-method
                             [--client-id] string
                             [--storage-account] string
                             [--container-name] string
                             [--hdi-version] string
                             [--insecure] boolean
                             [--host] string
                             [--port] integer
                             [--no-ssl]
                             [--autodeploy] boolean
                             [--ssh-user] string
                             [--ssh-key] file
                             [--ssh-port] int
                             [--use-sudo]
                             [--ignore-host-checking]
                             [--file-system-id] string
                             [--default-fs-override] string

Mandatory parameters#

info

The Azure Hive agent requires an ADLS Gen2 storage account and container name; these are used only to generate the correct location for the metadata. The agent will not access the container, and data will not be written to it.

  • --name The ID to give to the new Hive agent. In the UI, this is called Display Name.
  • --db-server-name Azure SQL database server name.
  • --database-name The Azure SQL database name. In the UI, this is called Azure SQL Database Name.
  • --storage-account The name of the ADLS Gen2 storage account. In the UI, this is called Account Name.
  • --container-name The name of the container in the ADLS Gen2 storage account. In the UI, this is called Container Name.
  • --hdi-version The HDI version. This is relevant if you intend to integrate your SQL server into an HDInsight cluster. In the UI, this is called HDI Version.
  • --auth-method Azure SQL database connection authentication method (SQL_PASSWORD, AD_MSI, AD_INTEGRATED, AD_PASSWORD, ACCESS_TOKEN).

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myadls2storage). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: abfss://mycontainer@mystorageaccount.dfs.core.windows.net). In the UI, this is called Default Filesystem Override.
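As an illustrative sketch (the server, database, storage, and credential values are placeholders), an agent using --default-fs-override instead of a named filesystem might look like:

        hive agent add azure --name azureAgent
                             --db-server-name mysqlserver.database.windows.net
                             --database-name mydb1
                             --auth-method SQL_PASSWORD
                             --database-user azureuser
                             --database-password mypassword
                             --storage-account myadls2
                             --container-name mycontainer
                             --hdi-version 4.0
                             --default-fs-override abfss://mycontainer@myadls2.dfs.core.windows.net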

Optional parameters#

  • --client-id Azure resource's clientId.
  • --insecure Define an insecure connection (TLS disabled) to the Azure SQL database server (default is false). In the UI, this is called Use Secure Protocol.

Authentication parameters#

Choose one of the authentication methods listed and include the additional parameters required for the chosen method.

  • --auth-method The authentication method to use to connect to the Azure SQL server. In the UI, this is called Authentication Method.
    The following methods can be used:
    • SQL_PASSWORD - Enter a username and password to access the database. In the UI, this is called SQL Password.
    • AD_MSI - Use a system-assigned or user-assigned managed identity. In the UI, this is called Active Directory MSI.
Required parameters for SQL_PASSWORD#
  • --database-user The user name to access the database. In the UI, this is called Database Username.
  • --database-password The user password to access the database. In the UI, this is called Database Password.
Required parameters for AD_MSI#

To use this method, the following prerequisites must be met:

  • Data Migrator or the remote Azure Hive agent must be installed on an Azure resource with the managed identity assigned to it. The host must also have Azure AD authentication enabled.

  • Your Azure SQL server must be enabled for Azure AD authentication.

  • You have created a contained user in the Azure SQL database that is mapped to the Azure AD resource (where Data Migrator or the remote Azure Hive agent is installed).

    • The username of the contained user will depend on whether you are using a system-assigned or user-assigned identity.

      Azure SQL database command for a system-assigned managed identity
      CREATE USER "<azure_resource_name>" FROM EXTERNAL PROVIDER;
      ALTER ROLE db_owner ADD MEMBER "<azure_resource_name>";

      The <azure_resource_name> is the name of the Azure resource where Data Migrator or remote Azure Hive agent is installed (for example: myAzureVM).

      Azure SQL database command for a user-assigned managed identity
      CREATE USER <managed_identity_name> FROM EXTERNAL PROVIDER;
      ALTER ROLE db_owner ADD MEMBER <managed_identity_name>;

      The <managed_identity_name> is the name of the user-assigned managed identity (for example: myManagedIdentity).

Once all prerequisites are met, see the system-assigned identity or user-assigned identity parameters.

System-assigned identity#

No other parameters are required for a system-assigned identity.

User-assigned identity#

The --client-id parameter must be specified:

  • --client-id The Client ID of your Azure managed identity. In the UI, this is called MSI Client ID.

Parameters for remote Hive agents only#

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it.
Parameters for automated deployment#
  • --autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
  • --ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
  • --use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
  • --ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment#

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Azure SQL manually:

  1. Transfer the remote server installer to your remote host (Azure VM, HDI cluster node):

    Example of secure transfer from local to remote host
         scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, make the installer script executable:

         chmod +x hivemigrator-remote-server-installer.sh
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

         ./hivemigrator-remote-server-installer.sh -- --silent
  4. On your remote host, start the remote server service:

         service hivemigrator-remote-server start
  5. On your local host, run the hive agent add azure command without using --autodeploy and its related parameters to configure your remote Hive agent.

    See the Example for remote Azure SQL deployment - manual example below for further guidance.

Examples#

Example for local Azure SQL deployment with SQL username/password
        hive agent add azure --name azureAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --hdi-version 4.0 --file-system-id myadls2storage
Example for remote Azure SQL deployment with System-assigned managed identity - automated
        hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --storage-account myadls2 --container-name mycontainer --hdi-version 4.0 --file-system-id myadls2storage --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052
Example for remote Azure SQL deployment with User-assigned managed identity - manual
        hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --client-id b67f67ex-ampl-e2eb-bd6d-client9385id --storage-account myadls2 --container-name mycontainer --hdi-version 4.0 --file-system-id myadls2storage --host myRemoteHost.example.com --port 5052

hive agent add filesystem#

Add a filesystem Hive agent to connect to your host's local filesystem using the hive agent add filesystem command.

Add filesystem agent
        hive agent add filesystem [--file-system-id] string
                                  [--root-folder] string
                                  [--name] string
  • --file-system-id The filesystem ID to be used.
  • --root-folder The path to use as the root directory for the filesystem agent.
  • --name The ID to give to the new Hive agent.

Example#

        hive agent add filesystem --file-system-id myfilesystem --root-folder /var/lib/mysql --name fsAgent

hive agent add glue#

Add an AWS Glue Hive agent to connect to an AWS Glue data catalog using the hive agent add glue command.

If your Data Migrator host can communicate directly with the AWS Glue Data Catalog, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.

remote deployments

For a remote Hive agent connection, enter a remote host (EC2 instance) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add AWS Glue agent
        hive agent add glue [--name] string
                            [--access-key] string
                            [--secret-key] string
                            [--glue-endpoint] string
                            [--aws-region] string
                            [--glue-catalog-id] string
                            [--credentials-provider] string
                            [--glue-max-retries] integer
                            [--glue-max-connections] integer
                            [--glue-max-socket-timeout] integer
                            [--glue-connection-timeout] integer
                            [--file-system-id] string
                            [--default-fs-override] string
                            [--host] string
                            [--port] integer
                            [--no-ssl]

Glue parameters#

  • --name The ID to give to the new Hive agent. In the UI, this is called Name.
  • --glue-endpoint The AWS Glue service endpoint for connections to the data catalog. VPC endpoint types are also supported. In the UI, this is called AWS Glue Service Endpoint.
  • --aws-region The AWS region that your data catalog is located in (default is us-east-1). If --glue-endpoint is specified, this parameter will be ignored. In the UI, this is called AWS Region.

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: s3a://mybucket/). In the UI, this is called Default Filesystem Override.

Glue credential parameters#

  • --credentials-provider The Java class name of the credentials provider used to authenticate with AWS.
  • --access-key The AWS access key used to authenticate with the Glue service.
  • --secret-key The AWS secret key used to authenticate with the Glue service.

Glue optional parameters#

  • --glue-catalog-id The AWS Account ID to access the Data Catalog. This is used if the Data Catalog is owned by a different account from the one provided by the credentials provider and cross-account access has been granted.
  • --glue-max-retries The maximum number of retries the Glue client will perform after an error.
  • --glue-max-connections The maximum number of parallel connections the Glue client will allocate.
  • --glue-max-socket-timeout The maximum time the Glue client will wait for data on an established connection before timing out.
  • --glue-connection-timeout The maximum time the Glue client will wait to establish a connection.
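As an illustrative sketch, these tuning options can be combined with the basic Glue parameters. The endpoint, keys, and limit values below are placeholders:

        hive agent add glue --name glueAgent
                            --access-key ACCESS6HCFPAQIVZTKEY
                            --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY
                            --glue-endpoint glue.eu-west-1.amazonaws.com
                            --aws-region eu-west-1
                            --file-system-id mys3bucket
                            --glue-max-retries 5
                            --glue-max-connections 50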

Parameters for remote Hive agents only#

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it.
Steps for remote agent deployment#

Follow these steps to deploy a remote Hive agent for AWS Glue:

  1. Transfer the remote server installer to your remote host (Amazon EC2 instance):

    Example of secure transfer from local to remote host
         scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, run the installer as root (or sudo) user in silent mode:

         ./hivemigrator-remote-server-installer.sh -- --silent
  3. On your remote host, start the remote server service:

         service hivemigrator-remote-server start
  4. On your local host, run the hive agent add glue command to configure your remote Hive agent.

    See the Example for remote AWS Glue agent example below for further guidance.

Examples#

Example for local AWS Glue agent
        hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket
Example for remote AWS Glue agent
        hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --host myRemoteHost.example.com --port 5052

hive agent add hive#

Add a Hive agent to connect to a local or remote Apache Hive Metastore using the hive agent add hive command.

remote deployments

When connecting to a remote Apache Hive Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add local or remote Hive agent
        hive agent add hive [--config-path] string
                            [--config-file] string
                            [--kerberos-principal] string
                            [--kerberos-keytab] string
                            [--name] string
                            [--host] string
                            [--port] integer
                            [--no-ssl]
                            [--autodeploy]
                            [--ssh-user] string
                            [--ssh-key] file
                            [--ssh-port] int
                            [--use-sudo]
                            [--ignore-host-checking]
                            [--file-system-id] string
                            [--default-fs-override] string

Mandatory parameters#

  • --kerberos-principal Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM). In the UI, this is called Principal.
  • --kerberos-keytab Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab). In the UI, this is called Keytab.
  • --name The ID to give to the new Hive agent. In the UI, this is called Name.

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01). In the UI, this is called Default Filesystem Override.

Optional parameters#

  • --config-path The path to the directory containing the Hive configuration files core-site.xml, hive-site.xml, and hdfs-site.xml. If not specified, Data Migrator will use the default location for the cluster distribution. In the UI, this is called Override Default Hadoop Configuration Path.
  • --config-file If the configuration files are not located on the same path, use this parameter to enter all the paths as a comma-delimited list. For example, /path1/core-site.xml,/path2/hive-site.xml,/path3/hdfs-site.xml.

When configuring a CDP target#

  • --jdbc-url The JDBC URL for the database.
  • --jdbc-driver-name The full class name of the JDBC driver.
  • --jdbc-username The username for connecting to the database.
  • --jdbc-password The password for connecting to the database.
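As a hedged sketch only, a CDP target agent might combine these connection details with the usual Hive agent parameters. The JDBC URL, driver class, and credentials below are hypothetical and must match your own Metastore database:

        hive agent add hive --name cdpTargetAgent
                            --kerberos-keytab /etc/security/keytabs/hive.service.keytab
                            --kerberos-principal hive/_HOST@REMOTEREALM.COM
                            --file-system-id mytargethdfs
                            --jdbc-url jdbc:mysql://metastore-db.example.com:3306/hive
                            --jdbc-driver-name com.mysql.jdbc.Driver
                            --jdbc-username hiveuser
                            --jdbc-password mypassword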
important

Don't use the optional parameters --config-path and --config-file in the same add command.
Use --config-path when configuration files are on the same path, or --config-file when the configuration files are on separate paths.
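For example, when the configuration files sit on different paths, a hypothetical agent could use --config-file instead of --config-path (the paths here are illustrative):

        hive agent add hive --name sourceAgent
                            --kerberos-keytab /etc/security/keytabs/hive.service.keytab
                            --kerberos-principal hive/_HOST@LOCALREALM.COM
                            --file-system-id mysourcehdfs
                            --config-file /etc/hadoop/conf/core-site.xml,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/hdfs-site.xml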

Parameters for remote Hive agents only#

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it.
Parameters for automated deployment#
  • --autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
  • --ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
  • --use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
  • --ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment#

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Apache Hive manually:

  1. Transfer the remote server installer to your remote host:

    Example of secure transfer from local to remote host
         scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, make the installer script executable:

         chmod +x hivemigrator-remote-server-installer.sh
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

         ./hivemigrator-remote-server-installer.sh -- --silent
  4. On your remote host, start the remote server service:

         service hivemigrator-remote-server start
  5. On your local host, run the hive agent add hive command without using --autodeploy and its related parameters to configure your remote Hive agent.

    See the Example for remote Apache Hive deployment - manual below for further guidance.

Example for local Apache Hive deployment
        hive agent add hive --name sourceAgent --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@LOCALREALM.COM --file-system-id mysourcehdfs
Example for remote Apache Hive deployment - automated
        hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

Example for remote Apache Hive deployment - manual
        hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

info

If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

hive agent add databricks#

note

Databricks agents are currently available as a preview feature.

info

The source table must be stored in one of the following formats to ensure a successful migration to Databricks Delta Lake:

  • CSV
  • JSON
  • Avro
  • ORC
  • Parquet
  • Text

Add a Databricks Hive agent to connect to a Databricks Delta Lake metastore (AWS, Azure, or Google Cloud Platform (GCP)) using the hive agent add databricks command.

Add Databricks agent
        hive agent add databricks [--name] string
                                  [--jdbc-server-hostname] string
                                  [--jdbc-port] int
                                  [--jdbc-http-path] string
                                  [--access-token] string
                                  [--fs-mount-point] string
                                  [--convert-to-delta]
                                  [--delete-after-conversion]
                                  [--file-system-id] string
                                  [--default-fs-override] string
                                  [--host] string
                                  [--port] integer
                                  [--no-ssl]

Enable JDBC connections to Databricks#

The following steps are required to enable Java Database Connectivity (JDBC) to Databricks Delta Lake:

  1. Download the Databricks JDBC driver.

  2. Unzip the package and upload the SparkJDBC42.jar file to the LiveData Migrator host machine.

  3. Move the SparkJDBC42.jar file to the LiveData Migrator directory below:

         /opt/wandisco/hivemigrator/agent/databricks
  4. Change ownership of the Jar file to the HiveMigrator system user and group:

    Example for hive:hadoop
         chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar
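After completing the steps above, a quick check can confirm the driver is in place. This is a sketch only: the jar path follows the steps above, and the expected `hive` owner is an assumption taken from the `hive:hadoop` example.

```shell
# Sketch only: verify the JDBC driver jar exists and has the expected owner.
# The "hive" owner below is an assumption based on the chown example above.
check_driver() {
  jar_path="$1"; expected_owner="$2"
  [ -f "$jar_path" ] || { echo "missing: $jar_path"; return 1; }
  owner=$(stat -c %U "$jar_path")
  if [ "$owner" = "$expected_owner" ]; then
    echo "driver ready"
  else
    echo "wrong owner: $owner"
    return 1
  fi
}

# Usage on the Data Migrator host:
#   check_driver /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar hive
```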

Databricks mandatory parameters#

  • --name The ID to give to the new Hive agent. In the UI, this is called Name.
  • --jdbc-server-hostname The server hostname for the Databricks cluster (AWS, Azure or GCP). In the UI, this is called JDBC Server Hostname.
  • --jdbc-port The port used for JDBC connections to the Databricks cluster (AWS, Azure or GCP). In the UI, this is called JDBC Port.
  • --jdbc-http-path The HTTP path for the Databricks cluster (AWS, Azure or GCP). In the UI, this is called JDBC Http Path.
  • --access-token The personal access token to be used for the Databricks cluster (AWS, Azure or GCP). In the UI, this is called Access Token.

Additionally, use only one of the following parameters:

important

If the --convert-to-delta option is used, the --default-fs-override parameter must also be provided with the value set to dbfs:, or a path inside the Databricks filesystem. For example, dbfs:/mount/externalStorage.

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myadls2 or mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
  • --default-fs-override Provide an override for the default filesystem URI instead of a filesystem name (for example: dbfs:). In the UI, this is called DefaultFs Override.

Databricks optional parameters#

  • --fs-mount-point Define the ADLS/S3/GCP location mounted in the Databricks filesystem that will contain migrations (for example: /mnt/mybucketname). In the UI, this is called FS Mount Point.
note

This parameter is required if --convert-to-delta is used. The Databricks agent will copy all associated table data and metadata into this location within the Databricks filesystem during conversion.

  • --convert-to-delta All underlying table data and metadata is migrated to the filesystem location defined by the --fs-mount-point parameter. Use this option to automatically copy the associated data and metadata into Delta Lake on Databricks (AWS, Azure or GCP), and convert tables into Delta Lake format. In the UI, this is called Convert to delta format.

    The following parameter can only be used if --convert-to-delta has been specified:

    • --delete-after-conversion Use this option to delete the underlying table data and metadata from the filesystem location (defined by --fs-mount-point) once it has been converted into Delta Lake on Databricks. In the UI, this is called Delete after conversion.

      important

      Only use this option if you are performing one-time migrations for the underlying table data. The Databricks agent does not support continuous (live) updates of table data when transferring to Delta Lake on Databricks.

  • If a migration to Databricks runs without the --convert-to-delta option, some migrated data may not be visible from the Databricks side. To avoid this issue, set --default-fs-override to dbfs: followed by the value of --fs-mount-point.

    Example:

    --default-fs-override dbfs:/mnt/mybucketname    
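The constraint above can be sketched as a simple pre-flight check (illustrative only; Data Migrator performs its own validation):

```shell
# Illustrative check: with --convert-to-delta, the --default-fs-override
# value must be dbfs: or a path inside the Databricks filesystem.
valid_default_fs_override() {
  case "$1" in
    dbfs:*) return 0 ;;
    *)      return 1 ;;
  esac
}

valid_default_fs_override "dbfs:/mnt/mybucketname" && echo "ok"
```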

Example#

Example for local Databricks agent
        hive agent add databricks --name databricksAgent --jdbc-server-hostname mydbcluster.cloud.databricks.com  --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token daexamplefg123456789t6f0b57dfdtoken4 --file-system-id mys3bucket --default-fs-override dbfs:/mnt/mybucketname --fs-mount-point /mnt/mybucket --convert-to-delta

hive agent add dataproc#

Add a Hive agent to connect to a local or remote Google Dataproc Metastore using the hive agent add dataproc command.

remote deployments

When connecting to a remote Dataproc Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).

A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.

Add local or remote Dataproc agent
        hive agent add dataproc [--config-path] string
                                [--kerberos-principal] string
                                [--kerberos-keytab] string
                                [--name] string
                                [--host] string
                                [--port] integer
                                [--no-ssl]
                                [--autodeploy]
                                [--ssh-user] string
                                [--ssh-key] file
                                [--ssh-port] int
                                [--use-sudo]
                                [--ignore-host-checking]
                                [--file-system-id] string
                                [--default-fs-override] string

Mandatory parameters#

  • --kerberos-principal Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM). In the UI, this is called Principal.
  • --kerberos-keytab Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab). In the UI, this is called Keytab.
  • --name The ID to give to the new Hive agent. In the UI, this is called Name.

Additionally, use only one of the following parameters:

  • --file-system-id The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.

Optional parameters#

  • --default-fs-override Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01). In the UI, this is called Default Filesystem Override.
  • --config-path The path to the directory containing the Hive configuration files core-site.xml, hive-site.xml and hdfs-site.xml. If not specified, Data Migrator will use the default location for the cluster distribution. In the UI, this is called Override Default Hadoop Configuration Path.

Parameters for remote Hive agents only#

  • --host The host where the remote Hive agent will be deployed.
  • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
  • --no-ssl TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it.
Parameters for automated deployment#
  • --autodeploy The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
  • --ssh-user The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-key The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
  • --ssh-port The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
  • --use-sudo All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
  • --ignore-host-checking Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment#

If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Google Dataproc manually:

  1. Transfer the remote server installer to your remote host:

    Example of secure transfer from local to remote host
         scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  2. On your remote host, make the installer script executable:

         chmod +x hivemigrator-remote-server-installer.sh
  3. On your remote host, run the installer as root (or sudo) user in silent mode:

         ./hivemigrator-remote-server-installer.sh -- --silent
  4. On your remote host, start the remote server service:

         service hivemigrator-remote-server start
  5. On your local host, run the hive agent add dataproc command without using --autodeploy and its related parameters to configure your remote Hive agent.

    See the Example for remote Dataproc deployment - manual below for further guidance.

Examples#

Example for local Dataproc deployment
        hive agent add dataproc --name sourceAgent --file-system-id mysourcehdfs
Example for remote Dataproc deployment - automated
        hive agent add dataproc --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

Example for remote Dataproc deployment - manual
        hive agent add dataproc --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs

Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.

note

If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).

hive agent add snowflake basic#

Add an agent using basic authentication.

Add a Snowflake agent using basic authentication
        hive agent add snowflake basic [--account-identifier] string
                                       [--file-system-id] string
                                       [--name] string
                                       [--password] string
                                       [--stage] string
                                       [--stage-schema] string
                                       [--warehouse] string
                                       [--default-fs-override] string
                                       [--schema] string
                                       [--stage-database] string
                                       [--user] string

Mandatory parameters#

  • --account-identifier is the unique ID for your Snowflake account.
  • --name is a name that will be used to reference the remote agent.
  • --warehouse is the Snowflake-based cluster of compute resources.
  • --stage is the storage used to temporarily hold data that is being moved into Snowflake. A stage can be internal (to Snowflake), or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
  • --user is your Snowflake username.

Additionally, use only one of the following parameters:

  • --file-system-id is the ID of the target filesystem. In the UI, this is called Filesystem.
  • --default-fs-override is an override for the default filesystem URI instead of a filesystem name.

Optional parameters#

  • --stage-database is an optional parameter for a Snowflake stage database with the default value "WANDISCO".
  • --stage-schema is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
  • --schema is an optional parameter for a Snowflake schema, with the default value "PUBLIC".

Examples#

Example of adding a Snowflake agent with basic authentication
        hive agent add snowflake basic --account-identifier test_adls2 --name snowflakeAgent --stage myAzure --user exampleUser --password examplePassword --warehouse DemoWH2

hive agent add snowflake privatekey#

Add a Snowflake agent using a private key
        hive agent add snowflake privatekey [--account-identifier] string
                                            [--file-system-id] string
                                            [--private-key-file] string
                                            [--private-key-file-pwd] string
                                            [--schema] string
                                            [--stage-database] string
                                            [--warehouse] string
                                            [--default-fs-override] string
                                            [--name] string
                                            [--stage] string
                                            [--stage-schema] string
                                            [--user] string

Mandatory parameters#

  • --account-identifier is the unique ID for your Snowflake account.
  • --private-key-file is the path to your private key file.
  • --private-key-file-pwd is the password that corresponds with the above private-key-file.
  • --name is a name that will be used to reference the remote agent.
  • --warehouse is the Snowflake-based cluster of compute resources.
  • --stage is the storage used to temporarily hold data that is being moved into Snowflake. A stage can be internal (to Snowflake), or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
  • --user is your Snowflake username.

Additionally, use only one of the following parameters:

  • --file-system-id is the ID of the target filesystem. In the UI, this is called Filesystem.
  • --default-fs-override is an override for the default filesystem URI instead of a filesystem name.

Optional parameters#

  • --stage-database is an optional parameter for a Snowflake stage database with the default value "WANDISCO".
  • --stage-schema is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
  • --schema is an optional parameter for a Snowflake schema, with the default value "PUBLIC".
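Example#

This section lists no worked example, so the following sketch assembles one. Every value is a placeholder mirroring the basic-authentication example above; echo is used only so the full flag set is printed on one line.

```shell
# All values below are placeholders; substitute your own account details.
# echo only prints the assembled command; remove it to run the real command.
echo hive agent add snowflake privatekey \
  --account-identifier test_adls2 \
  --name snowflakeKeyAgent \
  --user exampleUser \
  --private-key-file /path/to/rsa_key.p8 \
  --private-key-file-pwd exampleKeyPassword \
  --stage myAzure \
  --warehouse DemoWH2 \
  --file-system-id myadls2
```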

hive agent check#

Check the configuration of an existing Hive agent using hive agent check.

Check if agent configuration is valid & connectable
        hive agent check [--name] string

Example#

        hive agent check --name azureAgent

hive agent configure azure#

Change the configuration of an existing Azure Hive agent using hive agent configure azure.

The parameters that can be changed are the same as the ones listed in the hive agent add azure section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example#

        hive agent configure azure --name azureAgent --database-password CorrectPassword

hive agent configure filesystem#

Change the configuration of an existing filesystem Hive agent using hive agent configure filesystem.

The parameters that can be changed are the same as the ones listed in the hive agent add filesystem section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example#

        hive agent configure filesystem --name fsAgent --root-folder /user/dbuser/databases

hive agent configure glue#

Change the configuration of an existing AWS Glue Hive agent using hive agent configure glue.

The parameters that can be changed are the same as the ones listed in the hive agent add glue section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example#

        hive agent configure glue --name glueAgent --aws-region us-east-2

hive agent configure hive#

Change the configuration of an existing Apache Hive agent using hive agent configure hive.

The parameters that can be changed are the same as the ones listed in the hive agent add hive section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example#

        hive agent configure hive --name sourceAgent --kerberos-keytab /opt/keytabs/hive.keytab --kerberos-principal hive/myhostname.example.com@REALM.COM

hive agent configure databricks#

Change the configuration of an existing Databricks agent using hive agent configure databricks.

The parameters that can be changed are the same as the ones listed in the hive agent add databricks section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example#

        hive agent configure databricks --name databricksAgent --access-token myexamplefg123456789t6fnew7dfdtoken4

hive agent configure dataproc#

Change the configuration of an existing Dataproc agent using hive agent configure dataproc.

The parameters that can be changed are the same as the ones listed in the hive agent add dataproc section.

All parameters are optional except --name, which is required to enter the existing Hive agent that you wish to configure.

Example#

        hive agent configure dataproc --name dataprocAgent --port 9099

hive agent configure snowflake#

Configure an existing Snowflake remote agent by using the hive agent configure snowflake command.

Configure a remote Snowflake agent using basic authentication
        hive agent configure snowflake basic [--account-identifier] string
                                             [--file-system-id] string
                                             [--user] string
                                             [--password] string
                                             [--stage] string
                                             [--stage-schema] string
                                             [--warehouse] string
                                             [--default-fs-override] string
                                             [--name] string
                                             [--schema] string
                                             [--stage-database] string

Example Snowflake remote agent configuration#

        hive agent configure snowflake basic --user snowflakeAgent --password <password-here> --stage internal
Configure a remote Snowflake agent using privatekey authentication
        hive agent configure snowflake privatekey [--account-identifier] string
                                                  [--file-system-id] string
                                                  [--private-key-file] string
                                                  [--private-key-file-pwd] string
                                                  [--schema] string
                                                  [--stage-database] string
                                                  [--warehouse] string
                                                  [--default-fs-override] string
                                                  [--name] string
                                                  [--stage] string
                                                  [--stage-schema] string

Example Snowflake remote agent configuration#

        hive agent configure snowflake privatekey --private-key-file-pwd <password> --private-key-file /path/to/keyfiles/ --user snowflakeAgent --schema star-schema

hive agent delete#

Delete the specified Hive agent with hive agent delete.

Delete agent
        hive agent delete [--name] string

Example#

        hive agent delete --name azureAgent

hive agent list#

List configured Hive agents with hive agent list.

List already added agents
        hive agent list [--detailed]

Example#

        hive agent list --detailed

hive agent show#

Show the configuration of a Hive agent with hive agent show.

Show agent configuration
        hive agent show [--name] string

Example#

        hive agent show --name azureAgent

hive agent types#

Print a list of supported Hive agent types with hive agent types.

Print list of supported agent types
        hive agent types

Example#

        hive agent types

Exclusion commands#

exclusion add date#

Create a date-based exclusion that checks the 'modified date' of any directory or file that Data Migrator encounters during a migration to which the exclusion has been applied. If the path or file has a 'modified date' earlier than the specified date, it is excluded from the migration.

Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new date-based rule
        exclusion add date [--exclusion-id] string
                           [--description] string
                           [--before-date] string

Mandatory parameters#

  • --exclusion-id The ID for the exclusion policy. In the UI, this is called Name.
  • --description A user-friendly description for the policy. In the UI, this is called Description.
  • --before-date An ISO formatted date and time, which can include an offset for a particular time zone. In the UI, this is called TBA.

Example#

        exclusion add date --exclusion-id beforeDate --description "Files earlier than 2020-10-01T10:00:00PDT" --before-date 2020-10-01T10:00:00-07:00

exclusion add file-size#

Create an exclusion that can be applied to migrations to constrain the files transferred by a policy based on file size. Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

Create a new exclusion by file size policy
        exclusion add file-size [--exclusion-id] string
                                [--description] string
                                [--value] long
                                [--unit] string

Mandatory parameters#

  • --exclusion-id The ID for the exclusion policy. In the UI, this is called Name.
  • --description A user-friendly description for the policy. In the UI, this is called Description.
  • --value The numerical value for the file size, in a unit defined by the --unit parameter. In the UI, this is called Value.
  • --unit A string to define the unit used. You can use B for bytes, GB for gigabytes, KB for kilobytes, MB for megabytes, PB for petabytes, TB for terabytes, GiB for gibibytes, KiB for kibibytes, MiB for mebibytes, PiB for pebibytes, or TiB for tebibytes when creating exclusions with the CLI.
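The units above can be interpreted as in the sketch below. Note the decimal (powers of 1000) reading of KB/MB/... and binary (powers of 1024) reading of KiB/MiB/... is an assumption based on the conventional meaning of these suffixes; the reference itself doesn't define the multipliers.

```shell
# Assumed definitions: decimal units (KB, MB, ...) use powers of 1000,
# binary units (KiB, MiB, ...) use powers of 1024. TB/PB follow the same
# pattern and are omitted for brevity.
to_bytes() {
  value="$1"; unit="$2"
  case "$unit" in
    B)   factor=1 ;;
    KB)  factor=1000 ;;
    MB)  factor=1000000 ;;
    GB)  factor=1000000000 ;;
    KiB) factor=1024 ;;
    MiB) factor=$((1024 * 1024)) ;;
    GiB) factor=$((1024 * 1024 * 1024)) ;;
    *)   echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  echo $((value * factor))
}

to_bytes 100 MB   # prints 100000000
```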

Example#

        exclusion add file-size --exclusion-id 100mbfiles --description "Files greater than 100 MB" --value 100 --unit MB

exclusion add regex#

Create an exclusion using a regular expression to prevent certain files and directories being transferred based on matching file or directory names. Once associated with a migration using migration exclusion add, files and directories that match the regular expression will not be migrated.

Create a new exclusion by regular expression policy
        exclusion add regex [--exclusion-id] string
                            [--description] string
                            [--regex] string
                            [--type] string

Mandatory parameters#

  • --exclusion-id The ID for the exclusion policy. In the UI, this is called Name.
  • --description A user-friendly description for the policy. In the UI, this is called Description.
  • --regex A regular expression in a syntax of either Java PCRE, Automata or GLOB type. In the UI, this is called Regex.

Optional parameters#

  • --type Choose the regular expression syntax type. There are three options available:

    1. JAVA_PCRE (default)
    2. AUTOMATA
    3. GLOB

Examples#

Example glob pattern
        exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --type GLOB --regex test*
Example Java PCRE pattern
        exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --regex ^test\.*

Using backslash characters within --regex parameter#

If you wish to use a \ character as part of your regex value, you must escape this character with an additional backslash.

Example
        exclusion add regex --description "No paths that start with a backslash followed by test"  --exclusion-id exclusion2 --regex ^\\test\.*

The response displayed in the CLI will still show the additional backslash. However, the internal representation within Data Migrator will be as expected (it will read as ^\test.*).

This workaround isn't required for API inputs, as it only affects the Spring Shell implementation used for the CLI.
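The behaviour of the stored single-backslash pattern can be checked outside the CLI. Here grep -E stands in for the pattern matcher (an illustration only, not the product's regex engine):

```shell
# The stored pattern ^\\test (a regex-escaped backslash) matches names that
# begin with a literal backslash followed by "test".
printf '%s\n' '\testdata' 'testdata' | grep -E '^\\test'   # prints \testdata
```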

exclusion delete#

Delete an exclusion policy so that it is no longer available for migrations.

        exclusion delete [--exclusion-id] string

Mandatory parameters#

  • --exclusion-id The ID for the exclusion policy to delete. In the UI, this is called Name.

Example#

        exclusion delete --exclusion-id exclusion1

exclusion list#

List all exclusion policies defined.

List all exclusion policies
        exclusion list

exclusion show#

Get details for an individual exclusion policy by ID.

Get details for a specific exclusion rule
        exclusion show [--exclusion-id] string

Mandatory parameters#

  • --exclusion-id The ID for the exclusion policy to show. In the UI, this is called Name.

Example#

        exclusion show --exclusion-id 100mbfiles

Migration commands#

migration add#

Create a new migration to initiate data migration from your source filesystem.
        migration add [--name or --migration-id] string
                      [--path] string
                      [--target] string
                      [--exclusions] string
                      [--action-policy] string
                      [--auto-start]
                      [--source] string
                      [--scan-only]
                      [--verbose]
                      [--detailed]
caution

Do not write to target filesystem paths while a migration is underway. Doing so could interfere with Data Migrator functionality and lead to unpredictable behavior.

Use different filesystem paths when writing to the target filesystem directly (and not through Data Migrator).

Mandatory parameters#

  • --path Defines the source filesystem directory that is the scope of the migration. All content (other than that excluded) will be migrated to the target. In the UI, this is called Path for {source-filesystem}.
note

ADLS Gen2 restricts paths to 60 segments. Make sure your path has fewer than 60 segments when defining the path string parameter.

  • --target Specifies the name of the target filesystem resource to which migration will occur. In the UI, this is called Target.

Optional parameters#

  • --name or --migration-id Enter a name or ID for the new migration. An ID is auto-generated if you don't enter one. In the UI, this is called Migration Name.
  • --exclusions A comma-separated list of exclusions by name. In the UI, this is called Add new exclusion.
  • --auto-start Enter this parameter if you want the migration to start immediately. If you don't, the migration won't start until you run it manually. In the UI, this is called Auto-start migration.
  • --action-policy This parameter determines what happens if the migration encounters content in the target path with the same name and size. In the UI, this is called Skip Or Overwrite Settings.
    There are two options available:
    1. com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
      Every file is replaced, even if file size is identical on the target storage. In the UI, this is called Overwrite.
    2. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
      If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced. In the UI, this is called Skip if Size Match.
  • --source Specifies the name of the source filesystem.
  • --scan-only Select this option to create a one-time migration.
  • --verbose Enter this parameter for additional information about the migration.
  • --detailed Enter this parameter for additional information about the migration.
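The two action policies differ only in how a same-named file already on the target is handled. The SkipIfSizeMatchActionPolicy decision can be sketched as follows (illustrative only, not the product's implementation):

```shell
# Returns "skip" when source and target sizes match, "transfer" otherwise.
# An empty target size means the file doesn't exist on the target yet.
# (OverwriteActionPolicy, the default, always answers "transfer".)
skip_if_size_match() {
  source_size="$1"; target_size="$2"
  if [ -n "$target_size" ] && [ "$source_size" -eq "$target_size" ]; then
    echo skip
  else
    echo transfer
  fi
}

skip_if_size_match 4096 4096   # prints skip
skip_if_size_match 4096 8192   # prints transfer
```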

Example#

        migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles

migration delete#

Delete a stopped migration resource.

Delete a migration
        migration delete [--name or --migration-id] string

Mandatory parameters#

  • --name or --migration-id The migration name or ID to delete.

Example#

        migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

migration exclusion add#

Associate an exclusion resource with a migration so that the exclusion policy applies to items processed for the migration. Exclusions must be associated with a migration before they take effect.

Add an exclusion to a migration
        migration exclusion add [--name or --migration-id] string  [--exclusion-id] string

Mandatory parameters#

  • --name or --migration-id The migration name or ID with which to associate the exclusion.
  • --exclusion-id The ID of the exclusion to associate with the migration. In the UI, this is called Name.

Example#

        migration exclusion add --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

migration exclusion delete#

Remove an exclusion from association with a migration so that its policy no longer applies to items processed for the migration.

Remove an exclusion from a migration
        migration exclusion delete [--name or --migration-id] string  [--exclusion-id] string

Mandatory parameters#

  • --name or --migration-id The migration name or ID from which to remove the exclusion.
  • --exclusion-id The ID of the exclusion to remove from the migration. In the UI, this is called Name.

Example#

        migration exclusion delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

migration list#

Present the list of all migrations defined.

List running and active migrations
        migration list [--detailed or --verbose]

Optional parameters#

  • --detailed or --verbose Returns additional information about each migration.
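
Example#

For instance, to list every defined migration with extended detail:
        migration list --detailed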

migration path status#

View all actions scheduled on a source filesystem in the specified path.

Show information on the migration status of a path on the source filesystem
        migration path status [--source-path] string  [--source] string

Mandatory parameters#

  • --source-path The path on the filesystem to review actions for. Supply a full directory.
  • --source The filesystem ID of the source system the path is in.

Example#

        migration path status --source-path /root/mypath/ --source mySource

migration pending-region add#

Add a path for rescanning to a migration.

Add a path for rescanning to a migration
        migration pending-region add [--name or --migration-id] string  [--path] string  [--action-policy] string

Mandatory parameters#

  • --name or --migration-id The migration name or ID.
  • --path The path string of the region to add for rescan.

Optional parameters#

  • --action-policy This parameter determines what happens if the migration encounters content in the target path with the same name and size. In the UI, this is called Skip Or Overwrite Settings.
    There are two options available:
    1. com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
      Every file is replaced, even if file size is identical on the target storage. In the UI, this is called Overwrite.
    2. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
      If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced. In the UI, this is called Skip if Size Match.

Example#

        migration pending-region add --name myMigration --path etc/files --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy

migration reset#

Reset a stopped migration to the state it was in before it was started. This deletes and replaces it with a new migration that has the same settings as the old one.

Reset a migration
        migration reset [--name or --migration-id] string  [--action-policy] string  [--reload-mappings]  [--detailed or --verbose]

Mandatory parameters#

  • --name or --migration-id The name or ID of the migration you want to reset.

Optional parameters#

  • --action-policy Accepts two string values. com.wandisco.livemigrator2.migration.OverwriteActionPolicy causes the new migration to re-migrate all files from scratch, including those already migrated to the target filesystem, regardless of file size. com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy skips files that exist on both the source and target with a matching size, and re-migrates files whose sizes differ. Use tab auto-completion with this parameter to view both options and a short description of each.
  • --reload-mappings Resets the migration's path mapping configuration, using the newest default path mapping configuration for Data Migrator.
  • --detailed or --verbose Returns additional information about the reset migration, similarly to migration show.

Example#

        migration reset --name mymigration
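Reset a migration and skip files that already match on the target (a sketch combining the options described above):
        migration reset --name mymigration --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy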

migration resume#

Resume a migration that you've stopped from transferring content to its target.

Resume a migration
        migration resume [--name or --migration-id] string  [--detailed or --verbose]

Mandatory parameters#

  • --name or --migration-id The migration name or ID to resume.

Example#

        migration resume --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

migration run, migration start#

Start a migration that was created without the --auto-start parameter.

Start a migration.
        migration run [--name or --migration-id] string  [--detailed or --verbose]

Mandatory parameters#

  • --name or --migration-id The migration name or ID to run.

Optional parameters#

  • --detailed or --verbose Outputs additional information about the migration.

Example#

        migration run --migration-id myNewMigration

migration show#

Output a JSON description of a specific migration.

Get migration details
        migration show [--name or --migration-id] string  [--detailed or --verbose]

Mandatory parameters#

  • --name or --migration-id The migration name or ID to show.

Optional parameters#

  • --detailed or --verbose Outputs additional information about the migration.

Example#

        migration show --name myNewMigration

migration stop#

Stop a migration from transferring content to its target, placing it into the STOPPED state. Stopped migrations can be resumed.

Stop a migration
        migration stop [--name or --migration-id] string

Mandatory parameters#

  • --name or --migration-id The migration name or ID to stop.

Example#

        migration stop --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e

migration verification add#

Add a migration verification for a specified migration. This will scan your source and target filesystems (in the migration path) and compare them for discrepancies, either mismatched file sizes or missing files.

The verification status will show the number of missing paths and files on the target filesystem and also the number of file size mismatches between the source and target. The verification status can be viewed by using migration verification show (for individual verification jobs) or migration verification list (for all verification jobs).

Once a verification job is complete, a verification report will be created in the /var/log/wandisco/livedata-migrator directory in the format of verification-report-{verificationId}-{startTime}.log. This report will contain more details including any paths that have discrepancies.

See migration verifications for more details.
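For example, once a job completes you can list the generated reports from a system shell on the Data Migrator host:
        ls /var/log/wandisco/livedata-migrator/verification-report-*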

Verify a migration
        migration verification add [--name or --migration-id] string  [--override]

Mandatory parameters#

  • --name or --migration-id The migration name or ID to start (or override) a verification on.

Optional parameters#

  • --override Stop the currently running verification and start a new one.

Examples#

Start a verification job
        migration verification add --name myMigration
Stop the running verification and start a new one
        migration verification add --name myMigration --override

migration verification list#

List all running migration verification jobs and their statuses (use migration verification show for the status of a single verification job).

List all verification jobs
        migration verification list

migration verification show#

Show the status of a specific migration verification.

Show a verification job for a migration
        migration verification show [--name or --migration-id] string

Mandatory parameters#

  • --name or --migration-id Show the status of the current verification job running on this migration name or ID (only one verification job can be running per migration).

Example#

See verification status values for further explanation of the output.

Example status of a completed verification
WANdisco LiveData Migrator >> migration verification show --name testmig
{
  "migrationId" : "testmig",
  "state" : "COMPLETED",
  "verificationId" : "e1aedfbd-b094-4a1b-a294-69cdd5a6030a",
  "verificationPath" : "/testdir",
  "startTime" : "2021-04-29T13:27:44.278Z",
  "completeTime" : "2021-04-29T13:27:45.392Z",
  "verificationEdge" : "/testmig/testdir01/testfile01",
  "scannerSummary" : {
    "progressSummary" : {
      "filesScanned" : 177,
      "directoriesScanned" : 47,
      "bytesScanned" : 1105391944,
      "filesExcluded" : 51,
      "dirsExcluded" : 0,
      "bytesExcluded" : 0,
      "baseScanCompletionTime" : "2021-04-29T13:27:45.392Z"
    },
    "contentSummary" : {
      "byteCount" : 1105391944,
      "fileCount" : 194,
      "directoryCount" : 81
    }
  },
  "verificationProgress" : {
    "matchedPathCount" : 224,
    "totalFailedPathCount" : 0,
    "targetFilesMissing" : 0,
    "targetDirectoriesMissing" : 0,
    "filesizeMismatches" : 0
  }
}

status#

Get a text description of the overall status of migrations. Information is provided on the following:

  • Total number of migrations defined.
  • Average bandwidth being used over 10s, 60s, and 300s intervals.
  • Peak bandwidth observed over 300s interval.
  • Average file transfer rate per second over 10s, 60s, and 300s intervals.
  • Peak file transfer rate per second over a 300s interval.
  • List of migrations, including one-time migrations, with source path and migration id, and with current progress broken down by migration state: completed, live, stopped, running and ready.
Get migration status
        status [--diagnostics]  [--migrations]  [--network]  [--transfers]  [--watch]  [--refresh-delay] int  [--full-screen]

Optional parameters#

  • --diagnostics Returns additional information about your Data Migrator instance and its migrations, useful for troubleshooting.
  • --migrations Displays information about each running migration.
  • --network Displays file transfer throughput in Gib/s during the last 10 seconds, 1 minute and 30 minutes.
  • --transfers Displays overall performance information about data transfers across the last 10 seconds, 1 minute and 30 minute intervals.
  • --watch Auto-refresh the output.
  • --refresh-delay The auto-refresh interval (in seconds).
  • --full-screen Auto-refresh in full-screen mode.

Examples#

Status
WANdisco LiveMigrator >> status
Network             (10s)       (1m)       (30m)
Average Throughput: 10.4 Gib/s  9.7 Gib/s  10.1 Gib/s
Average Files/s:    425         412        403
11 Migrations                                     dd:hh:mm  dd:hh:mm
Complete: 1         Transferred         Excluded  Duration
/static1   5a93d5        67.1 GiB       2.3 GiB  00:12:34
Live:     3         Transferred         Excluded  Duration
/repl1     9088aa       143.2 GiB      17.3 GiB  00:00:34
/repl_psm1 a4a7e6       423.6 TiB       9.6 GiB  02:05:29
/repl5     ab140d       118.9 GiB       1.2 GiB  00:00:34
Running:  5         Transferred         Excluded  Duration  Remaining
/repl123   e3727c  30.3/45.2 GiB 67%    9.8 GiB  00:00:34  00:00:17
/repl2     88e4e7  26.2/32.4 GiB 81%    0.2 GiB  00:01:27  00:00:12
/repl3     372056   4.1/12.5 GiB 33%    1.1 GiB  00:00:25  00:01:05
/repl4     6bc813  10.6/81.7 TiB  8%   12.4 GiB  00:04:21  01:02:43
/replxyz   dc33cb   2.5/41.1 GiB  6%    6.5 GiB  01:00:12  07:34:23
Ready:    2
/repl7     070910  543.2 GiB
/repltest  d05ca0  7.3 GiB
Status with --transfers
WANdisco LiveMigrator >> status --transfers
Files                (10s)  (1m)  (30m)
Average Migrated/s:  362    158   4781
< 1 KB               14     27    3761
< 1 MB               151    82    0
< 1 GB               27     1     2
< 1 PB               0      0     0
< 1 EB               0      0     0
Peak Migrated/s:     505    161   8712
< 1 KB               125    48    7761
< 1 MB               251    95    4
< 1 GB               29     7     3
< 1 PB               0      0     0
< 1 EB               0      0     0
Average Scanned/s:   550    561   467
Average Rescanned/s: 24     45    56
Average Excluded/s:  7      7     6
Status with --diagnostics
WANdisco LiveMigrator >> status --diagnostics
Uptime: 0 Days 1 Hours 23 Minutes 24 Seconds
SystemCpuLoad: 0.1433 ProcessCpuLoad: 0.0081
JVM GcCount: 192 GcPauseTime: 36 s (36328 ms)
OS Connections: 1, Tx: 0 B, Rx: 0 B, Retransmit: 0
Transfer Bytes (10/30/300s): 0.00 Gib/s, 0.00 Gib/s, 0.00 Gib/s
Transfer Files (10/30/300s): 0.00/s 0.00/s 0.00/s
Active Transfers/pull.threads: 0/100
Migrations: 0 RUNNING, 4 LIVE, 0 STOPPED
Actions Total: 0, Largest: "testmigration" 0, Peak: "MyMigration" 1
PendingRegions Total: 0 Avg: 0, Largest: "MyMigration" 0
FailedPaths Total: 0, Largest: "MyMigration" 0
File Transfer Retries Total: 0, Largest: "MyMigration" 0
Total Excluded Scan files/dirs/bytes: 26, 0, 8.1 MB
Total Iterated Scan files/dirs/bytes: 20082, 9876, 2.7 GB
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/8
EventsQueued: 0, Total Events Added: 504
Transferred File Size Percentiles: 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Transferred File Transfer Rates Percentiles per Second: 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Active File Size Percentiles: 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Active File Transfer Rates Percentiles per Second: 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B

Hive migration commands#

hive migration add#

Create a new Hive migration to initiate metadata migration from your source Metastore.

info

Create Hive rules before initiating a Hive migration to define which databases and tables are migrated.

Create new migration
        hive migration add [--source] string  [--target] string  [--name] string  [--auto-start]  [--once]  [--rule-names] list

Mandatory parameters#

  • --source The name of the Hive agent for the source of migration.
  • --target The name of the Hive agent for the target of migration.

Optional parameters#

  • --name The name to identify the migration with.
  • --auto-start Enter this parameter to start the migration immediately after creation.
  • --once Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
  • --rule-names The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example: rule1,rule2,rule3).

Example#

        hive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs,user_dbs --name hive_migration --auto-start
note

Auto-completion of the --rule-names parameter will not work correctly if it is added at the end of the Hive migration parameters. See the troubleshooting guide for workarounds.

hive migration delete#

Delete a Hive migration.

note

A Hive migration must be stopped before it can be deleted. This can be achieved by using the --force-stop parameter with this command.

Delete migration from the list, migration should be stopped
        hive migration delete [--name] string  [--force-stop]

Example#

        hive migration delete --name hive_migration --force-stop

hive migration list#

List all Hive migrations.

Print a list of all migrations
        hive migration list

hive migration show#

Display information about a Hive migration.

Show info about specific migration
        hive migration show

hive migration start#

Start a Hive migration or a list of Hive migrations (comma-separated).

note

Enter the --once parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Start migration
        hive migration start [--names] list  [--once]

Example#

        hive migration start --names hive_migration1,hive_migration2

hive migration start all#

Start all Hive migrations.

note

Enter the --once parameter to perform a one-time migration, and not continuously scan for new or changing metadata.

Start migration
        hive migration start all [--once]

Example#

        hive migration start all --once

hive migration status#

Show the status of a Hive migration or a list of Hive migrations (comma-separated).

Show migration status
        hive migration status [--name] list

Example#

        hive migration status --name hive_migration1,hive_migration2

hive migration status all#

Show the status of all Hive migrations.

Show migration status
        hive migration status all

Example#

        hive migration status all

hive migration stop#

Stop a running Hive migration or a list of running Hive migrations (comma-separated).

Stop running migration
        hive migration stop [--names] list

Example#

        hive migration stop --names hive_migration1,hive_migration2

hive migration stop all#

Stop all running Hive migrations.

Stop all running migrations
        hive migration stop all

Example#

        hive migration stop all

hive migration reset#

Reset a stopped Hive migration. This returns the migration to a CREATED state.

Reset a Hive migration
        hive migration reset [--names] string  [--force-stop]
note

A Hive migration must be stopped before it can be reset. This can be achieved by using the --force-stop parameter with this command.

info

The reset migration will use the latest agent settings.

For example, if the target agent’s Default Filesystem Override setting was updated after the original migration started, the reset migration will use the latest Default Filesystem Override value.

To reset multiple Hive migrations, use a comma-separated list of migration names with the --names parameter.

Example#

Reset a Hive migration
hive migration reset --names hive_migration1
Stop and reset a list of migrations
hive migration reset --force-stop --names hive_migration1,hive_migration2

Path mapping commands#

path mapping add#

Create a path mapping that allows you to define an alternative target path for a specific target filesystem. Path mappings are applied automatically to new migrations.

When path mapping isn't used, the source path is created on the target filesystem.

note

Path mappings can't be applied to existing migrations. Delete and recreate a migration if you want a path mapping to apply.

Create a new path mapping
        path mapping add [--path-mapping-id] string  [--source-path] string  [--target] string  [--target-path] string  [--description] string

Mandatory parameters#

  • --source-path The path on the source filesystem.
  • --target The target filesystem id (value defined for the --file-system-id parameter).
  • --target-path The path for the target filesystem.
  • --description Description of the path mapping enclosed in quotes ("text").

Optional parameters#

  • --path-mapping-id An ID for this path mapping. An ID will be auto-generated if you don't enter one.

Example#

Example for HDP to HDI - default Hive warehouse directory
        path mapping add --path-mapping-id hdp-hdi --source-path /apps/hive/warehouse --target mytarget --target-path /hive/warehouse --description "HDP to HDI - Hive warehouse directory"

path mapping delete#

Delete a path mapping.

note

Deleting a path mapping will not affect any existing migrations that have the path mapping applied. Delete and recreate a migration if you no longer want a previous path mapping to apply.

Delete a path mapping
        path mapping delete [--path-mapping-id] string

Mandatory parameters#

  • --path-mapping-id The ID of the path mapping.

Example#

        path mapping delete --path-mapping-id hdp-hdi

path mapping list#

List all path mappings.

List all path mappings
        path mapping list [--target] string

Optional parameters#

  • --target List path mappings for the specified target filesystem id.

Examples#

Example for listing all path mappings
        path mapping list
Example for listing path mappings for a specific target
        path mapping list --target hdp-hdi

path mapping show#

Show details of a specified path mapping.

Get path mapping details
        path mapping show [--path-mapping-id] string

Mandatory parameters#

  • --path-mapping-id The ID of the path mapping.

Example#

        path mapping show --path-mapping-id hdp-hdi

Built-in commands#

clear#

Clear the shell screen. You can also press Ctrl+L.
        clear

echo#

Prints whatever text you write to the console. This can be used to sanity-check a command before running it (for example: echo migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles).

Print message
        echo [--message] string

exit, quit#

Entering either exit or quit will stop operation of Data Migrator when it is run from the command line. All processing will cease, and you will be returned to your system shell.

If your Data Migrator command line is connected to a Data Migrator system service, this command will end your interactive session with that service, which will remain in operation to continue processing migrations.

If this command is encountered during non-interactive processing of input (such as when you pipe input to an instance as part of another shell script) no further commands contained in that input will be processed.

Exit the shell
        exit
ALSO KNOWN AS
        quit

help#

Use the help command to get details of all commands available from the action prompt.

Display help about available commands
        help [-C] string

For longer commands, you can use backslashes (\) to indicate continuation, or use quotation marks (") to enclose the full command. When using quotation marks, you can press Tab on your keyboard to make Data Migrator automatically suggest the remainder of your typed command.

See the examples below for reference.

Example#

help connect
        connect - Connect to Data Migrator and Hive Migrator.
        connect [--host] string  [--ssl]  [--lm2port] int  [--hvm-port] int  [--timeout] integer  [--user] string  
Use of backslashes
help hive\ migration\ add
        hive migration add - Create new migration.
        hive migration add [--source] string  [--target] string  [--name] string  [--auto-start]  [--once]  [--rule-names] list  
Use of quotation marks
help "filesystem add local"
        filesystem add local - Add a local filesystem.
        filesystem add local [--file-system-id] string  [--fs-root] string  [--source]  [--scan-only]  [--properties-files] list  [--properties] string

history#

Enter history at the action prompt to list all previously entered commands.

Entering history --file <filename> will save up to 500 most recently entered commands in text form to the file specified. Use this to record commands that you have executed.

Display or save the history of previously run commands
        history [--file] file

Optional parameters#

  • --file The name of the file in which to save the history of commands.
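
Example#

Save your most recent commands to a file (the filename below is illustrative):
        history --file commands.txt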

script#

Load and execute commands from a text file using the script --file <filename> command. This file should have one command per line, and each will be executed as though they were entered directly at the action prompt in that sequence.

Use scripts outside of the WANdisco CLI by referencing the script when running the livedata-migrator command (see examples).

Read and execute commands from a file
        script [--file] file

Mandatory parameters#

  • --file The name of the file containing script commands.
Example contents of a script file
        hive agent check --name sourceAgent
        hive agent check --name azureAgent

Examples#

info

These examples assume that myScript is inside the working directory.

Example inside CLI
        script --file myScript
Example outside of CLI (non-interactive)
        livedata-migrator --script=./myScript
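
One way to create the myScript file used above is from a system shell. This is a minimal sketch assuming the example script contents shown earlier (the agent names sourceAgent and azureAgent are illustrative):

```shell
# Write a two-line Data Migrator script file, one CLI command per line.
printf '%s\n' \
  'hive agent check --name sourceAgent' \
  'hive agent check --name azureAgent' > myScript

# Review the file before passing it to the CLI.
cat myScript
```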

Change log level commands#

log debug#

Enable debug level logging
       log debug

log info#

Enable info level logging
       log info

log off#

Disable logging
       log off

log trace#

Enable trace level logging
       log trace

Connect commands#

connect livemigrator#

Connect to the Data Migrator service on your Data Migrator host with this command.

note

This is a manual method of connecting to the Data Migrator service as the livedata-migrator command (shown in CLI - Sign in) will attempt to establish this connection automatically.

Connect Data Migrator
        connect livemigrator [--host] string  [--ssl]  [--port] int  [--timeout] integer  [--user] string

Mandatory parameters#

  • --host The hostname or IP address for the Data Migrator host.

Optional parameters#

  • --ssl Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
  • --port The Data Migrator port to connect on (default is 18080).
  • --timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
  • --user The username to use for authenticating to the Data Migrator service. Used only when the Data Migrator instance has basic authentication enabled. You will still be prompted to enter the user password.

Example#

        connect livemigrator --host localhost --port 18080

connect hivemigrator#

Connect to the Hive Migrator service on your Data Migrator host with this command.

note

This is a manual method of connecting to the Hive Migrator service as the livedata-migrator command (shown in CLI - Log in section) will attempt to establish this connection automatically.

Connect Hivemigrator
        connect hivemigrator [--host] string  [--ssl]  [--port] int  [--timeout] long  [--user] string

Mandatory parameters#

  • --host The hostname or IP address for the Data Migrator host that contains the Hive Migrator service.

Optional parameters#

  • --ssl Enter this parameter if you want to establish a TLS connection to Hive Migrator.
  • --port The Hive Migrator service port to connect on (default is 6780).
  • --timeout Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).
  • --user The username to use for authenticating to the Hive Migrator service. Used only when Hive Migrator has basic authentication enabled. You will still be prompted to enter the user password.

Example#

        connect hivemigrator --host localhost --port 6780

Email notifications subscription commands#

notification email addresses add#

Add email addresses to the subscription list for email notifications.

Subscribe email addresses to notifications.
    notification email addresses add [--addresses]

Mandatory parameters#

  • --addresses A comma-separated list of email addresses to add.

Example#

        notification email addresses add --addresses myemail@company.org,personalemail@gmail.com

notification email addresses remove#

Remove email addresses from the subscription list for email notifications.

Unsubscribe email addresses from notifications.
    notification email addresses remove [--addresses]  

Mandatory parameters#

  • --addresses A comma-separated list of email addresses to remove. Use auto-completion to quickly select from subscribed emails.

Example#

        notification email addresses remove --addresses myemail@company.org,personalemail@gmail.com

notification email smtp set#

Configure the details of an SMTP server for Data Migrator to connect to.

Configure the SMTP adapter.
    notification email smtp set [--host] string  [--port] integer  [--security] security-enum  [--email] string  [--login] string  [--password] string  [--subject-prefix] string

Mandatory parameters#

  • --host The host address of the SMTP server.
  • --port The port to connect to the SMTP server. Many SMTP servers use port 25.
  • --security The type of security the server uses. Available options: NONE, SSL, STARTTLS_ENABLED, STARTTLS_REQUIRED, or TLS.
  • --email The email address for Data Migrator to use with emails sent through the SMTP server. This address will be the sender of all configured email notifications.

Optional parameters#

  • --login The username to authenticate with the SMTP server.
  • --password The password to authenticate with the SMTP server. Required if you supply --login.
  • --subject-prefix Set an email subject prefix to help identify and filter Data Migrator notifications.

Example#

        notification email smtp set --host my.internal.host --port 587 --security TLS --email livedatamigrator@wandisco.com  --login myusername --password mypassword

notification email smtp show#

Display the details of the SMTP server Data Migrator is configured to use.

Show the current configuration of SMTP adapter.
    notification email smtp show

notification email subscriptions show#

Show a list of currently subscribed emails and notifications.

Show email notification subscriptions.
    notification email subscriptions show

notification email types add#

Add notification types to the email notification subscription list.

See the output from the command notification email types show for a list of all currently available notification types.

Subscribe to notification types.
        notification email types add [--types]  

Mandatory parameters#

  • --types A comma-separated list of notification types to subscribe to.

Example#

        notification email types add MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED

notification email types remove#

Remove notification types from the email notification subscription list.

Unsubscribe from notification types.
    notification email types remove [--types]  

Mandatory parameters#

  • --types A comma-separated list of notification types to unsubscribe from.

Example#

        notification email types remove MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED

notification email types show#

Return a list of all available notification types to subscribe to.

Show email notification types.
        notification email types show

Hive Backup Commands#

hive backup add#

Immediately create a metadata backup file.

Create new backup
        hive backup add

hive backup config show#

Show the current metadata backup configuration.

Show configuration of backups.
        hive backup config show

hive backup list#

List all existing metadata backup files.

List all backups
        hive backup list

hive backup restore#

Restore from a specified metadata backup file.

Restore backup by name
        hive backup restore --name string
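For example, to restore from a backup file returned by hive backup list (the backup name below is illustrative; use the names from your own backup list):

```shell
hive backup restore --name hive-backup-2024-01-01
```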

hive backup schedule configure#

Configure a backup schedule for metadata migrations.

Configure backup schedule
        hive backup schedule configure --period-minutes 10 --enable
{
  "enabled": true,
  "periodMinutes": 10
}

hive backup schedule show#

Show the current metadata backup schedule.

Show current backup schedule
        hive backup schedule show
{
  "enabled": true,
  "periodMinutes": 10
}

hive backup show#

Show a specified metadata backup file.

Show backup by name
        hive backup show --name string
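For example, to inspect a specific backup returned by hive backup list (the backup name below is illustrative):

```shell
hive backup show --name hive-backup-2024-01-01
```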

Hive configuration commands#

hive config certificate generate#

Generate system certificates
        hive config certificate generate

hive config certificate upload#

Upload certificates used to establish a TLS connection to the remote agent
        hive config certificate upload [--path-mapping-id] string
                                       [--private-key] file
                                       [--certificate] file
                                       [--trusted-certificate] file

Mandatory parameters#

  • --private-key Client private key used to establish a TLS connection to the remote agent.
  • --certificate Client certificate used to establish a TLS connection to the remote agent.
  • --trusted-certificate Trusted certificate used to establish a TLS connection to the remote agent.
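
Example#

An illustrative invocation (the file paths below are placeholders for your own key and certificate files):

```shell
hive config certificate upload --private-key /path/to/client.key --certificate /path/to/client.crt --trusted-certificate /path/to/ca.crt
```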

Hive rule configuration commands#

hive rule add, hive rule create#

Create a Hive migration rule that is used to define which databases and tables are migrated.

info

Enter these rules when starting a new migration to control which databases and tables are migrated.

Add new Hive migration rule
        hive rule add [--database-pattern] string
                      [--table-pattern] string
                      [--name] string
ALSO KNOWN AS: hive rule create

Mandatory parameters#

  • --database-pattern Enter a Hive DDL pattern that will match the database names you want to migrate.
  • --table-pattern Enter a Hive DDL pattern that will match the table names you want to migrate.
tip

You can use a single asterisk (*) if you want to match all databases and/or all tables within the Metastore/database.

Optional parameters#

  • --name The name for the Hive rule.

Example#

Match all database names that start with test and all tables inside of them
        hive rule add --name test_databases --database-pattern test* --table-pattern *

hive rule configure#

Change the parameters of an existing Hive rule.

The parameters that can be changed are the same as those listed in the hive rule add, hive rule create section.

All parameters are optional except --name, which is required to identify the existing Hive rule that you want to configure.

Example#

        hive rule configure --name test_databases --database-pattern test_db*

hive rule delete#

Delete selected Hive migration rule
        hive rule delete [--name] string

Example#

        hive rule delete --name test_databases

hive rule list#

Get a list of defined rules
        hive rule list

hive rule show#

Show rule details
        hive rule show [--name] string

Example#

        hive rule show --name test_databases

Hive show commands#

hive show conf#

Returns a description of the specified Hive configuration property.
        hive show conf [--parameter] string
                       [--agent-name] string

Hive show configuration parameters#

  • --agent-name The name of the agent.
  • --parameter The configuration parameter/property that you want to show the value of.

Example#

Example when sourceAgent is an Apache Hive agent
        hive show conf --agent-name sourceAgent --parameter hive.metastore.uris

hive show database#

Show detailed information about a given database, using the given agent (or sourceAgent if not set).
        hive show database [--database] string
                           [--agent-name] string

Hive show database parameters#

  • --database The database name. If not specified, the default will be default.
  • --agent-name The name of the agent.

Example#

        hive show database --agent-name sourceAgent --database mydb01

hive show databases#

Get the list of databases from a given agent (or sourceAgent if the agent isn't set).
        hive show databases [--like] string
                            [--agent-name] string

Hive show databases parameters#

  • --like The Hive DDL pattern to use to match the database names (for example: testdb* will match any database name that begins with "testdb").
  • --agent-name The name of the agent.

Example#

        hive show databases --agent-name sourceAgent --like testdb*

hive show indexes#

Get the list of indexes for a given database/table and agent (or sourceAgent if not set).
        hive show indexes [--database] string
                          [--table] string
                          [--agent-name] string

Hive show indexes parameters#

  • --database The database name.
  • --table The table name.
  • --agent-name The name of the agent.

Example#

        hive show indexes --agent-name sourceAgent --database mydb01 --table mytbl01

hive show partitions#

Get the list of partitions for a given database/table and agent (or sourceAgent if not set).
        hive show partitions [--database] string
                             [--table] string
                             [--agent-name] string

Hive show partitions parameters#

  • --database The database name.
  • --table The table name.
  • --agent-name The name of the agent.

Example#

        hive show partitions --agent-name sourceAgent --database mydb01 --table mytbl01

hive show table#

Show detailed information about a given table using the given agent (or sourceAgent if not set).
        hive show table [--database] string
                        [--table] string
                        [--agent-name] string

Hive show table parameters#

  • --database The database name where the table is located.
  • --table The table name.
  • --agent-name The name of the agent.

Example#

        hive show table --agent-name sourceAgent --database mydb01 --table mytbl01

hive show tables#

Get the list of tables for a given database (default if not set) and agent (sourceAgent if not set).
        hive show tables [[--like] string]
                         [[--database] string]
                         [[--agent-name] string]

Hive show tables parameters#

  • --like The Hive DDL pattern to use to match the table names (for example: testtbl* will match any table name that begins with "testtbl").
  • --database Database name. Defaults to default if not set.
  • --agent-name The name of the agent.

Example#

        hive show tables --agent-name sourceAgent --database mydb01 --like testtbl*

License manipulation commands#

license show#

Show the details of the active license
        license show [--full]
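For example, use the optional --full flag to display the complete license details:

```shell
license show --full
```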

license upload#

Upload a new license by submitting its location on the local filesystem
        license upload [--path] string

Example#

        license upload --path /user/hdfs/license.key

Notification commands#

notification latest#

Get the latest notification
        notification latest

notification list#

Get notifications
        notification list [--count] integer
                          [--since] string
                          [--type] string
                          [--exclude-resolved]
                          [--level] string

Optional parameters#

  • --count The number of notifications to return.
  • --since Return notifications created after this date/time.
  • --type The type of notification to return (for example: LicenseExceptionNotification).
  • --exclude-resolved Exclude resolved notifications.
  • --level The level of notification to return.
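
For example, to return the ten most recent notifications that haven't yet been resolved:

```shell
notification list --count 10 --exclude-resolved
```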

notification show#

Show notification details
        notification show [--notification-id] string

Mandatory parameters#

  • --notification-id The ID of the notification to return.

Source commands#

source clear#

Clear all information that Data Migrator maintains about the source filesystem. This allows you to define an alternative source to the one previously defined or detected automatically.

Delete all sources
        source clear

source delete#

Use source delete to delete information about a specific source by ID. You can obtain the ID for a source filesystem with the output of the source show command.

Delete a source
        source delete [--file-system-id] string

Mandatory parameters#

  • --file-system-id The ID of the source filesystem resource you want to delete. In the UI, this is called Display Name.

Example#

        source delete --file-system-id auto-discovered-source-hdfs

source show#

Get information about the source filesystem configuration.

Show the source filesystem configuration
        source show [--detailed]

Optional parameters#

  • --detailed Include all configuration properties for the source filesystem in the response.
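
Example#

For example, to include all configuration properties in the response:

```shell
source show --detailed
```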