Command reference
System service commands
The service scripts are used to control operation of each individual service. In most supported Linux distributions, the following commands can be used to manage Data Migrator, Hive Migrator, and WANdisco UI processes.
Data Migrator
systemd command | Use it to... |
---|---|
systemctl start livedata-migrator | Start a service that isn't currently running. |
systemctl stop livedata-migrator | Stop a running service. |
systemctl restart livedata-migrator | Perform a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status livedata-migrator | Get details of the running service's status. |
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service livedata-migrator restart
CentOS 6 systems don't support the service command. Instead, use initctl with the format:
initctl <command> <service name>
For example:
initctl start livedata-migrator
Hive Migrator
Service script | Use it to... |
---|---|
systemctl start hivemigrator | Start a service that isn't currently running. |
systemctl stop hivemigrator | Stop a running service. |
systemctl restart hivemigrator | Perform a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status hivemigrator | Get details of the running service's status. |
Always start/restart Hive Migrator services in the following order:
- Remote agents
- Hive Migrator service.
Not starting services in this order may cause live migrations to fail.
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service hivemigrator status
CentOS 6 systems don't support the service command. Instead, use initctl with the format:
initctl <command> <service name>
For example:
initctl start hivemigrator
Hive Migrator remote server
Service script | Use it to... |
---|---|
systemctl start hivemigrator-remote-server | Start a service that isn't currently running. |
systemctl stop hivemigrator-remote-server | Stop a running service. |
systemctl restart hivemigrator-remote-server | Perform a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status hivemigrator-remote-server | Get details of the running service's status. |
Always start/restart Hive Migrator services in the following order:
- Remote agents
- Hive Migrator service.
Not starting services in this order may cause live migrations to fail.
If you're working in an environment without systemd or another system and service manager, you need to run the start.sh script located in /opt/wandisco/hivemigrator again after running the restart command.
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service hivemigrator-remote-server status
CentOS 6 systems don't support the service command. Instead, use initctl with the format:
initctl <command> <service name>
For example:
initctl start hivemigrator-remote-server
WANdisco UI
Service script | Use it to... |
---|---|
systemctl start livedata-ui | Start a service that isn't currently running. |
systemctl stop livedata-ui | Stop a running service. |
systemctl restart livedata-ui | Perform a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status livedata-ui | Get details of the running service's status. |
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service livedata-ui status
CentOS 6 systems don't support the service command. Instead, use initctl with the format:
initctl <command> <service name>
For example:
initctl start livedata-ui
Connect to the WANdisco CLI
Open a terminal session on the Data Migrator host machine and enter the following command:
livedata-migrator
When the CLI connects to the Data Migrator and Hive Migrator services, you get the following command prompt:
WANdisco LiveData Migrator >>
The CLI is now ready to accept commands.
Optional parameters
--host
The IP address or hostname of the Data Migrator API to connect to. Defaults to localhost when not specified.
--vm-port
Data Migrator API port. Defaults to 18080 when not specified.
--hm-port
Hive Migrator API port. Defaults to 6780 when not specified.
--lm-ssl
Flag to use HTTPS. Defaults to HTTP when not specified.
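For example, to connect from another machine to a Data Migrator instance that uses HTTPS (a sketch; the hostname is a placeholder):
livedata-migrator --host dm-host.example.com --vm-port 18080 --hm-port 6780 --lm-ssl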
Version check
Check the current versions of included components by using the livedata-migrator
command with the --version
parameter. For example:
# livedata-migrator --version
This doesn't start the CLI. You get a list of the current Data Migrator components, along with their version numbers.
WANdisco CLI features
Feature | How to use it |
---|---|
Review available commands | Use the help command to get details of all commands available. |
Command completion | Hit the <tab> key at any time to get assistance or to complete partially-entered commands. |
Cancel input | Type <Ctrl-C> before entering a command to return to an empty action prompt. |
Syntax indication | Invalid commands are highlighted as you type. |
Clear the display | Type <Ctrl-L> at any time. |
Previous commands | Navigate previous commands using the up and down arrows, and use standard emacs shortcuts. |
Interactive or scripted operation | You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See script for more information and examples. |
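As a minimal sketch of scripted operation (assuming the CLI reads commands on standard input as described above), you can pipe a single command through it from a shell script:
echo "filesystem list" | livedata-migrator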
WANdisco CLI commands
You can manage filesystems, migrations, and more in the WANdisco CLI.
Backup commands
backup add
backup add
backup config show
backup config show
{
"backupsLocation": "/opt/wandisco/livedata-migrator/db/backups",
"lastSuccessfulTs": 0,
"backupSchedule": {
"enabled": true,
"periodMinutes": 10
},
"storedFilePaths": [
"/etc/wandisco/livedata-migrator/application.properties",
"/etc/wandisco/livedata-migrator/vars.env",
"/etc/wandisco/livedata-migrator/logback-spring.xml",
"/etc/wandisco/livedata-migrator/application.properties",
"/etc/wandisco/livedata-migrator/vars.env",
"/etc/wandisco/livedata-migrator/logback-spring.xml"
]
}
backup list
backup list
backup restore
backup restore --name string
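For example, list the available backups and then restore one by name (the name here is a placeholder):
backup list
backup restore --name <backup-name>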
backup schedule configure
backup schedule configure --period-minutes 10 --enable
{
"enabled": true,
"periodMinutes": 10
}
backup schedule show
backup schedule show
{
"enabled": true,
"periodMinutes": 10
}
backup show
backup show --name string
Bandwidth policy commands
bandwidth policy delete
bandwidth policy delete
bandwidth policy set
bandwidth policy set [--value] long
[--unit] string
Mandatory parameters
--value
Define the number of byte units.
--unit
Define the byte unit to be used.
Decimal units: B, KB, MB, GB, TB, PB.
Binary units: KiB, MiB, GiB, TiB, PiB.
Example
bandwidth policy set --value 10 --unit MB
bandwidth policy show
bandwidth policy show
Filesystem commands
filesystem add adls2 oauth
Add an Azure Data Lake Storage (ADLS) Gen2 container as a migration target using the filesystem add adls2 oauth
command, which requires a service principal and OAuth 2 credentials.
The service principal that you want to use must have either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list with RWX permissions for the migration path and all parent paths. For more information, see the Microsoft documentation.
filesystem add adls2 oauth [--container-name] string
[--file-system-id] string
[--insecure]
[--oauth2-client-endpoint] string
[--oauth2-client-id] string
[--oauth2-client-secret] string
[--properties] string
[--properties-files] list
[--source]
[--storage-account-name] string
Mandatory parameters
--container-name
The name of the container in the storage account to which content will be migrated.
--file-system-id
The ID to give the new filesystem resource.
--oauth2-client-endpoint
The client endpoint for the Azure service principal. This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token, where {tenant} is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
--oauth2-client-id
The client ID (also known as application ID) for your Azure service principal.
--oauth2-client-secret
The client secret (also known as application secret) for the Azure service principal.
--storage-account-name
The name of the ADLS Gen2 storage account to target.
Optional parameters
--insecure
If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties
Enter properties to use in a comma-separated key/value list.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--source
Add this filesystem as the source for migrations.
Example
filesystem add adls2 oauth --file-system-id mytarget
--storage-account-name myadls2
--oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
--oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
--oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
--container-name lm2target
filesystem add adls2 sharedKey
Add an ADLS Gen2 container as a migration target using the filesystem add adls2 sharedKey
command, which requires credentials in the form of an account key.
filesystem add adls2 sharedKey [--file-system-id] string
[--storage-account-name] string
[--container-name] string
[--insecure]
[--shared-key] string
[--properties-files] list
[--properties] string
[--source]
Mandatory parameters
--file-system-id
The ID to give the new filesystem resource. In the UI, this is called Display Name.
--storage-account-name
The name of the ADLS Gen2 storage account to target. In the UI, this is called Account Name.
--shared-key
The shared account key to use as credentials to write to the storage account. In the UI, this is called Access Key.
--container-name
The name of the container in the storage account to which content will be migrated. In the UI, this is called Container Name.
Optional parameters
--insecure
If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication. You can only see this in the UI when Use Secure Protocol is unchecked.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Enter properties to use in a comma-separated key/value list.
--source
Add this filesystem as the source for migrations.
Example
filesystem add adls2 sharedKey --file-system-id mytarget
--storage-account-name myadls2
--container-name lm2target
--shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
filesystem add gcs
Add a Google Cloud Storage bucket as a migration target using the filesystem add gcs
command, which requires credentials in the form of an account key file.
filesystem add gcs [--file-system-id] string
[--service-account-json-key-file] string
[--service-account-p12-key-file] string
[--service-account-json-key-file-server-location] string
[--service-account-p12-key-file-server-location] string
[--service-account-email] string
[--bucket-name] string
[--properties-files] list
[--properties] string
Mandatory parameters
--file-system-id
The ID to give the new filesystem resource. In the UI, this is called Display Name.
--bucket-name
The bucket name of a Google Cloud Storage account. In the UI, this is called Bucket Name.
Service account key parameters
Enter your service account key for the Google Cloud Storage bucket by choosing one of the parameters below. You can also upload the service account key directly when using the UI (this isn't supported through the CLI).
--service-account-json-key-file-server-location
The absolute filesystem path on the Data Migrator server of your service account key file in JSON format. You can either create a Google Cloud Storage service account key or use an existing one. In the UI, this is called Key File and becomes visible when you select Key File Options -> Provide a Path.
--service-account-p12-key-file-server-location
The absolute filesystem path on the Data Migrator server of your service account key file in P12 format. You can either create a Google Cloud Storage service account key or use an existing one. In the UI, this is called Key File and becomes visible when you select Key File Options -> Provide a Path.
--service-account-json-key-file
The absolute filesystem path on the host running the WANdisco CLI of your service account key file in JSON format. Use this parameter if you are running the WANdisco CLI on a different host to your Data Migrator server.
--service-account-p12-key-file
The absolute filesystem path on the host running the WANdisco CLI of your service account key file in P12 format. Use this parameter if you are running the WANdisco CLI on a different host to your Data Migrator server.
Optional parameters
--service-account-email
The email address linked to your Google Cloud Storage service account. In the UI, this is called Email address and is required when selecting the Upload P12 Key File option.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Enter properties to use in a comma-separated key/value list.
Example
filesystem add gcs --file-system-id gcsAgent
--bucket-name myGcsBucket
--service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12
--service-account-email user@mydomain.com
filesystem add hdfs
Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs
command.
You'll normally only need to create an HDFS resource with this command when migrating to a target HDFS filesystem (rather than another storage service like ADLS Gen2 or S3a). Data Migrator will attempt to auto-discover the source HDFS when started from the command line unless Kerberos is enabled on your source environment.
If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs
command to enter Kerberos credentials and auto-discover your source HDFS configuration.
filesystem add hdfs [--file-system-id] string
[--default-fs] string
[--user] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--source]
[--scan-only]
[--success-file] string
[--properties-files] list
[--properties] string
Mandatory parameters
--file-system-id
The ID to give the new filesystem resource. In the UI, this is called Display Name.
--default-fs
A string that defines how Data Migrator accesses HDFS. In the UI, this is called Default FS. It can be specified in a number of forms:
- As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
- As an HDFS URI that references a nameservice if the NameNodes have high availability, for example, hdfs://mynameservice. For more information, see HDFS High Availability.
Optional parameters
Cross-realm authentication is required in the following scenarios:
- Migration will occur between a source and target HDFS.
- Kerberos is enabled on both clusters.
See the links below for guidance for common Hadoop distributions:
--user
The name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
--kerberos-principal
The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab
The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
--source
Enter this parameter to use the filesystem resource created as a source. This is referenced in the UI when configuring the Unknown source.
--scan-only
Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files
Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml. In the UI, this is called Provide a path to files under Additional Configuration.
--properties
Enter properties to use in a comma-separated key/value list. In the UI, this is called Additional Configuration under the Additional Configuration option.
--success-file
Specify a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the directory they're in has finished migrating.
Properties files are required for NameNode HA
If your Hadoop cluster has NameNode HA enabled, enter the local filesystem path to the properties files that define the configuration for the nameservice ID.
Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster.
Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your Data Migrator host's local filesystem.
For the UI, use Provide a path to files under the Additional Configuration option and define the directory containing the core-site.xml and hdfs-site.xml files.
Example for a path containing source cluster configuration:
/etc/hadoop/conf
Example for a path containing target cluster configuration:
/etc/targetClusterConfig
Alternatively, define the absolute filesystem paths to these files:
Example for absolute paths to source cluster configuration files:
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml
Example for absolute paths to target cluster configuration files:
/etc/targetClusterConfig/core-site.xml
/etc/targetClusterConfig/hdfs-site.xml
For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
Examples
HDFS as source
filesystem add hdfs --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
filesystem add hdfs --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
--kerberos-principal hdfs@SOURCEREALM.COM
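As a further sketch, a one-time (scan-only) source that migrates _SUCCESS marker files last might be added like this (the filesystem ID and glob pattern are illustrative):
filesystem add hdfs --file-system-id mystaticsource --default-fs hdfs://sourcenameservice --source --scan-only --success-file /**_SUCCESS --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml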
HDFS as target
If you enter a HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs
filesystem add local
Add a local filesystem as either a migration target or source using the filesystem add local
command.
filesystem add local [--file-system-id] string
[--fs-root] string
[--source]
[--scan-only]
[--properties-files] list
[--properties] string
Mandatory parameters
--file-system-id
The ID to give the new filesystem resource. In the UI, this is called Display Name.
Optional parameters
--fs-root
The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Should be supplied using the full directory path from the root.
--source
Enter this parameter to use the filesystem resource created as a source. This is referenced in the UI when configuring the Unknown source.
--scan-only
Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files
Reference a list of existing properties files.
--properties
Enter properties to use in a comma-separated key/value list.
If no fs-root is specified, the file path will default to the root of your system.
Examples
Local filesystem as source
filesystem add local --file-system-id mytarget --fs-root ./tmp --source
Local filesystem as target
filesystem add local --file-system-id mytarget --fs-root ./Users/username/destinationfolder/
filesystem add s3a
Add an Amazon Simple Storage Service (Amazon S3) bucket as a target filesystem using the filesystem add s3a
command. This method also supports IBM Cloud Object Storage buckets.
filesystem add s3a [--file-system-id] string
[--bucket-name] string
[--endpoint] string
[--access-key] string
[--secret-key] string
[--sqs-queue] string
[--sqs-endpoint] string
[--credentials-provider] string
[--source]
[--scan-only]
[--properties-files] list
[--properties] string
[--s3type] string
[--bootstrap.servers] string
[--topic] string
For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.
S3A mandatory parameters
--file-system-id
The ID for the new filesystem resource. In the UI, this is called Display Name.
--bucket-name
The name of your Amazon S3 bucket. In the UI, this is called Bucket Name.
--credentials-provider
The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint. In the UI, this is called Credentials Provider. This isn't a required parameter when adding an IBM Cloud Object Storage bucket through the UI.
The provider options available include:
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
Use this provider to offer credentials as an access key and secret access key with the --access-key and --secret-key parameters.
com.amazonaws.auth.InstanceProfileCredentialsProvider
Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.
com.amazonaws.auth.DefaultAWSCredentialsProviderChain
A commonly-used credentials provider chain that looks for credentials in this order:
- Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
- Java system properties - aws.accessKeyId and aws.secretKey.
- Web Identity Token credentials from the environment or container.
- Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
- Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
- Instance profile credentials delivered through the Amazon EC2 metadata service.
com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider
This provider supports the use of multiple AWS credentials, which are stored in a credentials file. When adding a source filesystem, use the following properties:
awsProfile - Name for the AWS profile.
awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.
For example:
filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>
In the CLI, you can also use --aws-profile and --aws-config-file. For example:
filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --aws-profile <profile-name> --aws-config-file </path/to/the/aws/credentials/file>
Learn more about using AWS profiles: Configuration and credential file settings.
Endpoint (UI and IBM Cloud Object Storage only): This is required when adding an IBM Cloud Object Storage bucket. IBM provides a list of available endpoints in their public documentation.
S3A optional parameters
--access-key
When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter. In the UI, this is called Access Key. This is a required parameter when adding an IBM Cloud Object Storage bucket.
--secret-key
When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key using this parameter. In the UI, this is called Secret Key. This is a required parameter when adding an IBM Cloud Object Storage bucket.
--endpoint
(S3 as a target only) Enter a specific endpoint to access the S3 bucket, such as an AWS PrivateLink endpoint (for example: https://bucket.vpce-0e25b8cdd720f900e-argc85vg.s3.us-east-1.vpce.amazonaws.com). When using this parameter, do not use the fs.s3a.endpoint property as an additional custom property as this supersedes it.
--sqs-queue
Enter an SQS queue name. This field is required if you enter an SQS endpoint.
--sqs-endpoint
Enter an SQS endpoint.
--source
(Preview) Enter this parameter to use the filesystem resource created as a source. This is referenced in the UI when configuring the Unknown source.
--scan-only
Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Enter properties to use in a comma-separated key/value list. In the UI, this is called S3A Properties (see S3a Default Properties and S3a Custom Properties for more information).
--s3type
Indicates an S3a-compatible filesystem type. You can set the parameter value to one of the following or leave it blank: aws, oracle, ibmcos.
--bootstrap.servers
Kafka server address.
--topic
The Kafka topic where S3 object change notifications are provided.
Note: Amazon S3a as a source is currently a preview feature.
S3a default properties
These properties are defined by default when adding an S3a filesystem.
fs.s3a.impl
(default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a filesystem.
fs.AbstractFileSystem.s3a.impl
(default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
fs.s3a.user.agent.prefix
(default APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that will be prepended to the User-Agent header sent in HTTP requests to the S3 back-end by S3aFileSystem.
fs.s3a.impl.disable.cache
(default true): Disables the S3 filesystem cache when set to true.
hadoop.tmp.dir
(default tmp): The parent directory for other temporary directories.
fs.s3a.connection.maximum
(default 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
fs.s3a.threads.max
(default 150): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation.
fs.s3a.max.total.tasks
(default 60): Defines the number of operations which can be queued for execution at a time.
fs.s3a.healthcheck
(default true): Allows the S3A filesystem health check to be turned off by changing true to false. This option is useful for setting up Data Migrator while cloud services are offline. However, when disabled, errors in S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.
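For example (a sketch using the --properties parameter described above; the filesystem ID is illustrative), you could turn the health check off while cloud services are offline:
filesystem update s3a --file-system-id mytarget --properties fs.s3a.healthcheck=false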
S3a custom properties
These are some of the additional properties that can be added when creating an S3a filesystem.
fs.s3a.fast.upload.buffer
(default disk): Defines how the filesystem will buffer the upload.
fs.s3a.fast.upload.active.blocks
(default 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
fs.s3a.block.size
(default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in kilobytes, megabytes, gigabytes, terabytes or petabytes respectively.
fs.s3a.buffer.dir
(default tmp): Defines the directory used by disk buffering.
fs.s3a.endpoint.region
(default: current region): Explicitly sets the bucket region.
To configure an Oracle Cloud Storage bucket which isn't in your default region, specify fs.s3a.endpoint.region=<region> with the --properties flag when adding the filesystem with the CLI.
Find an additional list of S3a properties in the S3a documentation.
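A sketch of that Oracle Cloud Storage case (the region value and filesystem ID are placeholders; credentials parameters are omitted for brevity):
filesystem add s3a --file-system-id myoracletarget --bucket-name mybucket1 --s3type oracle --properties fs.s3a.endpoint.region=<region>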
Upload buffering
Migrations using an S3A target destination will buffer all uploads. By default, the buffering occurs on the local disk of the system Data Migrator is running on, in the /tmp directory.
Data Migrator will automatically delete the temporary buffering files once they are no longer needed.
If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer. The following values can be supplied:
Buffering Option | Details | Property Value |
---|---|---|
Array Buffer | Buffers the uploaded data in memory instead of on disk, using the Java heap. | array |
Byte Buffer | Buffers the uploaded data in memory instead of on disk, but does not use the Java heap. | bytebuffer |
Disk Buffering | The default option. Buffers the upload to disk. | disk |
Both the array and bytebuffer options may lead to the consumption of large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.
If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp by default) has enough remaining space to facilitate the transfer.
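For example (a sketch; the filesystem ID is illustrative), to switch an existing S3a target to in-memory byte buffers:
filesystem update s3a --file-system-id mytarget --properties fs.s3a.fast.upload.buffer=bytebuffer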
S3a Example
filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
IBM Cloud Object Storage Examples
Add a source IBM Cloud Object Storage filesystem. Note that this doesn't work if SSL is used on the endpoint address.
filesystem add s3a --source --file-system-id cos_s3_source2
--bucket-name container2
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
--s3type ibmcos
--bootstrap.servers=10.0.0.123:9092
--topic newcos-events
--endpoint http://10.0.0.124
Add path mapping.
path mapping add --path-mapping-id testPath
--description description-string
--source-path /
--target targetHdfs2
--target-path /repl_test1
{
"id": "testPath",
"description": "description-string",
"sourceFileSystem": "cos_s3_source2",
"sourcePath": "/",
"targetFileSystem": "targetHdfs2",
"targetPath": "/repl_test1"
}
Adding a file to a container
./directory cp ~/Downloads/wq4.pptx cos/container2/
Removing a file from a container
~/Downloads/directory ./directory rm cos/container2/wq4.pptx
List objects in container
./directory ls cos/container2/
Via S3a API
aws s3api list-objects --endpoint-url=http://10.0.0.201
--bucket container2
filesystem auto-discover-source hdfs
Discover your local HDFS filesystem by entering the Kerberos credentials for your source environment.
You can also manually configure the source HDFS filesystem using the filesystem add hdfs
command.
filesystem auto-discover-source hdfs [--kerberos-principal] string
[--kerberos-keytab] string
Kerberos parameters
--kerberos-principal
The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab
The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
Example
filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM
filesystem clear
Delete all target filesystem references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target filesystems.
filesystem clear
filesystem delete
Delete a specific filesystem resource by ID. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that filesystem.
filesystem delete [--file-system-id] string
Mandatory parameters
--file-system-id
The ID of the filesystem resource to delete. In the UI, this is called Display Name.
Example
filesystem delete --file-system-id mytarget
filesystem list
List defined filesystem resources.
filesystem list [--detailed]
Optional parameters
--detailed
Include all properties for each filesystem in the JSON result.
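For example, to list every configured filesystem with full details:
filesystem list --detailed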
filesystem show
View details for a filesystem resource.
filesystem show [--file-system-id] string
[--detailed]
Mandatory parameters
--file-system-id
The ID of the filesystem resource to show. In the UI, this is called Display Name.
Example
filesystem show --file-system-id mytarget
filesystem types
View information about the filesystem types available for use with Data Migrator.
filesystem types
filesystem update adls2 oauth
Update an existing ADLS Gen2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth
command. You will be prompted to optionally update the service principal and OAuth 2 credentials.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target
filesystem update adls2 sharedKey
Update an existing ADLS Gen2 container migration target using the filesystem update adls2 sharedKey
command. You will be prompted to optionally update the secret key.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 sharedKey
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
filesystem update gcs
Update a Google Cloud Storage migration target using the filesystem update gcs
command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add gcs
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12 --service-account-email user@mydomain.com
filesystem update hdfs
Update either a source or target Hadoop Distributed filesystem using the filesystem update hdfs
command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add hdfs
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Examples
filesystem update hdfs --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
filesystem update hdfs --file-system-id mytarget
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
--kerberos-principal hdfs@SOURCEREALM.COM
filesystem update local
Update a target or source local filesystem using the filesystem update local
command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add local
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update local --file-system-id mytarget --fs-root ./tmp
filesystem update s3a
Update an S3 bucket target filesystem using the filesystem update s3a
command. This method also supports IBM Cloud Object Storage buckets.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add s3a
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update s3a --file-system-id mytarget
--bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key pkExampleAccessKeyiz --secret-key eSeCreTkeYd8uEDnDHRHuV9IF3n9
Hive agent configuration commands
hive agent add azure
Add a local or remote Hive agent to connect to an Azure SQL database using the hive agent add azure
command.
If your Data Migrator host can communicate directly with the Azure SQL database, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.
For a remote Hive agent connection, enter a remote host (Azure VM, HDI cluster node) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote metastore.
hive agent add azure [--name] string
[--db-server-name] string
[--database-name] string
[--database-user] string
[--database-password] string
[--auth-method] azure-sqlauthentication-method
[--client-id] string
[--storage-account] string
[--container-name] string
[--insecure] boolean
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy] boolean
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--file-system-id] string
[--default-fs-override] string
Mandatory parameters
The Azure Hive agent requires an ADLS Gen2 storage account and container name. These are only used to generate the correct location for the metadata; the agent will not access the container, and data will not be written to it.
--name
The ID to give to the new Hive agent. In the UI, this is called Display Name.
--db-server-name
Azure SQL database server name.
--database-name
The Azure SQL database name. In the UI, this is called Azure SQL Database Name. Note: Hive Migrator doesn't support Azure SQL database names containing blank spaces.
--storage-account
The name of the ADLS Gen2 storage account. In the UI, this is called Account Name.
--container-name
The name of the container in the ADLS Gen2 storage account. In the UI, this is called Container Name.
--auth-method
Azure SQL database connection authentication method (SQL_PASSWORD, AD_MSI, AD_INTEGRATED, AD_PASSWORD, ACCESS_TOKEN).
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example: myadls2storage). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
--default-fs-override
Enter an override for the default filesystem URI instead of a filesystem name (for example: abfss://mycontainer@mystorageaccount.dfs.core.windows.net). In the UI, this is called Default Filesystem Override.
Optional parameters
--client-id
The Azure resource's clientId.
--insecure
Define an insecure connection (TLS disabled) to the Azure SQL database server (default is false). In the UI, this is called Use Secure Protocol.
Authentication parameters
Choose one of the authentication methods listed and include the additional parameters required for the chosen method.
--auth-method
The authentication method to use to connect to the Azure SQL server. In the UI, this is called Authentication Method.
The following methods can be used:
SQL_PASSWORD - Enter a username and password to access the database. In the UI, this is called SQL Password.
AD_MSI - Use a system-assigned or user-assigned managed identity. In the UI, this is called Active Directory MSI.
Required parameters for SQL_PASSWORD
--database-user
The user name to access the database. In the UI, this is called Database Username.
--database-password
The user password to access the database. In the UI, this is called Database Password.
Required parameters for AD_MSI
To use this method, the following pre-requirements must be met:
Data Migrator or the remote Azure Hive agent must be installed on an Azure resource with the managed identity assigned to it. The host must also have Azure AD authentication enabled.
Your Azure SQL server must be enabled for Azure AD authentication.
You have created a contained user in the Azure SQL database that is mapped to the Azure AD resource (where Data Migrator or the remote Azure Hive agent is installed).
The username of the contained user will depend on whether you are using a system-assigned or user-assigned identity.
Azure SQL database command for a system-assigned managed identity:
CREATE USER "<azure_resource_name>" FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER "<azure_resource_name>";
The <azure_resource_name> is the name of the Azure resource where Data Migrator or the remote Azure Hive agent is installed (for example: myAzureVM).
Azure SQL database command for a user-assigned managed identity:
CREATE USER <managed_identity_name> FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER <managed_identity_name>;
The <managed_identity_name> is the name of the user-assigned managed identity (for example: myManagedIdentity).
Once all pre-requirements are met, see the system-assigned identity or user-assigned identity parameters.
System-assigned identity
No other parameters are required for a system-managed identity.
User-assigned identity
The --client-id parameter must be specified:
--client-id
The Client ID of your Azure managed identity. In the UI, this is called MSI Client ID.
Parameters for remote Hive agents only
--host
The host where the remote Hive agent will be deployed.
--port
The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl
TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it. You can't adjust this parameter after agent creation.
Parameters for automated deployment
--autodeploy
The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
--ssh-user
The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-key
The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-port
The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
--use-sudo
All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
--ignore-host-checking
Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment
If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Azure SQL manually:
Transfer the remote server installer to your remote host (Azure VM, HDI cluster node). Example of secure transfer from local to remote host:
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, make the installer script executable:
chmod +x hivemigrator-remote-server-installer.sh
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the hive agent add azure command without using --autodeploy and its related parameters to configure your remote Hive agent.
See the Example for remote Azure SQL deployment - manual example below for further guidance.
Examples
hive agent add azure --name azureAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --client-id b67f67ex-ampl-e2eb-bd6d-client9385id --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --host myRemoteHost.example.com --port 5052
hive agent add filesystem
Add a filesystem Hive agent to connect to your host's local filesystem using the hive agent add filesystem
command.
hive agent add filesystem [--file-system-id] string
[--root-folder] string
[--name] string
--file-system-id
The filesystem ID to be used.
--root-folder
The path to use as the root directory for the filesystem agent.
--name
The ID to give to the new Hive agent.
Example
hive agent add filesystem --file-system-id myfilesystem --root-folder /var/lib/mysql --name fsAgent
hive agent add glue
Add an AWS Glue Hive agent to connect to an AWS Glue data catalog using the hive agent add glue
command.
If your Data Migrator host can communicate directly with the AWS Glue Data Catalog, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.
For a remote Hive agent connection, enter a remote host (EC2 instance) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add glue [--name] string
[--access-key] string
[--secret-key] string
[--glue-endpoint] string
[--aws-region] string
[--glue-catalog-id] string
[--credentials-provider] string
[--glue-max-retries] integer
[--glue-max-connections] integer
[--glue-max-socket-timeout] integer
[--glue-connection-timeout] integer
[--file-system-id] string
[--default-fs-override] string
[--host] string
[--port] integer
[--no-ssl]
Glue parameters
--name
The ID to give to the new Hive agent. In the UI, this is called Name.
--glue-endpoint
The AWS Glue service endpoint for connections to the data catalog. VPC endpoint types are also supported. In the UI, this is called AWS Glue Service Endpoint.
--aws-region
The AWS region that your data catalog is located in (default is us-east-1). If --glue-endpoint is specified, this parameter will be ignored. In the UI, this is called AWS Region.
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example: mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
--default-fs-override
Enter an override for the default filesystem URI instead of a filesystem name (for example: s3a://mybucket/). In the UI, this is called Default Filesystem Override.
Glue credential parameters
--credentials-provider
The AWS catalog credentials provider factory class. In the UI, this is called AWS Catalog Credentials Provider.
- If you don't enter this parameter, the default is DefaultAWSCredentialsProviderChain.
- If you enter the --access-key and --secret-key parameters, the credentials provider will automatically default to StaticCredentialsProviderFactory.
--access-key
The AWS access key. In the UI, this is called Access Key.
--secret-key
The AWS secret key. In the UI, this is called Secret Key.
Glue optional parameters
--glue-catalog-id
The AWS account ID to access the Data Catalog. This is used if the Data Catalog is owned by a different account to the one provided by the credentials provider and cross-account access has been granted.
--glue-max-retries
The maximum number of retries the Glue client will perform after an error.
--glue-max-connections
The maximum number of parallel connections the Glue client will allocate.
--glue-max-socket-timeout
The maximum time the Glue client will allow an established connection to take before timing out.
--glue-connection-timeout
The maximum time the Glue client will allow to establish a connection.
Parameters for remote Hive agents only
--host
The host where the remote Hive agent will be deployed.
--port
The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl
TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it. You can't adjust this parameter after agent creation.
Steps for remote agent deployment
Follow these steps to deploy a remote Hive agent for AWS Glue:
Transfer the remote server installer to your remote host (Amazon EC2 instance). Example of secure transfer from local to remote host:
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the hive agent add glue command to configure your remote Hive agent.
See the Example for remote AWS Glue agent example below for further guidance.
Examples
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --host myRemoteHost.example.com --port 5052
hive agent add hive
Add a Hive agent to connect to a local or remote Apache Hive Metastore using the hive agent add hive
command.
When connecting to a remote Apache Hive Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add hive [--config-path] string
[--config-file] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--name] string
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy]
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--file-system-id] string
[--default-fs-override] string
Mandatory parameters
--kerberos-principal
Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: hive/myhost.example.com@REALM.COM). In the UI, this is called Principal.
--kerberos-keytab
Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: /etc/security/keytabs/hive.service.keytab). In the UI, this is called Keytab.
--name
The ID to give to the new Hive agent. In the UI, this is called Name.
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example: myhdfs). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
--default-fs-override
Enter an override for the default filesystem URI instead of a filesystem name (for example: hdfs://nameservice01). In the UI, this is called Default Filesystem Override.
Optional parameters
--config-path
For a local agent for a target metastore, or when the Hive config is not located in /etc/hive/conf, supply a path containing the hive-site.xml, core-site.xml, and hdfs-site.xml.
--config-file
If the configuration files are not located on the same path, use this parameter to enter all the paths as a comma-delimited list. For example, /path1/core-site.xml,/path2/hive-site.xml,/path3/hdfs-site.xml.
When configuring a CDP target
--jdbc-url
The JDBC URL for the database.
--jdbc-driver-name
Full class name of the JDBC driver.
--jdbc-username
Username for connecting to the database.
--jdbc-password
Password for connecting to the database.
Don't use the optional parameters --config-path and --config-file in the same add command. Use --config-path when the configuration files are on the same path, or --config-file when the configuration files are on separate paths.
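A sketch of adding a CDP target agent with these JDBC parameters (the URL, driver class, and credentials are placeholders, not values taken from this guide):
hive agent add hive --name cdpTargetAgent --jdbc-url jdbc:mysql://metastore-db.example.com:3306/hive --jdbc-driver-name com.mysql.cj.jdbc.Driver --jdbc-username hive --jdbc-password <password> --config-path /etc/targetClusterConfig --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --file-system-id mytargethdfs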
Parameters for remote Hive agents only
--host
The host where the remote Hive agent will be deployed.
--port
The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
--no-ssl
TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it. You can't adjust this parameter after agent creation.
Parameters for automated deployment
--autodeploy
The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
--ssh-user
The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-key
The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-port
The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
--use-sudo
All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
--ignore-host-checking
Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment
If you do not wish to use the --autodeploy
function, follow these steps to deploy a remote Hive agent for Apache Hive manually:
Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host:
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, make the installer script executable:
chmod +x hivemigrator-remote-server-installer.sh
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the hive agent add hive command without using --autodeploy and its related parameters to configure your remote Hive agent.
See the Example for remote Apache Hive deployment - manual example below for further guidance.
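Examples
Example for local Apache Hive deployment: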
hive agent add hive --name sourceAgent --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@LOCALREALM.COM --file-system-id mysourcehdfs
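Example for remote Apache Hive deployment - automated: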
hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
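Example for remote Apache Hive deployment - manual: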
hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).
hive agent add databricks
Databricks agents are currently available as a preview feature.
The source table format must be in one of the following formats to ensure a successful migration to Databricks Delta Lake:
- CSV
- JSON
- Avro
- ORC
- Parquet
- Text
Add a Databricks Hive agent to connect to a Databricks Delta Lake metastore (AWS, Azure, or Google Cloud Platform (GCP)) using the hive agent add databricks
command.
hive agent add databricks [--name] string
[--jdbc-server-hostname] string
[--jdbc-port] int
[--jdbc-http-path] string
[--access-token] string
[--fs-mount-point] string
[--convert-to-delta]
[--delete-after-conversion]
[--file-system-id] string
[--default-fs-override] string
[--host] string
[--port] integer
[--no-ssl]
Enable JDBC connections to Databricks
The following steps are required to enable Java Database Connectivity (JDBC) to Databricks Delta Lake:
Download the Databricks JDBC driver.
Unzip the package and upload the SparkJDBC42.jar file to the LiveData Migrator host machine.
Move the SparkJDBC42.jar file to the LiveData Migrator directory below:
/opt/wandisco/hivemigrator/agent/databricks
Change ownership of the Jar file to the HiveMigrator system user and group:
Example for hive:hadoop
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar
Databricks mandatory parameters
--name
The ID to give to the new Hive agent. In the UI, this is called Name.--jdbc-server-hostname
The server hostname for the Databricks cluster (AWS, Azure or GCP). In the UI, this is called JDBC Server Hostname.--jdbc-port
The port used for JDBC connections to the Databricks cluster (AWS, Azure or GCP). In the UI, this is called JDBC Port.--jdbc-http-path
The HTTP path for the Databricks cluster (AWS, Azure or GCP). In the UI, this is called JDBC Http Path.--access-token
The personal access token to be used for the Databricks cluster (AWS, Azure or GCP). In the UI, this is called Access Token.
Additionally, use only one of the following parameters:
If the --convert-to-delta
option is used, the --default-fs-override
parameter must also be provided with the value set to dbfs:
, or a path inside the Databricks filesystem. For example, dbfs:/mount/externalStorage
.
--file-system-id
The name of the filesystem that will be associated with this agent (for example:myadls2
ormys3bucket
). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.--default-fs-override
Provide an override for the default filesystem URI instead of a filesystem name (for example:dbfs:
). In the UI, this is called DefaultFs Override.
Databricks optional parameters
--fs-mount-point
Define the ADLS/S3/GCP location in the Databricks filesystem for containing migrations (for example:/mnt/mybucketname
). In the UI, this is called FS Mount Point.
This parameter is required if --convert-to-delta
is used. The Databricks agent will copy all associated table data and metadata into this location within the Databricks filesystem during conversion.
--convert-to-delta
All underlying table data and metadata is migrated to the filesystem location defined by the --fs-mount-point parameter. Use this option to automatically copy the associated data and metadata into Delta Lake on Databricks (AWS, Azure or GCP), and convert tables into Delta Lake format. In the UI, this is called Convert to delta format.
The following parameter can only be used if --convert-to-delta has been specified:
--delete-after-conversion
Use this option to delete the underlying table data and metadata from the filesystem location (defined by --fs-mount-point) once it has been converted into Delta Lake on Databricks. In the UI, this is called Delete after conversion.
Only use this option if you are performing one-time migrations for the underlying table data. The Databricks agent does not support continuous (live) updates of table data when transferring to Delta Lake on Databricks.
If a migration to Databricks runs without the --convert-to-delta option, some migrated data may not be visible from the Databricks side. To avoid this issue, set the value of --default-fs-override to dbfs: followed by the value of --fs-mount-point.
Example:
--default-fs-override dbfs:/mnt/mybucketname
Example
hive agent add databricks --name databricksAgent --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token daexamplefg123456789t6f0b57dfdtoken4 --file-system-id mys3bucket --default-fs-override dbfs:/mnt/mybucketname --fs-mount-point /mnt/mybucket --convert-to-delta
hive agent add dataproc
Add a Hive agent to connect to a local or remote Google Dataproc Metastore using the hive agent add dataproc
command.
When connecting to a remote Dataproc Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add dataproc [--config-path] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--name] string
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy]
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--file-system-id] string
[--default-fs-override] string
Mandatory parameters
--kerberos-principal
Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example:hive/myhost.example.com@REALM.COM
). In the UI, this is called Principal.--kerberos-keytab
Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example:/etc/security/keytabs/hive.service.keytab
). In the UI, this is called Keytab.--name
The ID to give to the new Hive agent. In the UI, this is called Name.
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example:myhdfs
). This will ensure any path mappings are correctly linked between the filesystem and the agent. In the UI, this is called Filesystem.
Optional parameters
--default-fs-override
Enter an override for the default filesystem URI instead of a filesystem name (for example:hdfs://nameservice01
). In the UI, this is called Default Filesystem Override.--config-path
The path to the directory containing the Hive configuration filescore-site.xml
,hive-site.xml
andhdfs-site.xml
. If not specified, Data Migrator will use the default location for the cluster distribution. In the UI, this is called Override Default Hadoop Configuration Path.
Parameters for remote Hive agents only
--host
The host where the remote Hive agent will be deployed.--port
The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.--no-ssl
TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it. You can't adjust this parameter after agent creation.
Parameters for automated deployment
--autodeploy
The remote agent will be automatically deployed when this flag is used. If using this, the--ssh-key
parameter must also be specified.--ssh-user
The SSH user to use for authentication on the remote host to perform automatic deployment (when using the--autodeploy
parameter).--ssh-key
The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the--autodeploy
parameter).--ssh-port
The SSH port to use for authentication on the remote host to perform automatic deployment (when using the--autodeploy
parameter). Default is port22
.--use-sudo
All commands performed by the SSH user will usesudo
on the remote host when performing automatic deployment (using the--autodeploy
parameter).--ignore-host-checking
Ignore strict host key checking when performing the automatic deployment (using the--autodeploy
parameter).
Steps for manual deployment
If you do not wish to use the --autodeploy
function, follow these steps to deploy a remote Hive agent for Apache Hive manually:
Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host:
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, make the installer script executable:
chmod +x hivemigrator-remote-server-installer.sh
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the hive agent add dataproc command without using --autodeploy and its related parameters to configure your remote Hive agent.
See the manual deployment example below for further guidance.
Examples
hive agent add dataproc --name sourceAgent --file-system-id mysourcehdfs
hive agent add dataproc --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
hive agent add dataproc --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).
hive agent add snowflake basic
Add an agent using basic authentication.
hive agent add snowflake basic [--account-identifier] string
[--file-system-id] string
[--name ] string
[--password] string
[--stage] string
[--stage-schema] string
[--warehouse] string
[--default-fs-override] string
[--schema] string
[--stage-database] string
[--user] string
[--network-timeout] int
[--query-timeout] int
[--role] string
Mandatory parameters
--account-identifier
is the unique ID for your Snowflake account.--name
is a name that will be used to reference the remote agent.--warehouse
is the Snowflake-based cluster of compute resources.--stage
is the storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake) or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.--user
is your Snowflake username.
Additionally, use only one of the following parameters:
--file-system-id
is the ID of the target filesystem. In the UI, this is called Filesystem.--default-fs-override
is an override for the default filesystem URI instead of a filesystem name.
Optional parameters
--stage-database
is an optional parameter for a Snowflake stage database with the default value "WANDISCO".--stage-schema
- is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".--schema
- is an optional parameter for a Snowflake schema, with the default value "PUBLIC".--role
- you can enter a custom role for the JDBC connection used by Hive Migrator.
Timeout parameters
--network-timeout
- Number of milliseconds to wait for a response when interacting with the Snowflake service before returning an error.--query-timeout
- Number of seconds to wait for a query to complete before returning an error.
Examples
hive agent add snowflake basic --account-identifier test_adls2 --name snowflakeAgent --stage myAzure --user exampleUser --password examplePassword --warehouse DemoWH2
hive agent add snowflake privatekey
hive agent add snowflake privatekey [--account-identifier] string
[--file-system-id] string
[--private-key-file] string
[--private-key-file-pwd] string
[--schema] string
[--stage-database] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--stage] string
[--stage-schema] string
[--user] string
Mandatory parameters
--account-identifier
is the unique ID for your Snowflake account.--private-key-file
is the path to your private key file.--private-key-file-pwd
is the password that corresponds with the above private-key-file.--name
is a name that will be used to reference the remote agent.--warehouse
is the Snowflake-based cluster of compute resources.--stage
is the storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake) or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.--user
is your Snowflake username.
Additionally, use only one of the following parameters:
--file-system-id
is the ID of the target filesystem. In the UI, this is called Filesystem.--default-fs-override
is an override for the default filesystem URI instead of a filesystem name.
Optional parameters
--stage-database
is an optional parameter for a Snowflake stage database with the default value "WANDISCO".--stage-schema
- is an optional parameter for a Snowflake stage schema, with the default value "PUBLIC".--schema
- is an optional parameter for a Snowflake schema, with the default value "PUBLIC".
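Example
For illustration, a Snowflake agent using key pair authentication might be added as follows. The account identifier, warehouse, stage, user, key file path, and password are placeholders for your own values:
hive agent add snowflake privatekey --account-identifier test_adls2 --name snowflakeAgent --user exampleUser --private-key-file /path/to/rsa_key.p8 --private-key-file-pwd examplePassword --warehouse DemoWH2 --stage myAzure --file-system-id myadls2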
hive agent check
Check the configuration of an existing Hive agent using hive agent check
.
hive agent check [--name] string
Example
hive agent check --name azureAgent
hive agent configure azure
Change the configuration of an existing Azure Hive agent using hive agent configure azure
.
The parameters that can be changed are the same as the ones listed in the hive agent add azure
section.
All parameters are optional except --name
, which is required to enter the existing Hive agent that you wish to configure.
Example
hive agent configure azure --name azureAgent --database-password CorrectPassword
hive agent configure filesystem
Change the configuration of an existing filesystem Hive agent using hive agent configure filesystem
.
The parameters that can be changed are the same as the ones listed in the hive agent add filesystem
section.
All parameters are optional except --name
, which is required to enter the existing Hive agent that you wish to configure.
Example
hive agent configure filesystem --name fsAgent --root-folder /user/dbuser/databases
hive agent configure glue
Change the configuration of an existing AWS Glue Hive agent using hive agent configure glue
.
The parameters that can be changed are the same as the ones listed in the hive agent add glue
section.
Include all existing AWS Glue agent parameters in the command
Running the command with only the parameters you want to change will blank all other parameters, breaking the agent.
This requirement is removed in Data Migrator 2.1.
Example
hive agent configure glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region us-east-2
hive agent configure databricks
Change the configuration of an existing Databricks agent using hive agent configure databricks
.
The parameters that can be changed are the same as the ones listed in the hive agent add databricks
section.
All parameters are optional except --name
, which is required to enter the existing Hive agent that you wish to configure.
Example
hive agent configure databricks --name databricksAgent --access-token myexamplefg123456789t6fnew7dfdtoken4
hive agent configure dataproc
Change the configuration of an existing Dataproc agent using hive agent configure dataproc
.
The parameters that can be changed are the same as the ones listed in the hive agent add dataproc
section.
All parameters are optional except --name
, which is required to enter the existing Hive agent that you wish to configure.
Example
hive agent configure dataproc --name dataprocAgent --port 9099
hive agent configure snowflake
Configure an existing Snowflake remote agent by using the hive agent configure snowflake
command.
hive agent configure snowflake basic [--account-identifier] string
[--file-system-id] string
[--user] string
[--password] string
[--stage] string
[--stage-schema] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--schema] string
[--stage-database] string
Example Snowflake remote agent configuration
hive agent configure snowflake basic --user snowflakeAgent --password <password-here> --stage internal
hive agent configure snowflake privatekey [--account-identifier] string
[--file-system-id] string
[--private-key-file] string
[--private-key-file-pwd] string
[--schema] string
[--stage-database] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--stage] string
[--stage-schema] string
Example Snowflake remote agent configuration
hive agent configure snowflake privatekey --private-key-file-pwd <password> --private-key-file /path/to/keyfiles/ --user snowflakeAgent --schema star-schema
hive agent delete
Delete the specified Hive agent with hive agent delete
.
hive agent delete [--name] string
Example
hive agent delete --name azureAgent
hive agent list
List configured Hive agents with hive agent list
.
hive agent list [--detailed]
Example
hive agent list --detailed
hive agent show
Show the configuration of a Hive agent with hive agent show
.
hive agent show [--name] string
Example
hive agent show --name azureAgent
hive agent types
Print a list of supported Hive agent types with hive agent types
.
hive agent types
Example
hive agent types
Exclusion commands
exclusion add date
Create a date-based exclusion that checks the 'modified date' of any directory or file that Data Migrator encounters during a migration to which the exclusion has been applied. If the path or file being examined has a 'modified date' earlier than the specified date, it will be excluded from the migration.
Once associated with a migration using migration exclusion add
, files that match the policy will not be migrated.
exclusion add date [--exclusion-id] string
[--description] string
[--before-date] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy. In the UI, this is called Name.--description
A user-friendly description for the policy. In the UI, this is called Description.--before-date
An ISO formatted date and time, which can include an offset for a particular time zone. In the UI, this is called TBA.
Example
exclusion add date --exclusion-id beforeDate --description "Files earlier than 2020-10-01T10:00:00PDT" --before-date 2020-10-01T10:00:00-07:00
exclusion add file-size
Create an exclusion that can be applied to migrations to constrain the files transferred by a policy based on file size. Once associated with a migration using migration exclusion add
, files that match the policy will not be migrated.
exclusion add file-size [--exclusion-id] string
[--description] string
[--value] long
[--unit] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy. In the UI, this is called Name.--description
A user-friendly description for the policy. In the UI, this is called Description.--value
The numerical value for the file size, in a unit defined by the--unit
parameter. In the UI, this is called Value.--unit
A string to define the unit used. You can useB
for bytes,GB
for gigabytes,KB
for kilobytes,MB
for megabytes,PB
for petabytes,TB
for terabytes,GiB
for gibibytes,KiB
for kibibytes,MiB
for mebibytes,PiB
for pebibytes, orTiB
for tebibytes when creating exclusions with the CLI.
Example
exclusion add file-size --exclusion-id 100mbfiles --description "Files greater than 100 MB" --value 100 --unit MB
exclusion add regex
Create an exclusion using a regular expression to prevent certain files and directories being transferred based on matching file or directory names. Once associated with a migration using migration exclusion add
, files and directories that match the regular expression will not be migrated.
exclusion add regex [--exclusion-id] string
[--description] string
[--regex] string
[--type] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy. In the UI, this is called Name.--description
A user-friendly description for the policy. In the UI, this is called Description.--regex
A regular expression in a syntax of either Java PCRE, Automata or GLOB type. In the UI, this is called Regex.
Optional parameters
--type
Choose the regular expression syntax type. There are three options available:JAVA_PCRE
(default)AUTOMATA
GLOB
Examples
exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --type GLOB --regex test*
exclusion add regex --description "No paths of files that start with test" --exclusion-id exclusion1 --regex ^test\.*
Using backslash characters within --regex
parameter
If you wish to use a \
character as part of your regex value, you must escape this character with an additional backslash.
exclusion add regex --description "No paths that start with a backslash followed by test" --exclusion-id exclusion2 --regex ^\\test\.*
The response displayed when running through the CLI will still show the additional backslash. However, the internal representation within Data Migrator will be as expected (it will read as ^\test.*).
This workaround isn't required for API inputs, as it only affects the Spring Shell implementation used for the CLI.
exclusion delete
Delete an exclusion policy so that it is no longer available for migrations.
exclusion delete [--exclusion-id] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy to delete. In the UI, this is called Name.
Example
exclusion delete --exclusion-id exclusion1
exclusion list
List all exclusion policies defined.
exclusion list
exclusion show
Get details for an individual exclusion policy by ID.
exclusion show [--exclusion-id] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy to show. In the UI, this is called Name.
Example
exclusion show --exclusion-id 100mbfiles
Migration commands
migration add
migration add [--name or --migration-id] string
[--path] string
[--target] string
[--exclusions] string
[--action-policy] string
[--auto-start]
[--source] string
[--scan-only]
[--verbose]
[--detailed]
Do not write to target filesystem paths when a migration is underway. This could interfere with Data Migrator functionality and lead to undetermined behavior.
Use different filesystem paths when writing to the target filesystem directly (and not through Data Migrator).
Mandatory parameters
--path
Defines the source filesystem directory that is the scope of the migration. All content (other than that excluded) will be migrated to the target. In the UI, this is called Path for {source-filesystem}.
ADLS Gen2 has a filesystem restriction of 60 segments. Make sure your path has fewer than 60 segments when defining the path string parameter.
--target
Specifies the name of the target filesystem resource to which migration will occur. In the UI, this is called Target.
Optional parameters
--name
or--migration-id
Enter a name or ID for the new migration. An ID is auto-generated if you don't enter one. In the UI, this is called Migration Name.--exclusions
A comma-separated list of exclusions by name. In the UI, this is called Add new exclusion.--auto-start
Enter this parameter if you want the migration to start immediately. If you don't enter it, the migration will only take effect once you run it. In the UI, this is called Auto-start migration.--action-policy
This parameter determines what happens if the migration encounters content in the target path with the same name and size. In the UI, this is called Skip Or Overwrite Settings.
There are two options available:com.wandisco.livemigrator2.migration.OverwriteActionPolicy
(default policy)
Every file is replaced, even if file size is identical on the target storage. In the UI, this is called Overwrite.com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced. In the UI, this is called Skip if Size Match.
--source
Specifies the name of the source filesystem.--scan-only
Select this option to create a one-time migration.--verbose
Enter this parameter for additional information about the migration.--detailed
Enter this parameter for additional information about the migration.
Example
migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles
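As a further example, the command below sketches a one-time (scan-only) migration that starts immediately and skips files already present on the target with a matching size. The source, target, and migration names are placeholders:
migration add --path /repl2 --source mysource --target mytarget --migration-id myScanOnlyMigration --scan-only --auto-start --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy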
migration delete
Delete a stopped migration resource.
migration delete [--name or --migration-id] string
Mandatory parameters
--name
or--migration-id
The migration name or ID to delete.
Example
migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration exclusion add
Associate an exclusion resource with a migration so that the exclusion policy applies to items processed for the migration. Exclusions must be associated with a migration before they take effect.
migration exclusion add [--name or --migration-id] string
[--exclusion-id] string
Mandatory parameters
--name
or--migration-id
The migration name or ID with which to associate the exclusion.--exclusion-id
The ID of the exclusion to associate with the migration. In the UI, this is called Name.
Example
migration exclusion add --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1
migration exclusion delete
Remove an exclusion from association with a migration so that its policy no longer applies to items processed for the migration.
migration exclusion delete [--name or --migration-id] string
[--exclusion-id] string
Mandatory parameters
--name
or--migration-id
The migration name or ID from which to remove the exclusion.--exclusion-id
The ID of the exclusion to remove from the migration. In the UI, this is called Name.
Example
migration exclusion delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1
migration list
Present the list of all migrations defined.
migration list [--detailed or --verbose]
Optional parameters
--detailed
or--verbose
Returns additional information about each migration.
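Example
migration list --detailed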
migration path status
View all actions scheduled on a source filesystem in the specified path.
migration path status [--source-path] string
[--source] string
Mandatory parameters
--source-path
The path on the filesystem to review actions for. Supply a full directory.--source
The filesystem ID of the source system the path is in.
Example
migration path status --source-path /root/mypath/ --source mySource
migration pending-region add
Add a path for rescanning to a migration.
migration pending-region add [--name or --migration-id] string
[--path] string
[--action-policy] string
Mandatory parameters
--name
or--migration-id
The migration name or ID.--path
The path string of the region to add for rescan.
Optional parameters
--action-policy
This parameter determines what happens if the migration encounters content in the target path with the same name and size. In the UI, this is called Skip Or Overwrite Settings.
There are two options available:com.wandisco.livemigrator2.migration.OverwriteActionPolicy
(default policy)
Every file is replaced, even if file size is identical on the target storage. In the UI, this is called Overwrite.com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
If the file size is identical between the source and target, the file is skipped. If it’s a different size, the whole file is replaced. In the UI, this is called Skip if Size Match.
Example
migration pending-region add --name myMigration --path etc/files --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
migration reset
Reset a stopped migration to the state it was in before it was started. This deletes and replaces it with a new migration that has the same settings as the old one.
migration reset [--name or --migration-id] string
[--action-policy] string
[--reload-mappings]
[--detailed or --verbose]
Mandatory parameters
--name
or--migration-id
The name or ID of the migration you want to reset.
Optional parameters
--action-policy
Accepts two string values:com.wandisco.livemigrator2.migration.OverwriteActionPolicy
causes the new migration to re-migrate all files from scratch, including those already migrated to the target filesystem, regardless of file size.com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
skips migrating files that exist on both the target and source, if the file size is consistent between them. Use tab auto-completion with this parameter to view both options and a short description of each.--reload-mappings
Resets the migration's path mapping configuration, using the newest default path mapping configuration for Data Migrator.--detailed
or--verbose
Returns additional information about the reset migration, similarly tomigration show
.
Example
migration reset --name mymigration
migration resume
Resume a migration that you've stopped from transferring content to its target.
migration resume [--name or --migration-id] string
[--detailed or --verbose]
Mandatory parameters
--name
or--migration-id
The migration name or ID to resume.
Example
migration resume --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration run
,migration start
Start a migration that was created without the --auto-start
parameter.
migration run [--name or --migration-id] string
[--detailed or --verbose]
Mandatory parameters
--name
or--migration-id
The migration name or ID to run.
Optional parameters
--detailed
or--verbose
Outputs additional information about the migration.
Example
migration run --migration-id myNewMigration
migration show
Get a JSON description of a specific migration.
migration show [--name or --migration-id] string
[--detailed or --verbose]
Mandatory parameters
--name
or--migration-id
The migration name or ID to show.
Optional parameters
--detailed
or--verbose
Outputs additional information about the migration.
Example
migration show --name myNewMigration
migration stop
Stop a migration from transferring content to its target, placing it into the STOPPED
state. Stopped migrations can be resumed.
migration stop [--name or --migration-id] string
Mandatory parameters
--name
or--migration-id
The migration name or ID to stop.
Example
migration stop --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration verification add
Add a migration verification for a specified migration. This will scan your source and target filesystems (in the migration path) and compare them for any discrepancies in either file sizes or in missing files.
The verification status will show the number of missing paths and files on the target filesystem and also the number of file size mismatches between the source and target. The verification status can be viewed by using migration verification show
(for individual verification jobs) or migration verification list
(for all verification jobs).
Once a verification job is complete, a verification report will be created in the /var/log/wandisco/livedata-migrator
directory in the format of verification-report-{verificationId}-{startTime}.log
. This report will contain more details including any paths that have discrepancies.
See migration verifications for more details.
migration verification add [--name or --migration-id] string
[--override]
Mandatory parameters
--name
or--migration-id
The migration name or ID to start (or override) a verification on.
Optional parameters
--override
Stop the currently running verification and start a new one.
Examples
migration verification add --name myMigration
migration verification add --name myMigration --override
migration verification list
List all running migration verification jobs and their statuses (use migration verification show
when just wanting the status for one verification job).
migration verification list
migration verification show
Show the status of a specific migration verification.
migration verification show [--verification-id] string
Mandatory parameters
--verification-id
Show the status of the verification job for this verification ID (only one verification job can be running per migration).
Example
WANdisco LiveData Migrator >> migration verification show --verification-id 91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465
{
"id": "91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465",
"migrationId": "ver1",
"migrationInternalId": "91c79b1b-c61f-4c39-be61-18072ac3a086",
"status": "COMPLETE",
"createdTimestamp": 1676979356467,
"startedTimestamp": 1676979356518,
"finishedTimestamp": 1676979356598,
"createdAt": "2023-02-21T11:35:56.467Z",
"startedAt": "2023-02-21T11:35:56.518Z",
"finishedAt": "2023-02-21T11:35:56.598Z",
"paths": [
"/DATA/d1"
],
"ignoreAfterTimestamp": 1676978431233,
"originalPaths": [
"/DATA/d1"
],
"verificationDepth": 0,
"filesOnSource": 1,
"directoriesOnSource": 0,
"bytesOnSource": 842,
"filesExcluded": 0,
"filesExcludedExistsOnTarget": 0,
"filesExcludedNotExistsOnTarget": 0,
"dataExcluded": 0,
"bytesExcluded": 0,
"bytesExcludedExistsOnTarget": 0,
"bytesExcludedNotExistsOnTarget": 0,
"directoriesExcluded": 0,
"directoriesExcludedExistsOnTarget": 0,
"directoriesExcludedNotExistsOnTarget": 0,
"filesOnTarget": 1,
"directoriesOnTarget": 0,
"bytesOnTarget": 842,
"filesMissingOnTarget": 0,
"directoriesMissingOnTarget": 0,
"filesMissingOnSource": 0,
"directoriesMissingOnSource": 0,
"fileSizeMismatches": 0,
"totalDiscrepancies": 0
}
status
Get a text description of the overall status of migrations. Information is provided on the following:
- Total number of migrations defined.
- Average bandwidth being used over 10s, 60s, and 300s intervals.
- Peak bandwidth observed over 300s interval.
- Average file transfer rate per second over 10s, 60s, and 300s intervals.
- Peak file transfer rate per second over a 300s interval.
- List of migrations, including one-time migrations, with source path and migration id, and with current progress broken down by migration state: completed, live, stopped, running and ready.
status [--diagnostics]
[--migrations]
[--network]
[--transfers]
[--watch]
[--refresh-delay] int
[--full-screen]
Optional parameters
--diagnostics
Returns additional information about your Data Migrator instance and its migrations, useful for troubleshooting.--migrations
Displays information about each running migration.--network
Displays file transfer throughput in Gib/s during the last 10 seconds, 1 minute and 30 minutes.--transfers
Displays overall performance information about data transfers across the last 10 seconds, 1 minute and 30 minute intervals.--watch
Auto-refresh.--refresh-delay
Auto-refresh interval (in seconds).--full-screen
Auto-refresh fullscreen
Examples
WANdisco LiveMigrator >> status
Network (10s) (1m) (30m)
Average Throughput: 10.4 Gib/s 9.7 Gib/s 10.1 Gib/s
Average Files/s: 425 412 403
11 Migrations dd:hh:mm dd:hh:mm
Complete: 1 Transferred Excluded Duration
/static1 5a93d5 67.1 GiB 2.3 GiB 00:12:34
Live: 3 Transferred Excluded Duration
/repl1 9088aa 143.2 GiB 17.3 GiB 00:00:34
/repl_psm1 a4a7e6 423.6 TiB 9.6 GiB 02:05:29
/repl5 ab140d 118.9 GiB 1.2 GiB 00:00:34
Running: 5 Transferred Excluded Duration Remaining
/repl123 e3727c 30.3/45.2 GiB 67% 9.8 GiB 00:00:34 00:00:17
/repl2 88e4e7 26.2/32.4 GiB 81% 0.2 GiB 00:01:27 00:00:12
/repl3 372056 4.1/12.5 GiB 33% 1.1 GiB 00:00:25 00:01:05
/repl4 6bc813 10.6/81.7 TiB 8% 12.4 GiB 00:04:21 01:02:43
/replxyz dc33cb 2.5/41.1 GiB 6% 6.5 GiB 01:00:12 07:34:23
Ready: 2
/repl7 070910 543.2 GiB
/repltest d05ca0 7.3 GiB
WANdisco LiveMigrator >> status
WANdisco LiveMigrator >> status --transfers
Files (10s) (1m) (30m)
Average Migrated/s: 362 158 4781
< 1 KB 14 27 3761
< 1 MB 151 82 0
< 1 GB 27 1 2
< 1 PB 0 0 0
< 1 EB 0 0 0
Peak Migrated/s: 505 161 8712
< 1 KB 125 48 7761
< 1 MB 251 95 4
< 1 GB 29 7 3
< 1 PB 0 0 0
< 1 EB 0 0 0
Average Scanned/s: 550 561 467
Average Rescanned/s: 24 45 56
Average Excluded/s: 7 7 6
WANdisco LiveMigrator >> status --diagnostics
Uptime: 0 Days 1 Hours 23 Minutes 24 Seconds
SystemCpuLoad: 0.1433 ProcessCpuLoad: 0.0081
JVM GcCount: 192 GcPauseTime: 36 s (36328 ms)
OS Connections: 1, Tx: 0 B, Rx: 0 B, Retransmit: 0
Transfer Bytes (10/30/300s): 0.00 Gib/s, 0.00 Gib/s, 0.00 Gib/s
Transfer Files (10/30/300s): 0.00/s 0.00/s 0.00/s
Active Transfers/pull.threads: 0/100
Migrations: 0 RUNNING, 4 LIVE, 0 STOPPED
Actions Total: 0, Largest: "testmigration" 0, Peak: "MyMigration" 1
PendingRegions Total: 0 Avg: 0, Largest: "MyMigration" 0
FailedPaths Total: 0, Largest: "MyMigration" 0
File Transfer Retries Total: 0, Largest: "MyMigration" 0
Total Excluded Scan files/dirs/bytes: 26, 0, 8.1 MB
Total Iterated Scan files/dirs/bytes: 20082, 9876, 2.7 GB
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/8
EventsQueued: 0, Total Events Added: 504
Transferred File Size Percentiles:
2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Transferred File Transfer Rates Percentiles per Second:
2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Active File Size Percentiles:
0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Active File Transfer Rates Percentiles per Second:
0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Hive migration commands
hive migration add
Create a new Hive migration to initiate metadata migration from your source Metastore.
Create Hive rules before initiating a Hive migration to define which databases and tables are migrated.
hive migration add [--source] string
[--target] string
[--name] string
[--auto-start]
[--once]
[--rule-names] list
Mandatory parameters
--source
The name of the Hive agent for the source of migration.--target
The name of the Hive agent for the target of migration.
Optional parameters
--name
The name to identify the migration with.--auto-start
Enter this parameter to start the migration immediately after creation.--once
Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.--rule-names
The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example:rule1,rule2,rule3
).
Example
hive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs,user_dbs --name hive_migration --auto-start
Auto-completion of the --rule-names
parameter will not work correctly if it is added at the end of the Hive migration parameters. See the troubleshooting guide for workarounds.
hive migration delete
Delete a Hive migration.
A Hive migration must be stopped before it can be deleted. This can be achieved by using the --force-stop
parameter with this command.
hive migration delete [--name] string [--force-stop]
Example
hive migration delete --name hive_migration --force-stop
hive migration list
List all Hive migrations.
hive migration list
hive migration show
Display information about a Hive migration.
hive migration show
hive migration start
Start a Hive migration or a list of Hive migrations (comma-separated).
Enter the --once
parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
hive migration start [--names] list [--once]
Example
hive migration start --names hive_migration1,hive_migration2
hive migration start all
Start all Hive migrations.
Enter the --once
parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
hive migration start all [--once]
Example
hive migration start all --once
hive migration status
Show the status of a Hive migration or a list of Hive migrations (comma-separated).
hive migration status [--names] list
Example
hive migration status --names hive_migration1,hive_migration2
hive migration status all
Show the status of all Hive migrations.
hive migration status all
Example
hive migration status all
hive migration stop
Stop a running hive migration or a list of running hive migrations (comma-separated).
hive migration stop [--names] list
Example
hive migration stop --names hive_migration1,hive_migration2
hive migration stop all
Stop all running Hive migrations.
hive migration stop all
Example
hive migration stop all
hive migration reset
Reset a stopped Hive migration. This returns the migration to a CREATED
state.
hive migration reset [--names] string
[--force-stop]
A Hive migration must be stopped before it can be reset. This can be achieved by using the --force-stop
parameter with this command.
The reset migration will use the latest agent settings.
For example, if the target agent’s Default Filesystem Override setting was updated after the original migration started, the reset migration will use the latest Default Filesystem Override value.
To reset multiple Hive migrations, use a comma-separated list of migration names with the --names
parameter.
Example
hive migration reset --names hive_migration1
hive migration reset --force-stop --names hive_migration1,hive_migration2
Path mapping commands
path mapping add
Create a path mapping that allows you to define an alternative target path for a specific target filesystem. Path mappings are automatically applied to new migrations.
When path mapping isn't used, the source path is created on the target filesystem.
Path mappings can't be applied to existing migrations. Delete and recreate a migration if you want a path mapping to apply.
path mapping add [--path-mapping-id] string
[--source-path] string
[--target] string
[--target-path] string
[--description] string
Mandatory parameters
--source-path
The path on the source filesystem.--target
The target filesystem id (value defined for the--file-system-id
parameter).--target-path
The path for the target filesystem.--description
Description of the path mapping enclosed in quotes ("text"
).
Optional parameters
--path-mapping-id
An ID for this path mapping. An ID will be auto-generated if you don't enter one.
Example
path mapping add --path-mapping-id hdp-hdi --source-path /apps/hive/warehouse --target mytarget --target-path /hive/warehouse --description "HDP to HDI - Hive warehouse directory"
path mapping delete
Delete a path mapping.
Deleting a path mapping will not affect any existing migrations that have the path mapping applied. Delete and recreate a migration if you no longer want a previous path mapping to apply.
path mapping delete [--path-mapping-id] string
Mandatory parameters
--path-mapping-id
The ID of the path mapping.
Example
path mapping delete --path-mapping-id hdp-hdi
path mapping list
List all path mappings.
path mapping list [--target] string
Optional parameters
--target
List path mappings for the specified target filesystem id.
Examples
path mapping list --target hdp-hdi
path mapping show
Show details of a specified path mapping.
path mapping show [--path-mapping-id] string
Optional parameters
--path-mapping-id
The ID of the path mapping.
Example
path mapping show --path-mapping-id hdp-hdi
Built-in commands
clear
clear
echo
Prints whatever text you write to the console. This can be used to sanity check a command before running it (for example: echo migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles
).
echo [--message] string
exit
, quit
Entering either exit
or quit
will stop operation of Data Migrator when it is run from the command line. All processing will cease, and you will be returned to your system shell.
If your Data Migrator command line is connected to a Data Migrator system service, this command will end your interactive session with that service, which will remain in operation to continue processing migrations.
If this command is encountered during non-interactive processing of input (such as when you pipe input to an instance as part of another shell script) no further commands contained in that input will be processed.
exit
ALSO KNOWN AS
quit
help
Use the help
command to get details of all commands available from the action prompt.
help [-C] string
For longer commands, you can use backslashes (\
) to indicate continuation, or use quotation marks ("
) to enclose the full command. When using quotation marks, you can press Tab on your keyboard to make Data Migrator automatically suggest the remainder of your typed command.
See the examples below for reference.
Example
help connect
connect - Connect to Data Migrator and Hive Migrator.
connect [--host] string [--ssl] [--lm2port] int [--hvm-port] int [--timeout] integer [--user] string
help hive\ migration\ add
hive migration add - Create new migration.
hive migration add [--source] string [--target] string [--name] string [--auto-start] [--once] [--rule-names] list
help "filesystem add local"
filesystem add local - Add a local filesystem.
filesystem add local [--file-system-id] string [--fs-root] string [--source] [--scan-only] [--properties-files] list [--properties] string
history
Enter history
at the action prompt to list all previously entered commands.
Entering history --file <filename>
will save up to 500 most recently entered commands in text form to the file specified. Use this to record commands that you have executed.
history [--file] file
Optional parameters
--file
The name of the file in which to save the history of commands.
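Example
The filename below is arbitrary; use any writable path:
history --file cli-history.txt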
script
Load and execute commands from a text file using the script --file <filename>
command. This file should have one command per line, and each will be executed as though they were entered directly at the action prompt in that sequence.
Use scripts outside of the WANdisco CLI by referencing the script when running the livedata-migrator
command (see examples).
script [--file] file
Mandatory parameters
--file
The name of the file containing script commands.
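Example contents of a script file (here called myScript), with one command per line: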
hive agent check --name sourceAgent
hive agent check --name azureAgent
Examples
These examples assume that myScript
is inside the working directory.
script --file myScript
livedata-migrator --script=./myScript
Change log level commands
log debug
log debug
log info
log info
log off
log off
log trace
log trace
Connect commands
connect livemigrator
Connect to the Data Migrator service on your Data Migrator host with this command.
This is a manual method of connecting to the Data Migrator service as the livedata-migrator
command (shown in CLI - Sign in) will attempt to establish this connection automatically.
connect livemigrator [--host] string
[--ssl]
[--port] int
[--timeout] integer
[--user] string
Mandatory parameters
--host
The hostname or IP address for the Data Migrator host.
Optional parameters
--ssl
Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.--port
The Data Migrator port to connect on (default is18080
).--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).--user
The username to use for authenticating to the Data Migrator service. Used only when the Data Migrator instance has basic authentication enabled. You will still be prompted to enter the user password.
Example
connect livemigrator --host localhost --port 18080
connect hivemigrator
Connect to the Hive Migrator service on your Data Migrator host with this command.
This is a manual method of connecting to the Hive Migrator service as the livedata-migrator
command (shown in CLI - Log in section) will attempt to establish this connection automatically.
connect hivemigrator [--host] string
[--ssl]
[--port] int
[--timeout] long
[--user] string
Mandatory parameters
--host
The hostname or IP address for the Data Migrator host that contains the Hive Migrator service.
Optional parameters
--ssl
Enter this parameter if you want to establish a TLS connection to Hive Migrator.--port
The Hive Migrator service port to connect on (default is6780
).--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).--user
The username to use for authenticating to the Hive Migrator service. Used only when Hive Migrator has basic authentication enabled. You will still be prompted to enter the user password.
Example
connect hivemigrator --host localhost --port 6780
Email notifications subscription commands
notification email addresses add
Add email addresses to the subscription list for email notifications.
notification email addresses add [--addresses]
Mandatory parameters
--addresses
A comma-separated list of email addresses to be added.
Example
notification email addresses add --addresses myemail@company.org,personalemail@gmail.com
notification email addresses remove
Remove email addresses from the subscription list for email notifications.
notification email addresses remove [--addresses]
Mandatory parameters
--addresses
A comma-separated list of email addresses to be removed. Use auto-completion to quickly select from subscribed emails.
Example
notification email addresses remove --addresses myemail@company.org,personalemail@gmail.com
notification email smtp set
Configure the details of an SMTP server for Data Migrator to connect to.
notification email smtp set [--host] string
[--port] integer
[--security] security-enum
[--email] string
[--login] string
[--password] string
[--subject-prefix] string
Mandatory parameters
--host
The host address of the SMTP server.--port
The port to connect to the SMTP server. Many SMTP servers use port 25.--security
The type of security the server uses. Available options:NONE
,SSL
,STARTLS_ENABLED
,STARTTLS_REQUIRED
, orTLS
.--email
The email address for Data Migrator to use with emails sent through the SMTP server. This address will be the sender of all configured email notifications.
Optional parameters
--login
The username to authenticate with the SMTP server.--password
The password to authenticate with the SMTP server. Required if you provide a login.--subject-prefix
Set an email subject prefix to help identify and filter Data Migrator notifications.
Example
notification email smtp set --host my.internal.host --port 587 --security TLS --email livedatamigrator@wandisco.com --login myusername --password mypassword
notification email smtp show
Display the details of the SMTP server Data Migrator is configured to use.
notification email smtp show
notification email subscriptions show
Show a list of currently subscribed emails and notifications.
notification email subscriptions show
notification email types add
Add notification types to the email notification subscription list.
See the output from the command notification email types show
for a list of all currently available notification types.
notification email types add [--types]
Mandatory parameters
--types
A comma-separated list of notification types to subscribe to.
Example
notification email types add MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED
notification email types remove
Remove notification types from the email notification subscription list.
notification email types remove [--types]
Mandatory parameters
--types
A comma-separated list of notification types to unsubscribe from.
Example
notification email types remove MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED
notification email types show
Return a list of all available notification types to subscribe to.
notification email types show
Hive Backup Commands
hive backup add
Immediately create a metadata backup file.
hive backup add
hive backup config show
Show the current metadata backup configuration.
hive backup config show
hive backup list
List all existing metadata backup files.
hive backup list
hive backup restore
Restore from a specified metadata backup file.
hive backup restore --name string
hive backup schedule configure
Configure a backup schedule for metadata migrations.
hive backup schedule configure --period-minutes 10 --enable
{
"enabled": true,
"periodMinutes": 10
}
hive backup schedule show
Show the current metadata backup schedule.
hive backup schedule show
{
"enabled": true,
"periodMinutes": 10
}
hive backup show
Show a specified metadata backup file.
hive backup show --name string
Hive configuration commands
hive config certificate generate
hive config certificate generate
hive config certificate upload
hive config certificate upload [--path-mapping-id] string
[--private-key] file
[--certificate] file
[--trusted-certificate] file
Mandatory parameters
--private-key
Client private key used to establish a TLS connection to the remote agent.--certificate
Client certificate used to establish a TLS connection to the remote agent.--trusted-certificate
Trusted certificate used to establish a TLS connection to the remote agent.
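Example
The file paths below are placeholders for your own client key, client certificate, and trusted certificate files:
hive config certificate upload --private-key /path/to/client-key.pem --certificate /path/to/client-cert.pem --trusted-certificate /path/to/ca-cert.pem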
Hive rule configuration commands
hive rule add
,hive rule create
Create a Hive migration rule that is used to define which databases and tables are migrated.
Enter these rules when starting a new migration to control which databases and tables are migrated.
hive rule add [--database-pattern] string
[--table-pattern] string
[--name] string
ALSO KNOWN AS
hive rule create
Mandatory parameters
--database-pattern
Enter a Hive DDL pattern that will match the database names you want to migrate.--table-pattern
Enter a Hive DDL pattern that will match the table names you want to migrate.
You can use a single asterisk (*
) if you want to match all databases and/or all tables within the Metastore/database.
Optional parameters
--name
The name for the Hive rule.
Example
hive rule add --name test_databases --database-pattern test* --table-pattern *
hive rule configure
Change the parameters of an existing Hive rule.
The parameters that can be changed are the same as the ones listed in the hive rule add
,hive rule create
section.
All parameters are optional except --name
, which is required to enter the existing Hive rule that you wish to configure.
Example
hive rule configure --name test_databases --database-pattern test_db*
hive rule delete
hive rule delete [--name] string
Example
hive rule delete --name test_databases
hive rule list
hive rule list
hive rule show
hive rule show [--name] string
Example
hive rule show --name test_databases
Hive show commands
hive show conf
hive show conf [--parameter] string
[--agent-name] string
Hive show configuration parameters
--agent-name
The name of the agent.--parameter
The configuration parameter/property that you want to show the value of.
Example
hive show conf --agent-name sourceAgent --parameter hive.metastore.uris
hive show database
hive show database [--database] string
[--agent-name] string
Hive show database parameters
--database
The database name. If not specified, the default will be default.--agent-name
The name of the agent.
Example
hive show database --agent-name sourceAgent --database mydb01
hive show databases
hive show databases [--like] string
[--agent-name] string
Hive show databases parameters
--like
The Hive DDL pattern to use to match the database names (for example:testdb*
will match any database name that begins with "testdb").--agent-name
The name of the agent.
Example
hive show databases --agent-name sourceAgent --like testdb*
hive show indexes
hive show indexes [--database] string
[--table] string
[--agent-name] string
Hive show indexes parameters
--database
The database name.--table
The table name.--agent-name
The name of the agent.
Example
hive show indexes --agent-name sourceAgent --database mydb01 --table mytbl01
hive show partitions
hive show partitions [--database] string
[--table] string
[--agent-name] string
Hive show partitions parameters
--database
The database name.--table
The table name.--agent-name
The name of the agent.
Example
hive show partitions --agent-name sourceAgent --database mydb01 --table mytbl01
hive show table
hive show table [--database] string
[--table] string
[--agent-name] string
Hive show table parameters
--database
The database name where the table is located.--table
The table name.--agent-name
The name of the agent.
Example
hive show table --agent-name sourceAgent --database mydb01 --table mytbl01
hive show tables
hive show tables [[--like] string] [[--database] string] [[--agent-name] string]
Hive show tables parameters
--like
The Hive DDL pattern to use to match the table names (for example:testtbl*
will match any table name that begins with "testtbl").--database
Database name. Defaults to default if not set.--agent-name
The name of the agent.
Example
hive show tables --agent-name sourceAgent --database mydb01 --like testtbl*
License manipulation commands
license show
license show [--full]
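Example
license show --full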
license upload
license upload [--path] string
Example
license upload --path /user/hdfs/license.key
Notification commands
notification latest
notification latest
notification list
notification list [--count] integer
[--since] string
[--type] string
[--exclude-resolved]
[--level] string
Optional parameters
--count
The number of notifications to return.--since
Return notifications created after this date/time.--type
The type of notification to return, for example LicenseExceptionNotification.--exclude-resolved
Exclude resolved notifications.--level
The level of notification to return.
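Example
Return the 10 most recent notifications that haven't been resolved:
notification list --count 10 --exclude-resolved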
notification show
notification show [--notification-id] string
Mandatory parameters
--notification-id
The id of the notification to return.
Source commands
source clear
Clear all information that Data Migrator maintains about the source filesystem by issuing the source clear
command. This will allow you to define an alternative source to one previously defined or detected automatically.
source clear
source delete
Use source delete
to delete information about a specific source by ID. You can obtain the ID for a source filesystem with the output of the source show
command.
source delete [--file-system-id] string
Mandatory parameters
--file-system-id
The ID of the source filesystem resource you want to delete. In the UI, this is called Display Name.
Example
source delete --file-system-id auto-discovered-source-hdfs
source show
Get information about the source filesystem configuration.
source show [--detailed]
Optional parameters
--detailed
Include all configuration properties for the source filesystem in the response.
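Example
source show --detailed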