Command Reference
Looking to learn more about LiveData Migrator commands? This reference page includes a comprehensive description of each command available from the LiveData Migrator CLI.
Each command description includes the information available from the help command. Tab completion also guides you through the available options as you enter commands and auto-completes values where it can.
tip
This reference also covers the UI configuration items: each is noted alongside its equivalent CLI mandatory or optional parameter.
Built-in commands

The built-in commands are always available in a LiveData Migrator command line interactive session. They are unrelated to migration resources and operation (other than exit/quit), but help you to interact with LiveData Migrator and automate processing through scripts for the action prompt.
Source Commands

source clear

Clear all information that LiveData Migrator maintains about the source file system by issuing the source clear command. This will allow you to define an alternative source to one previously defined or detected automatically.

SYNOPSYS source clear
source delete

Use source delete to delete information about a specific source file system by identifier. You can obtain the identifier for a source file system with the output of the source fs show command.

SYNOPSYS source delete [--file-system-id] string

Mandatory Parameters
--file-system-id
The identifier of the source file system resource to delete. This is referenced in the UI as Storage Name.

Example
source delete --file-system-id auto-discovered-source-hdfs
source fs show

Get information about the source file system configuration.

SYNOPSYS source fs show [--detailed]

Optional Parameters
--detailed
Include all configuration properties for the source file system in the response.
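For example, to include every configuration property in the response:
source fs show --detailed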
File System Commands

filesystem add adls2 oauth

Add an Azure Data Lake Storage Gen 2 container as a migration target using the filesystem add adls2 oauth command, which requires a service principal and OAuth 2 credentials.
note
The service principal that you want to use must have the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account. See the Microsoft documentation for further guidance.

SYNOPSYS filesystem add adls2 oauth [--file-system-id] string [--storage-account-name] string [--oauth2-client-id] string [--oauth2-client-secret] string [--oauth2-client-endpoint] string [--container-name] string [--insecure] [[--properties-files] list] [[--properties] string]
Mandatory Parameters
--file-system-id
The identifier to give the new file system resource. This is referenced in the UI as Storage Name.
--storage-account-name
The name of the ADLS Gen 2 storage account to target. This is referenced in the UI as Account Name.
--oauth2-client-id
The client ID (also known as application ID) for your Azure service principal. This is referenced in the UI as Client ID.
--oauth2-client-secret
The client secret (also known as application secret) for the Azure service principal. This is referenced in the UI as Secret.
--oauth2-client-endpoint
The client endpoint for the Azure service principal. This is referenced in the UI as Endpoint. This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token, where {tenant} is the directory ID for the Azure service principal. You can specify a custom URL if desired (such as a proxy endpoint that manually interfaces with Azure Active Directory).
--container-name
The name of the container in the storage account to which content will be migrated. This is referenced in the UI as Container Name.

Optional Parameters
--insecure
When provided, LiveData Migrator will not use TLS to encrypt communication with ADLS Gen 2. This may improve throughput, but should only be used when you have other means of securing communication. This is referenced in the UI when Use Secure Protocol is unchecked.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Specify properties to use in a comma-separated key/value list.

Example
filesystem add adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target
filesystem add adls2 sharedKey

Add an Azure Data Lake Storage Gen 2 container as a migration target using the filesystem add adls2 sharedKey command, which requires credentials in the form of an account key.

SYNOPSYS filesystem add adls2 sharedKey [--file-system-id] string [--storage-account-name] string [--shared-key] string [--container-name] string [--insecure] [[--properties-files] list] [[--properties] string]

Mandatory Parameters
--file-system-id
The identifier to give the new file system resource. This is referenced in the UI as Storage Name.
--storage-account-name
The name of the ADLS Gen 2 storage account to target. This is referenced in the UI as Account Name.
--shared-key
The shared account key to use as credentials to write to the storage account. This is referenced in the UI as Access Key.
--container-name
The name of the container in the storage account to which content will be migrated. This is referenced in the UI as Container Name.

Optional Parameters
--insecure
When provided, LiveData Migrator will not use TLS to encrypt communication with ADLS Gen 2. This may improve throughput, but should only be used when you have other means of securing communication. This is referenced in the UI when Use Secure Protocol is unchecked.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Specify properties to use in a comma-separated key/value list.

Example
filesystem add adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
filesystem add gcs

Add a Google Cloud Storage bucket as a migration target using the filesystem add gcs command, which requires credentials in the form of an account key file.

SYNOPSYS filesystem add gcs [--file-system-id] string [[--service-account-json-key-file] string] [[--service-account-p12-key-file] string] [[--service-account-json-key-file-server-location] string] [[--service-account-p12-key-file-server-location] string] [[--service-account-email] string] [--bucket-name] string [[--properties-files] list] [[--properties] string]

Mandatory Parameters
--file-system-id
The identifier to give the new file system resource. This is referenced in the UI as Storage Name.
--bucket-name
The bucket name of a Google Cloud Storage account. This is referenced in the UI as Bucket Name.

Service account key parameters
info
Provide your service account key for the GCS bucket by choosing one of the parameters below. You can also upload the service account key directly when using the UI (this is not supported through the CLI).
--service-account-json-key-file-server-location
The absolute filesystem path on the LiveData Migrator server of your service account key file in JSON format. You can either create a GCS service account key or use an existing one. This is referenced in the UI as Key File when the Key File Options -> Provide a Path option is selected.
--service-account-p12-key-file-server-location
The absolute filesystem path on the LiveData Migrator server of your service account key file in P12 format. You can either create a GCS service account key or use an existing one. This is referenced in the UI as Key File when the Key File Options -> Provide a Path option is selected.
--service-account-json-key-file
The absolute filesystem path on the host running the LiveData Migrator CLI of your service account key file in JSON format. Only use this parameter if you are running the LiveData Migrator CLI on a different host to your LiveData Migrator server.
--service-account-p12-key-file
The absolute filesystem path on the host running the LiveData Migrator CLI of your service account key file in P12 format. Only use this parameter if you are running the LiveData Migrator CLI on a different host to your LiveData Migrator server.
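For example, when running the CLI on a different host from the LiveData Migrator server, you can reference a JSON key file stored on the CLI host (the path below is illustrative):
filesystem add gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-json-key-file /home/myuser/keys/myAccountKey.json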
Optional Parameters
--service-account-email
The email address linked to your GCS service account. This is referenced in the UI as Email address and is required when selecting the Upload P12 Key File option.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Specify properties to use in a comma-separated key/value list.

Example
filesystem add gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12 --service-account-email user@mydomain.com
filesystem add hdfs

Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs command.
Creating an HDFS resource with this command will normally only be needed when migrating to a target HDFS storage (rather than another storage service like ADLS Gen 2 or S3a). LiveData Migrator will attempt to auto-detect the source HDFS when started from the command line unless Kerberos is enabled on your source environment.
If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs command to provide Kerberos credentials and auto-discover your source HDFS configuration.

SYNOPSYS filesystem add hdfs [--file-system-id] string [[--default-fs] string] [[--user] string] [[--kerberos-principal] string] [[--kerberos-keytab] string] [--source] [--scan-only] [[--properties-files] list] [[--properties] string]

Mandatory Parameters
--file-system-id
The identifier to give the new file system resource. This is referenced in the UI as Storage Name.
--default-fs
A string that defines how LiveData Migrator accesses HDFS. This is referenced in the UI as Default FS. It can be specified in a number of forms:
- As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
- As an HDFS URI that references a nameservice ID defined in the cluster properties, like hdfs://mynameservice. In this case, the cluster configuration must set dfs.nameservices to include that nameservice ID (for example, mynameservice) and define all required configuration properties for that nameservice, such as dfs.ha.namenodes.mynameservice, dfs.namenode.rpc-address.mynameservice.nn1, and dfs.namenode.http-address.mynameservice.nn1 (see the sketch after this list).
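As a rough guide, the nameservice properties referenced above would appear in an hdfs-site.xml supplied via --properties-files along these lines (the hostnames and ports here are illustrative; use your cluster's actual values):
<property><name>dfs.nameservices</name><value>mynameservice</value></property>
<property><name>dfs.ha.namenodes.mynameservice</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mynameservice.nn1</name><value>namenode1.localdomain:8020</value></property>
<property><name>dfs.namenode.rpc-address.mynameservice.nn2</name><value>namenode2.localdomain:8020</value></property>
<property><name>dfs.namenode.http-address.mynameservice.nn1</name><value>namenode1.localdomain:9870</value></property>
<property><name>dfs.namenode.http-address.mynameservice.nn2</name><value>namenode2.localdomain:9870</value></property>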
Optional Parameters
Kerberos: Cross-realm authentication required between source and target HDFS
Cross-realm authentication is required in the following scenario:
- Migration will occur between a source and target HDFS.
- Kerberos is enabled on both clusters.
Refer to your Hadoop distribution's documentation for guidance on configuring cross-realm trust.
--user
The name of the HDFS user to be used when performing operations against the file system. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
--kerberos-principal
The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab
The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the LiveData Migrator service (default is hdfs).
--source
Provide this parameter to use the file system resource created as a source. This is referenced in the UI when configuring the Unknown source.
--scan-only
Provide this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files
Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml. This is referenced in the UI as Provide a path to files under the Additional Configuration option.
--properties
Specify properties to use in a comma-separated key/value list. This is referenced in the UI as Additional Configuration under the Additional Configuration option.
Properties files are required for NameNode HA
If your Hadoop cluster has NameNode HA enabled, you must provide the local filesystem path to the properties files that define the configuration for the nameservice ID.
Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster.
Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your LiveData Migrator host's local filesystem.
For the UI, use Provide a path to files under the Additional Configuration option and define the directory containing the core-site.xml and hdfs-site.xml files.
Example path containing source cluster configuration:
/etc/hadoop/conf
Example path containing target cluster configuration:
/etc/targetClusterConfig
Alternatively, define the absolute filesystem paths to these files:
Example absolute paths to source cluster configuration files:
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml
Example absolute paths to target cluster configuration files:
/etc/targetClusterConfig/core-site.xml
/etc/targetClusterConfig/hdfs-site.xml
For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
Examples

HDFS as source
filesystem add hdfs --file-system-id mysource --source --default-fs hdfs://sourcenameservice --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
filesystem add hdfs --file-system-id mysource --source --default-fs hdfs://sourcenameservice --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM

HDFS as target
note
When specifying an HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the LiveData Migrator system user.
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs
filesystem add local

Add a Hadoop-compatible local filesystem as either a migration target or source using the filesystem add local command.

SYNOPSYS filesystem add local [--file-system-id] string [[--fs-root] string] [--source] [--scan-only] [[--properties-files] list] [[--properties] string]

Mandatory Parameters
--file-system-id
The identifier to give the new file system resource. This is referenced in the UI as Storage Name.

Optional Parameters
--fs-root
The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Supply the full directory path from the root.
--source
Provide this parameter to use the file system resource created as a source. This is referenced in the UI when configuring the Unknown source.
--scan-only
Provide this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Specify properties to use in a comma-separated key/value list.
note
If no fs-root is specified, the file path will default to the root of your system.

Examples

Local filesystem as a source
filesystem add local --file-system-id mytarget --fs-root ./tmp --source

Local filesystem as a target
filesystem add local --file-system-id mytarget --fs-root ./Users/username/destinationfolder/
filesystem add s3a

Add an Amazon Simple Storage Service (Amazon S3) bucket as a target filesystem using the filesystem add s3a command. This method also supports IBM COS buckets.

SYNOPSYS filesystem add s3a [--file-system-id] string [--bucket-name] string [[--endpoint] string] [[--access-key] string] [[--secret-key] string] [--credentials-provider] string [--source] [--scan-only] [[--properties-files] list] [[--properties] list]

For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.

S3a Mandatory Parameters
--file-system-id
The identifier for the new file system resource. This is referenced in the UI as Storage Name.
--bucket-name
The name of your Amazon S3 bucket. This is referenced in the UI as Bucket Name.
--credentials-provider
The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint. This is referenced in the UI as Credentials Provider. This is not a required parameter when adding an IBM COS bucket through the UI. The provider options available include:
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
Use this provider to offer credentials as an access key and secret access key with the --access-key and --secret-key parameters.
com.amazonaws.auth.InstanceProfileCredentialsProvider
Use this provider when running LiveData Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.
com.amazonaws.auth.DefaultAWSCredentialsProviderChain
A commonly-used credentials provider chain that looks for credentials in this order (see the example below):
- Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
- Java system properties - aws.accessKeyId and aws.secretKey.
- Web Identity Token credentials from the environment or container.
- Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
- Credentials delivered through the Amazon EC2 container service, if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
- Instance profile credentials delivered through the Amazon EC2 metadata service.
Endpoint (UI & IBM COS only): This is required when adding an IBM COS bucket. IBM provides a list of available endpoints in their public documentation.
S3a Optional Parameters
--access-key
When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, specify the access key with this parameter. This is referenced in the UI as Access Key. This is a required parameter when adding an IBM COS bucket.
--secret-key
When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, specify the secret key using this parameter. This is referenced in the UI as Secret Key. This is a required parameter when adding an IBM COS bucket.
--endpoint
(S3 as a target only) Provide a specific endpoint to access the S3 bucket, such as an AWS PrivateLink endpoint (for example: vpce-0e25b8cdd720f900e-argc85vg.s3.us-east-1.vpce.amazonaws.com). When using this parameter, do not use the fs.s3a.endpoint property as an additional custom property as this supersedes it. This is referenced in the UI as Use AWS PrivateLink -> PrivateLink VPC.
--source
(Preview) Provide this parameter to use the file system resource created as a source. This is referenced in the UI when configuring the Unknown source.
--scan-only
Provide this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Specify properties to use in a comma-separated key/value list. This is referenced in the UI as S3A Properties (see S3a Default Properties and S3a Custom Properties for more information).
note
Amazon S3a as a source is currently a preview feature.
S3a Default Properties
These properties are defined by default when adding an S3a filesystem.
fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a Filesystem.
fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that will be pre-pended to the User-Agent header sent in HTTP requests to the S3 back-end by S3aFileSystem.
fs.s3a.impl.disable.cache (default true): Disables the S3 file system cache when set to 'true'.
hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
fs.s3a.connection.maximum (default 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
fs.s3a.threads.max (default 100): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation.
fs.s3a.max.total.tasks (default 60): Defines the number of operations which can be queued for execution at a time.
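Any of these defaults can be overridden when the filesystem is added by including the property in the --properties list. For example (the values here are illustrative, not tuning recommendations):
filesystem add s3a --file-system-id mytarget --bucket-name mybucket1 --credentials-provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain --properties fs.s3a.connection.maximum=200,fs.s3a.threads.max=150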
S3a Custom Properties
These are some of the additional properties that can be added when creating an S3a filesystem.
fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
fs.s3a.fast.upload.active.blocks (default 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in Kilobytes, Megabytes, Gigabytes, Terabytes or Petabytes respectively.
fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
Find an additional list of S3a properties in the S3a documentation.
Upload Buffering
Migrations using an S3A target destination will buffer all uploads. By default, the buffering occurs on the local disk of the system LiveData Migrator is running on, in the /tmp directory.
LiveData Migrator will automatically delete the temporary buffering files once they are no longer needed.
If you want to use a different type of buffering, change the fs.s3a.fast.upload.buffer property. The following values can be supplied:

Buffering Option | Details | Property Value
---|---|---
Array Buffer | Buffers the uploaded data in memory instead of on disk, using the Java heap. | array
Byte Buffer | Buffers the uploaded data in memory instead of on disk, but does not use the Java heap. | bytebuffer
Disk Buffering | The default option. Buffers the upload to disk. | disk

Both the array and bytebuffer options may lead to the consumption of large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks) may be used to fine-tune the migration to avoid issues.
note
If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the file system containing the directory used for buffering (/tmp by default) has enough remaining space to facilitate the transfer.
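For example, to switch a new S3a filesystem to in-memory byte buffering and limit the number of active blocks per stream (illustrative values only):
filesystem add s3a --file-system-id mytarget --bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key B6ZEXAMPLEACCESSKEYA --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D --properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=4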
Example
filesystem add s3a --file-system-id mytarget --bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key B6ZEXAMPLEACCESSKEYA --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
filesystem auto-discover-source hdfs

Discover your local HDFS filesystem by specifying the Kerberos credentials for your source environment.
You can also manually configure the source HDFS filesystem using the filesystem add hdfs command.

SYNOPSYS filesystem auto-discover-source hdfs [[--kerberos-principal] string] [[--kerberos-keytab] string]

Kerberos parameters
--kerberos-principal
The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
--kerberos-keytab
The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the LiveData Migrator service (default is hdfs).

Example
filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM
filesystem clear

Delete all target file system references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target file systems.

NAME filesystem clear - Delete all targets.
SYNOPSYS filesystem clear
filesystem del

Delete a specific file system resource by identifier. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that file system.

SYNOPSYS filesystem delete [--file-system-id] string

Mandatory Parameters
--file-system-id
The identifier of the file system resource to delete. This is referenced in the UI as Storage Name.

Example
filesystem delete --file-system-id mytarget
filesystem list

List defined file system resources.

SYNOPSYS filesystem list [--detailed]

Optional Parameters
--detailed
Include all properties for each file system in the JSON result.
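For example, to list every file system with its full set of properties:
filesystem list --detailed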
filesystem show

View details for a file system resource.

SYNOPSYS filesystem show [--file-system-id] string [--detailed]

Mandatory Parameters
--file-system-id
The identifier of the file system resource to show. This is referenced in the UI as Storage Name.

Example
filesystem show --file-system-id mytarget
filesystem types

View information about the file system types available for use with LiveData Migrator. File systems that provide an eventListenerType other than no-op can be used in migrations that will migrate ongoing changes during operation.

SYNOPSYS filesystem types
filesystem update adls2 oauth

Update an existing Azure Data Lake Storage Gen 2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth command. You will be prompted to optionally update the service principal and OAuth 2 credentials.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth section.
All parameters are optional except --file-system-id, which specifies the file system you want to update.

Example
filesystem update adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target
filesystem update adls2 sharedKey

Update an existing Azure Data Lake Storage Gen 2 container migration target using the filesystem update adls2 sharedKey command. You will be prompted to optionally update the secret key.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 sharedKey section.
All parameters are optional except --file-system-id, which specifies the file system you want to update.

Example
filesystem update adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
filesystem update gcs

Update a Google Cloud Storage migration target using the filesystem update gcs command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add gcs section.
All parameters are optional except --file-system-id, which specifies the file system you want to update.

Example
filesystem update gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12 --service-account-email user@mydomain.com
filesystem update hdfs

Update either a source or target Hadoop Distributed File System using the filesystem update hdfs command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add hdfs section.
All parameters are optional except --file-system-id, which specifies the file system you want to update.

Examples
filesystem update hdfs --file-system-id mysource --default-fs hdfs://sourcenameservice --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
filesystem update hdfs --file-system-id mytarget --default-fs hdfs://sourcenameservice --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
filesystem update local

Update a target or source local filesystem using the filesystem update local command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add local section.
All parameters are optional except --file-system-id, which specifies the file system you want to update.

Example
filesystem update local --file-system-id mytarget --fs-root ./tmp
filesystem update s3a

Update an S3 bucket target file system using the filesystem update s3a command. This method also supports IBM COS buckets.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add s3a section.
All parameters are optional except --file-system-id, which specifies the file system you want to update.

Example
filesystem update s3a --file-system-id mytarget --bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key B6ZEXAMPLEACCESSKEYA --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
Exclusion Commands

exclusion add date

Create a date-based exclusion that checks the 'modified date' of any directory or file that LiveData Migrator encounters during a migration to which the exclusion has been applied. If the path or file being examined by LiveData Migrator has a 'modified date' earlier than the specified date, it will be excluded from the migration.
Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

SYNOPSYS exclusion add date [--exclusion-id] string [--description] string [--before-date] string

Mandatory Parameters
--exclusion-id
The identifier for the exclusion policy. This is referenced in the UI as Name.
--description
A user-friendly description for the policy. This is referenced in the UI as Description.
--before-date
An ISO formatted date and time, which can include an offset for a particular time zone. This is referenced in the UI as TBA.

Example
exclusion add date --exclusion-id beforeDate --description "Files earlier than 2020-10-01T10:00:00PDT" --before-date 2020-10-01T10:00:00-07:00
exclusion add file-size

Create an exclusion that can be applied to migrations to constrain the files transferred by a policy based on file size. Once associated with a migration using migration exclusion add, files that match the policy will not be migrated.

SYNOPSYS exclusion add file-size [--exclusion-id] string [--description] string [--value] long [--unit] string

Mandatory Parameters
--exclusion-id
The identifier for the exclusion policy. This is referenced in the UI as Name.
--description
A user-friendly description for the policy. This is referenced in the UI as Description.
--value
The numerical value for the file size, in the unit defined by --unit. This is referenced in the UI as Value.
--unit
A string to define the unit used: either B for bytes, GB for gibibytes, KB for kibibytes, MB for mebibytes, PB for pebibytes, or TB for tebibytes.

Example
exclusion add file-size --exclusion-id 100mbfiles --description "Files greater than 100 MB" --value 100 --unit MB
exclusion add regex

Create an exclusion using a regular expression to prevent certain files and directories being transferred based on matching file or directory names. Once associated with a migration using migration exclusion add, files and directories that match the regular expression will not be migrated.

SYNOPSYS exclusion add regex [--exclusion-id] string [--description] string [--regex] string [[--type] string]

Mandatory Parameters
--exclusion-id
The identifier for the exclusion policy. This is referenced in the UI as Name.
--description
A user-friendly description for the policy. This is referenced in the UI as Description.
--regex
A regular expression in a syntax of either Java PCRE, Automata or GLOB type. This is referenced in the UI as Regex.

Optional Parameters
--type
Choose the regular expression syntax type. There are three options available: JAVA_PCRE (default), AUTOMATA, and GLOB.

Examples
exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --type GLOB --regex test*
exclusion add regex --description "No paths of files that start with test" --exclusion-id exclusion1 --regex ^test\.*
Using backslash characters within the --regex parameter

If you wish to use a \ character as part of your regex value, you must escape this character with an additional backslash.
exclusion add regex --description "No paths that start with a backslash followed by test" --exclusion-id exclusion2 --regex ^\\test\.*
The response displayed if running through the CLI will not hide the additional backslash. However, the internal representation will be as expected within LiveData Migrator (it will read as ^\test.*).
This workaround is not required for API inputs, as it only affects the Spring Shell implementation used for the CLI.
exclusion del

Delete an exclusion policy so that it is no longer available for migrations.

NAME exclusion delete - Delete an exclusion rule.
SYNOPSYS exclusion delete [--exclusion-id] string

Mandatory Parameters
--exclusion-id
The identifier for the exclusion policy to delete. This is referenced in the UI as Name.

Example
exclusion delete --exclusion-id exclusion1
exclusion list

List all exclusion policies defined.

NAME exclusion list - List all exclusion rules.
SYNOPSYS exclusion list

exclusion show

Get details for an individual exclusion policy by identifier.

SYNOPSYS exclusion show [--exclusion-id] string

Mandatory Parameters
--exclusion-id
The identifier for the exclusion policy to show. This is referenced in the UI as Name.

Example
exclusion show --exclusion-id 100mbfiles
Path Mapping Commands

path mapping add

Create a path mapping that allows you to define an alternative target path for a specific target filesystem. Path mappings are automatically applied to new migrations.
When a path mapping is not used, the source path is created on the target filesystem.
note
Path mappings cannot be applied to existing migrations. Delete and recreate a migration if you want a path mapping to apply.

SYNOPSYS path mapping add [[--path-mapping-id] string] [--source-path] string [--target] string [--target-path] string [--description] string

Mandatory Parameters
--source-path
The path on the source filesystem.
--target
The target filesystem id (the value defined for the --file-system-id parameter).
--target-path
The path on the target filesystem.
--description
A description of the path mapping, enclosed in quotes ("text").

Optional Parameters
--path-mapping-id
An identifier for this path mapping. An identifier will be auto-generated if one is not provided.

Example
path mapping add --path-mapping-id hdp-hdi --source-path /apps/hive/warehouse --target mytarget --target-path /hive/warehouse --description "HDP to HDI - Hive warehouse directory"
path mapping del

Delete a path mapping.
note
Deleting a path mapping will not affect any existing migrations that have the path mapping applied. Delete and recreate a migration if you no longer want a previous path mapping to apply.

SYNOPSYS path mapping delete [--path-mapping-id] string

Mandatory Parameters
--path-mapping-id
The identifier of the path mapping.

Example
path mapping delete --path-mapping-id hdp-hdi
path mapping list

List all path mappings.

SYNOPSYS path mapping list [[--target] string]

Optional Parameters
--target
List path mappings for the specified target filesystem id.

Example
path mapping list --target hdp-hdi
path mapping show

Show details of a specified path mapping.

SYNOPSYS path mapping show [--path-mapping-id] string

Optional Parameters
--path-mapping-id
The identifier of the path mapping.

Example
path mapping show --path-mapping-id hdp-hdi
Migration Commands

migration stop

Stop a migration from transferring content to its target, placing it into the STOPPED state. Stopped migrations can be resumed.

SYNOPSYS migration stop [--name] string

Mandatory Parameters
--name
The migration name or identifier to stop.

Example
migration stop --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration resume

Resume a migration that you've stopped from transferring content to its target.

SYNOPSYS migration resume [--name] string

Mandatory Parameters
--name
The migration name or identifier to resume.

Example
migration resume --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration delete

Delete a stopped migration resource.

SYNOPSYS migration delete [--name] string

Mandatory Parameters
--name
The migration name or identifier to delete.

Example
migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration exclusion add

Associate an exclusion resource with a migration so that the exclusion policy applies to items processed for the migration. Exclusions must be associated with a migration before they take effect.

SYNOPSYS migration exclusion add [--name] string [--exclusion-id] string

Mandatory Parameters
--name
The migration name or identifier with which to associate the exclusion.
--exclusion-id
The identifier of the exclusion to associate with the migration. This is referenced in the UI as Name.

Example
migration exclusion add --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1
migration exclusion del

Remove an exclusion from association with a migration so that its policy no longer applies to items processed for the migration.

SYNOPSYS migration exclusion delete [--name] string [--exclusion-id] string

Mandatory Parameters
--name
The migration name or identifier from which to remove the exclusion.
--exclusion-id
The identifier of the exclusion to remove from the migration. This is referenced in the UI as Name.

Example
migration exclusion delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1

migration list

Present the list of all migrations defined.

SYNOPSYS migration list
migration add

Create a new migration to initiate data migration from your source file system.
caution
Do not write to target filesystem paths while a migration is underway. This could interfere with LiveData Migrator functionality and lead to undetermined behavior.
Use different filesystem paths when writing to the target storage directly (and not through LiveData Migrator).

SYNOPSYS migration add [[--name] string] [--path] string [--target] string [[--exclusions] string] [[--action-policy] string] [--auto-start]

Mandatory Parameters
--path
Defines the source file system directory that is the scope of the migration. All content (other than that excluded) will be migrated to the target. This is referenced in the UI as Path for {source-filesystem}.
note
ADLS Gen2 has a filesystem restriction of 60 segments. Make sure your path has fewer than 60 segments when defining the path string parameter.
--target
Specifies the name of the target file system resource to which migration will occur. This is referenced in the UI as Target.

Optional Parameters
--name
Provide a name or identifier for the new migration. An identifier will be auto-generated if one is not provided. This is referenced in the UI as Migration Name.
--exclusions
A comma-separated list of exclusions by name. This is referenced in the UI as Add new exclusion.
--auto-start
Provide this parameter if you want the migration to start immediately. If not provided, the migration will only take effect once run. This is referenced in the UI as Auto-start migration.
--action-policy
This parameter determines what happens if the migration encounters content in the target path with the same name and size. This is referenced in the UI as Skip Or Overwrite Settings. There are two options available:
com.wandisco.livemigrator2.migration.OverwriteActionPolicy (default policy)
Every file is replaced, even if the file size is identical on the target storage. This is referenced in the UI as Overwrite.
com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
If the file size is identical between the source and target, the file is skipped. If it's a different size, the whole file is replaced. This is referenced in the UI as Skip if Size Match.
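For example, a migration that skips files whose size already matches on the target might look like this (values are illustrative):
migration add --path /repl1 --target mytarget --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy --auto-start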
Example
migration add --path /repl1 --target mytarget --name myNewMigration --exclusions 100mbfiles
migration start

Start a migration that was created without the --auto-start parameter.

SYNOPSYS migration start [--name] string

Mandatory Parameters
--name
The migration name or identifier to run.

Example
migration start --name myNewMigration
migration show

Provide a JSON description of a specific migration.

SYNOPSYS migration show [--name] string

Mandatory Parameters
--name
The migration name or identifier to show.

Example
migration show --name myNewMigration
migration pending-region add

Add a pending region to a migration.

SYNOPSYS migration pending-region add [--name] string [--path] string

Mandatory Parameters
--name
The migration name or identifier to add a pending region to.
--path
The path string of the region to add for rescan.

Example
migration pending-region add --name myMigration --path etc/files
migration verification add
note
Migration verification commands are currently in preview. This feature must be enabled before it can be used.
Add a migration verification for a specified migration. This will scan your source and target filesystems (in the migration path) and compare them for any discrepancies.
The verification status will show the number of missing paths and files on the target filesystem and also the number of file size mismatches between the source and target. The verification status can be viewed by using migration verification show (for individual verification jobs) or migration verification list (for all verification jobs).
Once a verification job is complete, a verification report will be created in the /var/log/wandisco/livedata-migrator directory in the format verification-report-{verificationId}-{startTime}.log. This report will contain more details, including any paths that have discrepancies.
See migration verifications for more details.

SYNOPSYS migration verification add [--name] string [--override]

Mandatory Parameters
--name
The migration name or identifier to start (or override) a verification on.

Optional Parameters
--override
Stop the currently running verification and start a new one.

Examples
migration verification add --name myMigration
migration verification add --name myMigration --override
migration verification list
note
Migration verification commands are currently in preview. This feature must be enabled before it can be used.
List all running migration verification jobs and their statuses (use migration verification show when you just want the status of one verification job).

SYNOPSYS migration verification list

migration verification show
note
Migration verification commands are currently in preview. This feature must be enabled before it can be used.
Show the status of a specific migration verification.

SYNOPSYS migration verification show [--name] string

Mandatory Parameters
--name
Show the status of the current verification job running on this migration name or identifier (only one verification job can be running per migration).

Example
See verification status values for further explanation of the output.
WANdisco LiveData Migrator >> migration verification show --name testmig
{
  "migrationId" : "testmig",
  "state" : "COMPLETED",
  "verificationId" : "e1aedfbd-b094-4a1b-a294-69cdd5a6030a",
  "verificationPath" : "/testdir",
  "startTime" : "2021-04-29T13:27:44.278Z",
  "completeTime" : "2021-04-29T13:27:45.392Z",
  "verificationEdge" : "/testmig/testdir01/testfile01",
  "scannerSummary" : {
    "progressSummary" : {
      "filesScanned" : 177,
      "directoriesScanned" : 47,
      "bytesScanned" : 1105391944,
      "filesExcluded" : 51,
      "dirsExcluded" : 0,
      "bytesExcluded" : 0,
      "baseScanCompletionTime" : "2021-04-29T13:27:45.392Z"
    },
    "contentSummary" : {
      "byteCount" : 1105391944,
      "fileCount" : 194,
      "directoryCount" : 81
    }
  },
  "verificationProgress" : {
    "matchedPathCount" : 224,
    "totalFailedPathCount" : 0,
    "targetFilesMissing" : 0,
    "targetDirectoriesMissing" : 0,
    "filesizeMismatches" : 0
  }
}
status

Get a text description of the overall status of migrations. Information is provided on the following:
- Total number of migrations defined.
- Average bandwidth being used over 10s, 60s, and 300s intervals.
- Peak bandwidth observed over a 300s interval.
- Average file transfer rate per second over 10s, 60s, and 300s intervals.
- Peak file transfer rate per second over a 300s interval.
- List of migrations, including one-time migrations, with source path and migration id, and with current progress broken down by migration state: completed, live, stopped, running and ready.

NAME status - Get migration status.
SYNOPSYS status

Optional Parameters
--transfers
Displays overall performance information about data transfers across the last 10 seconds, 1 minute and 30 minute intervals.
--diagnostics
Returns additional information about your LiveData Migrator instance and its migrations, useful for troubleshooting.
--migrations
Displays information about each running migration.
--network
Displays file transfer throughput in Gib/s during the last 10 seconds, 1 minute and 30 minutes.
Examples

WANdisco LiveMigrator >> status
Network                (10s)        (1m)        (30m)
---------
Average Throughput:    10.4 Gib/s   9.7 Gib/s   10.1 Gib/s
Average Files/s:       425          412         403

11 Migrations                                      dd:hh:mm   dd:hh:mm
-------------
Complete: 1            Transferred    Excluded    Duration
  /static1    5a93d5   67.1 GiB       2.3 GiB     00:12:34

Live: 3                Transferred    Excluded    Duration
  /repl1      9088aa   143.2 GiB      17.3 GiB    00:00:34
  /repl_psm1  a4a7e6   423.6 TiB      9.6 GiB     02:05:29
  /repl5      ab140d   118.9 GiB      1.2 GiB     00:00:34

Running: 5             Transferred          Excluded    Duration   Remaining
  /repl123    e3727c   30.3/45.2 GiB 67%    9.8 GiB     00:00:34   00:00:17
  /repl2      88e4e7   26.2/32.4 GiB 81%    0.2 GiB     00:01:27   00:00:12
  /repl3      372056   4.1/12.5 GiB 33%     1.1 GiB     00:00:25   00:01:05
  /repl4      6bc813   10.6/81.7 TiB 8%     12.4 GiB    00:04:21   01:02:43
  /replxyz    dc33cb   2.5/41.1 GiB 6%      6.5 GiB     01:00:12   07:34:23

Ready: 2
  /repl7      070910   543.2 GiB
  /repltest   d05ca0   7.3 GiB
WANdisco LiveMigrator >> status
WANdisco LiveMigrator >> status --transfers
Files (10s) (1m) (30m)
Average Migrated/s:   362   158   4781
  < 1 KB              14    27    3761
  < 1 MB              151   82    0
  < 1 GB              27    1     2
  < 1 PB              0     0     0
  < 1 EB              0     0     0

Peak Migrated/s:      505   161   8712
  < 1 KB              125   48    7761
  < 1 MB              251   95    4
  < 1 GB              29    7     3
  < 1 PB              0     0     0
  < 1 EB              0     0     0

Average Scanned/s:    550   561   467
Average Rescanned/s:  24    45    56
Average Excluded/s:   7     7     6
WANdisco LiveMigrator >> status --diagnostics
Uptime: 0 Days 1 Hours 23 Minutes 24 Seconds
SystemCpuLoad: 0.1433 ProcessCpuLoad: 0.0081
JVM GcCount: 192 GcPauseTime: 36 s (36328 ms)
OS Connections: 1, Tx: 0 B, Rx: 0 B, Retransmit: 0
Transfer Bytes (10/30/300s): 0.00 Gib/s, 0.00 Gib/s, 0.00 Gib/s
Transfer Files (10/30/300s): 0.00/s 0.00/s 0.00/s
Active Transfers/pull.threads: 0/100
Migrations: 0 RUNNING, 4 LIVE, 0 STOPPED
Actions Total: 0, Largest: "testmigration" 0, Peak: "MyMigration" 1
PendingRegions Total: 0 Avg: 0, Largest: "MyMigration" 0
FailedPaths Total: 0, Largest: "MyMigration" 0
File Transfer Retries Total: 0, Largest: "MyMigration" 0
Total Excluded Scan files/dirs/bytes: 26, 0, 8.1 MB
Total Iterated Scan files/dirs/bytes: 20082, 9876, 2.7 GB
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/8
EventsQueued: 0, Total Events Added: 504
Transferred File Size Percentiles: 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Transferred File Transfer Rates Percentiles per Second: 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Active File Size Percentiles: 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Active File Transfer Rates Percentiles per Second: 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
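The --migrations and --network parameters are used in the same way and display only the corresponding subset of the information described above, for example:
WANdisco LiveMigrator >> status --network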
Bandwidth Policy Commands

bandwidth policy del

Delete the current bandwidth policy and revert to the default policy (unlimited bandwidth).

SYNOPSYS bandwidth policy del

bandwidth policy set

Set the bandwidth policy that will determine how much bandwidth LiveData Migrator can use.
If no policy is defined, the default policy is unlimited bandwidth.

SYNOPSYS bandwidth policy set [--value] long [--unit] string
Mandatory Parameters
--value
Define the number of byte units.
--unit
Define the byte unit to be used.
Decimal units: KB, MB, GB, TB, PB, EB, ZB, YB
Binary units: KiB, MiB, GiB, TiB, PiB, EiB, ZiB, YiB

Examples
bandwidth policy set --value 10 --unit MB
bandwidth policy set --value 10 --unit MiB

bandwidth policy show

Display the current bandwidth policy.

SYNOPSYS bandwidth policy show
Hive Agent Commands

hive agent add azure

Add a local or remote hive agent to connect to an Azure SQL database using the hive agent add azure command.
If your LiveData Migrator host can communicate directly with the Azure SQL database, then a local hive agent will be sufficient. Otherwise, consider using a remote hive agent.
remote deployments
For a remote hive agent connection, specify a remote host (Azure VM, HDI cluster node) that will be used to communicate with the local LiveData Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the hive agent can migrate data to and/or from the Azure SQL database.

SYNOPSYS hive agent add azure [[--name] string] [--db-server-name] string [--database-name] string [--database-user] string [--database-password] string [--storage-account] string [--container-name] string [[--root-folder] string] [[--hdi-version] string] [[--insecure] boolean] [[--host] string] [[--port] integer] [--no-ssl] [[--autodeploy] boolean] [[--ssh-user] string] [[--ssh-key] file] [[--ssh-port] int] [--use-sudo] [--ignore-host-checking] [[--file-system-id] string] [[--default-fs-override] string]
Mandatory Parameters
info
The Azure hive agent requires an ADLS Gen2 storage account and container name, but these are only used to generate the correct location for the metadata. The agent will not access the container, and data will not be written to it.
--db-server-name
The Azure SQL Database Server name. Only the name given to the server is required; the .database.windows.net suffix should be omitted. This is referenced in the UI as Azure SQL Server Name.
--database-name
The Azure SQL database name. This is referenced in the UI as Azure SQL Database Name.
--storage-account
The name of the ADLS Gen 2 storage account. This is referenced in the UI as ADLS Gen2 Storage Account Name.
--container-name
The name of the container in the ADLS Gen2 storage account. This is referenced in the UI as ADLS Gen2 Container Name.
--hdi-version
The HDI version. This is relevant if you are intending to integrate your SQL server into an HDInsights cluster. This is referenced in the UI as HDI Version.
--name
The identifier to give to the new Hive agent. This is referenced in the UI as Name.
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example: myadls2storage). This will ensure any path mappings are correctly linked between the filesystem and the agent. This is referenced in the UI as Filesystem.
--default-fs-override
Provide an override for the default filesystem URI instead of a filesystem name (for example: abfss://mycontainer@mystorageaccount.dfs.core.windows.net). This is referenced in the UI as DefaultFs Override.
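For example, a hypothetical agent that uses a default filesystem URI override rather than a named filesystem (the URI and credentials below are illustrative):
hive agent add azure --name azureAgent --db-server-name mysqlserver --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --default-fs-override abfss://mycontainer@myadls2.dfs.core.windows.net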
Optional Parameters
--root-folder
The root directory for the Azure metastore. This is used if you're intending to integrate the Azure SQL Database with an HDI cluster. This is referenced in the UI as Root Folder.
--insecure
Define an insecure connection (TLS disabled) to the Azure SQL database server (default is false). This is referenced in the UI as Use Secure Protocol.
Authentication Parameters
Choose one of the authentication methods listed and include the additional parameters required for the chosen method.
--auth-method
The authentication method to use to connect to the Azure SQL server. This is referenced in the UI as Authentication Method. The following methods can be used:
SQL_PASSWORD - Provide a username and password to access the database. This is referenced in the UI as SQL Password.
AD_MSI - Use a system-assigned or user-assigned managed identity. This is referenced in the UI as Active Directory MSI.

Required Parameters for SQL_PASSWORD
--database-user
The user name to access the database. This is referenced in the UI as Database Username.
--database-password
The user password to access the database. This is referenced in the UI as Database Password.

Required Parameters for AD_MSI
To use this method, the following prerequisites must be met:
- LiveData Migrator or the remote Azure hive agent must be installed on an Azure resource with the managed identity assigned to it. The host must also have Azure Active Directory authentication enabled.
- Your Azure SQL server must be enabled for Azure Active Directory authentication.
- You have created a contained user in the Azure SQL database that is mapped to the Azure Active Directory resource (where LiveData Migrator or the remote Azure hive agent is installed).
The username of the contained user will depend on whether you are using a system-assigned or user-assigned identity.
Azure SQL database commands for a system-assigned managed identity:
CREATE USER "<azure_resource_name>" FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER "<azure_resource_name>";
The <azure_resource_name> is the name of the Azure resource where LiveData Migrator or the remote Azure hive agent is installed (for example: myAzureVM).
Azure SQL database commands for a user-assigned managed identity:
CREATE USER <managed_identity_name> FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER <managed_identity_name>;
The <managed_identity_name> is the name of the user-assigned managed identity (for example: myManagedIdentity).
Once all prerequisites are met, see the system-assigned identity or user-assigned identity parameters.
System-assigned identity
No other parameters are required for a system-assigned managed identity.

User-assigned identity
The --client-id parameter must be specified:
--client-id
The Client ID of your Azure managed identity. This is referenced in the UI as MSI Client ID.
Parameters for remote hive agents only
--host
The host where the remote hive agent will be deployed.
--port
The port for the remote hive agent to use on the remote host. This port is used to communicate with the local LiveData Migrator server.
--no-ssl
TLS encryption and certificate authentication is enabled by default between LiveData Migrator and the remote agent. Use this parameter to disable it.

Parameters for automated deployment
--autodeploy
The remote agent will be automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
--ssh-user
The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-key
The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
--ssh-port
The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
--use-sudo
All commands performed by the SSH user will use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
--ignore-host-checking
Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
#
Steps for manual deploymentIf you do not wish to use the --autodeploy
function, follow these steps to deploy a remote hive agent for Azure SQL manually:
Transfer the remote server installer to your remote host (Azure VM, HDI cluster node):
Example of secure transfer from local to remote hostscp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the
hive agent add azure
command without using--autodeploy
and its related parameters to configure your remote hive agent.See the Example for remote Azure SQL deployment - manual example below for further guidance.
#
Exampleshive agent add azure --name azureAgent --db-server-name mysqlserver --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --root-folder /hive/warehouse --hdi-version 3.6 --file-system-id myadls2storage
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver --database-name mydb1 --auth-method AD_MSI --storage-account myadls2 --container-name mycontainer --root-folder /hive/warehouse --hdi-version 3.6 --file-system-id myadls2storage --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5552
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver --database-name mydb1 --auth-method AD_MSI --client-id b67f67ex-ampl-e2eb-bd6d-client9385id --storage-account myadls2 --container-name mycontainer --root-folder /hive/warehouse --hdi-version 3.6 --file-system-id myadls2storage --host myRemoteHost.example.com --port 5552
hive agent add filesystem
#
Add a filesystem hive agent to connect to your host's local filesystem using the hive agent add filesystem
command.
SYNOPSYS hive agent add filesystem [--filesystem-id] string [--root-folder] string [[--name] string]
#
Mandatory Parameters--filesystem-id
The filesystem identifier to be used.--root-folder
The path to use as the root directory for the filesystem agent.
#
Optional Parameters--name
The identifier to give to the new Hive agent.
#
Examplehive agent add filesystem --filesystem-id myfilesystem --root-folder /var/lib/mysql --name fsAgent
hive agent add glue
#
Add an AWS Glue hive agent to connect to an AWS Glue data catalog using the hive agent add glue
command.
If your LiveData Migrator host can communicate directly with the AWS Glue Data Catalog, then a local hive agent will be sufficient. Otherwise, consider using a remote hive agent.
remote deployments
For a remote hive agent connection, specify a remote host (EC2 instance) that will be used to communicate with the local LiveData Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the hive agent can migrate data to and/or from the AWS Glue Data Catalog.
SYNOPSYS hive agent add glue [[--name] string] [[--access-key] string] [[--secret-key] string] [[--glue-endpoint] string] [[--aws-region] string] [[--glue-catalog-id] string] [[--credentials-provider] string] [[--glue-max-retries] integer] [[--glue-max-connections] integer] [[--glue-max-socket-timeout] integer] [[--glue-connection-timeout] integer] [[--file-system-id] string] [[--default-fs-override] string] [[--host] string] [[--port] integer] [--no-ssl]
#
Glue Parameters--name
The identifier to give to the new Hive agent. This is referenced in the UI as Name.--glue-endpoint
The AWS Glue service endpoint for connections to the data catalog. VPC endpoint types are also supported. This is referenced in the UI as AWS Glue Service Endpoint.--aws-region
The AWS region that your data catalog is located in (default isus-east-1
). If--glue-endpoint
is specified, this parameter will be ignored. This is referenced in the UI as AWS Region.
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example:mys3bucket
). This will ensure any path mappings are correctly linked between the filesystem and the agent. This is referenced in the UI as Filesystem.--default-fs-override
Provide an override for the default filesystem URI instead of a filesystem name (for example:s3a://mybucket/
). This is referenced in the UI as DefaultFs Override.
#
Glue Credential Parameters--credentials-provider
The AWS catalog credentials provider factory class. This is referenced in the UI as AWS Catalog Credentials Provider.- If this parameter is not provided, the default is DefaultAWSCredentialsProviderChain.
- If the
--access-key
and--secret-key
parameters are provided, the credentials provider will automatically default to StaticCredentialsProviderFactory.
--access-key
The AWS access key. This is referenced in the UI as Access Key.--secret-key
The AWS secret key. This is referenced in the UI as Secret Key.
#
Glue Optional Parameters--glue-catalog-id
The AWS Account ID to access the Data Catalog. This is used if the Data Catalog is owned by a different account to the one provided by the credentials provider and cross-account access has been granted.--glue-max-retries
The maximum number of retries the Glue client will perform after an error.--glue-max-connections
The maximum number of parallel connections the Glue client will allocate.--glue-max-socket-timeout
The maximum time the Glue client will allow before an established connection times out.--glue-connection-timeout
The maximum time the Glue client will allow for establishing a connection.
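These tuning options can be combined with the connection parameters described above. The following sketch reuses the endpoint and filesystem names from the examples in this section; the numeric values are illustrative only, so confirm the units expected for the timeout options before relying on them:
hive agent add glue --name glueAgent --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --glue-max-retries 5 --glue-max-connections 10 --glue-connection-timeout 30000 --glue-max-socket-timeout 60000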
#
Parameters for remote hive agents only--host
The host where the remote hive agent will be deployed.--port
The port for the remote hive agent to use on the remote host. This port is used to communicate with the local LiveData Migrator server.--no-ssl
TLS encryption and certificate authentication are enabled by default between LiveData Migrator and the remote agent. Use this parameter to disable them.
#
Steps for remote agent deploymentFollow these steps to deploy a remote hive agent for AWS Glue:
Transfer the remote server installer to your remote host (Amazon EC2 instance):
Example of secure transfer from local to remote hostscp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the
hive agent add glue
command to configure your remote hive agent.See the Example for remote AWS Glue agent example below for further guidance.
#
Exampleshive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --host myRemoteHost.example.com --port 5552
hive agent add hive
#
Add a hive agent to connect to a local or remote Apache Hive metastore using the hive agent add hive
command.
remote deployments
When connecting to a remote Apache Hive metastore, specify a host on the remote cluster that will be used to communicate with the local LiveData Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the hive agent can migrate data to and/or from the remote Apache Hive metastore.
SYNOPSYS hive agent add hive [[--config-path] string] [[--kerberos-principal] string] [[--kerberos-keytab] string] [[--name] string] [[--host] string] [[--port] integer] [--no-ssl] [--autodeploy] [[--ssh-user] string] [[--ssh-key] file] [[--ssh-port] int] [--use-sudo] [--ignore-host-checking] [[--file-system-id] string] [[--default-fs-override] string]
#
Mandatory Parameters--kerberos-principal
Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example:hive/myhost.example.com@REALM.COM
). This is referenced in the UI as Principal.--kerberos-keytab
Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example:/etc/security/keytabs/hive.service.keytab
). This is referenced in the UI as Keytab.--name
The identifier to give to the new Hive agent. This is referenced in the UI as Name.
Additionally, use only one of the following parameters:
--file-system-id
The name of the filesystem that will be associated with this agent (for example:myhdfs
). This will ensure any path mappings are correctly linked between the filesystem and the agent. This is referenced in the UI as Filesystem.--default-fs-override
Provide an override for the default filesystem URI instead of a filesystem name (for example:hdfs://nameservice01
). This is referenced in the UI as DefaultFs Override.
#
Optional Parameters--config-path
The path to the directory containing the Hive configuration files. If not specified, LiveData Migrator will use the default location for the cluster distribution. This is referenced in the UI as Override Default Hadoop Configuration Path.
#
Parameters for remote hive agents only--host
The host where the remote hive agent will be deployed.--port
The port for the remote hive agent to use on the remote host. This port is used to communicate with the local LiveData Migrator server.--no-ssl
TLS encryption and certificate authentication are enabled by default between LiveData Migrator and the remote agent. Use this parameter to disable them.
#
Parameters for automated deployment--autodeploy
The remote agent will be automatically deployed when this flag is used. If using this, the--ssh-key
parameter must also be specified.--ssh-user
The SSH user to use for authentication on the remote host to perform automatic deployment (when using the--autodeploy
parameter).--ssh-key
The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the--autodeploy
parameter).--ssh-port
The SSH port to use for authentication on the remote host to perform automatic deployment (when using the--autodeploy
parameter). Default is port22
.--use-sudo
All commands performed by the SSH user will usesudo
on the remote host when performing automatic deployment (using the--autodeploy
parameter).--ignore-host-checking
Ignore strict host key checking when performing the automatic deployment (using the--autodeploy
parameter).
#
Steps for manual deploymentIf you do not wish to use the --autodeploy
function, follow these steps to deploy a remote hive agent for Apache Hive manually:
Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote hostscp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the
hive agent add hive
command without using--autodeploy
and its related parameters to configure your remote hive agent.See the Example for remote Apache Hive deployment - manual example below for further guidance.
#
Exampleshive agent add hive --name sourceAgent --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@LOCALREALM.COM --file-system-id mysourcehdfs
hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5552 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5552 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
note
If specifying Kerberos and config path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).
hive agent add databricks
#
note
Databricks agents are currently available as a preview feature.
info
The source table format must be Parquet to ensure a successful migration to Databricks Delta Lake.
Add a Databricks hive agent to connect to a Databricks Delta Lake metastore (AWS, Azure or GCP) using the hive agent add databricks
command.
If your LiveData Migrator host can communicate directly with the Databricks Delta Lake, then a local hive agent will be sufficient. Otherwise, consider using a remote hive agent.
remote deployments
For a remote hive agent connection, specify a remote host that will be used to communicate with the local LiveData Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the hive agent can migrate data to and/or from the Databricks Delta Lake.
SYNOPSYS hive agent add databricks [[--name] string] [--jdbc-server-hostname] string [--jdbc-port] int [--jdbc-http-path] string [--access-token] string [[--fs-mount-point] string] [--convert-to-delta] [--delete-after-conversion] [[--file-system-id] string] [[--default-fs-override] string] [[--host] string] [[--port] integer] [--no-ssl]
#
Enable JDBC connections to DatabricksThe following steps are required to enable Java Database Connectivity (JDBC) to Databricks Delta Lake:
Download the Databricks JDBC driver.
Unzip the package and upload the SparkJDBC42.jar file to the LiveData Migrator host machine.
Move the SparkJDBC42.jar file to the LiveData Migrator directory below: /opt/wandisco/hivemigrator/agent/databricks
Change ownership of the JAR file to the HiveMigrator system user and group:
Example for hive:hadoop
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar
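Taken together, the driver placement might look like the following on the LiveData Migrator host, assuming the package has already been downloaded and unzipped (the path to the unzipped driver is illustrative):
mv /path/to/unzipped/SparkJDBC42.jar /opt/wandisco/hivemigrator/agent/databricks/
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar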
#
Databricks Mandatory Parameters--name
The identifier to give to the new Hive agent. This is referenced in the UI as Name.--jdbc-server-hostname
The server hostname for the Databricks cluster (AWS, Azure or GCP). This is referenced in the UI as JDBC Server Hostname.--jdbc-port
The port used for JDBC connections to the Databricks cluster (AWS, Azure or GCP). This is referenced in the UI as JDBC Port.--jdbc-http-path
The HTTP path for the Databricks cluster (AWS, Azure or GCP). This is referenced in the UI as JDBC Http Path.--access-token
The personal access token to be used for the Databricks cluster (AWS, Azure or GCP). This is referenced in the UI as Access Token.
Additionally, use only one of the following parameters:
important
If the --convert-to-delta
option is used, the --default-fs-override
parameter must also be provided with the value set to dbfs:
, or a path inside the Databricks filesystem. For example, dbfs:/mount/externalStorage
.
--file-system-id
The name of the filesystem that will be associated with this agent (for example:myadls2
ormys3bucket
). This will ensure any path mappings are correctly linked between the filesystem and the agent. This is referenced in the UI as Filesystem.--default-fs-override
Provide an override for the default filesystem URI instead of a filesystem name (for example:dbfs:
). This is referenced in the UI as DefaultFs Override.
#
Databricks Optional Parameters--fs-mount-point
Define the ADLS/S3/GCP location within the Databricks filesystem for containing migrations (for example:/mnt/mybucketname
). This is referenced in the UI as FS Mount Point.note
This parameter is required if
--convert-to-delta
is used. The Databricks agent will copy all associated table data and metadata into this location within the Databricks filesystem during conversion.--convert-to-delta
All underlying table data and metadata is migrated to the storage location defined by the--fs-mount-point
parameter. Use this option to automatically copy the associated data and metadata into Delta Lake on Databricks (AWS, Azure or GCP), and convert tables into Delta Lake format. This is referenced in the UI as Convert to delta format.The following parameter can only be used if
--convert-to-delta
has been specified:--delete-after-conversion
Use this option to delete the underlying table data and metadata from the storage location (defined by--fs-mount-point
) once it has been converted into Delta Lake on Databricks. This is referenced in the UI as Delete after conversion.important
Only use this option if you are performing one-time migrations for the underlying table data. The Databricks agent does not support continuous (live) updates of table data when transferring to Delta Lake on Databricks.
If a migration to Databricks runs without the
--convert-to-delta
option, then some migrated data may not be visible from the Databricks side. To avoid this issue, set --default-fs-override to dbfs: followed by the value of --fs-mount-point.
Example: --default-fs-override dbfs:/mnt/mybucketname
#
Parameters for remote hive agents only--host
The host where the remote hive agent will be deployed.--port
The port for the remote hive agent to use on the remote host. This port is used to communicate with the local LiveData Migrator server.--no-ssl
TLS encryption and certificate authentication are enabled by default between LiveData Migrator and the remote agent. Use this parameter to disable them.
#
Steps for remote agent deploymentFollow these steps to deploy a remote hive agent for Databricks Delta Lake:
Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote hostscp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the
hive agent add databricks
command to configure your remote hive agent.See the Example for remote Databricks agent example below for further guidance.
#
Exampleshive agent add databricks --name databricksAgent --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token daexamplefg123456789t6f0b57dfdtoken4 --file-system-id mys3bucket --default-fs-override dbfs: --fs-mount-point /mnt/mybucket --convert-to-delta
hive agent add databricks --name databricksAgent --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token daexamplefg123456789t6f0b57dfdtoken4 --file-system-id mys3bucket --default-fs-override dbfs: --fs-mount-point /mnt/mybucket --convert-to-delta --host myRemoteHost.example.com --port 5552
hive agent check
#
Check the configuration of an existing hive agent using hive agent check
.
SYNOPSYS hive agent check [--name] string
#
Examplehive agent check --name azureAgent
hive agent configure azure
#
Change the configuration of an existing Azure hive agent using hive agent configure azure
.
The parameters that can be changed are the same as the ones listed in the hive agent add azure
section.
All parameters are optional except --name
, which is required to specify the existing hive agent that you wish to configure.
#
Examplehive agent configure azure --name azureAgent --database-password CorrectPassword
hive agent configure filesystem
#
Change the configuration of an existing filesystem hive agent using hive agent configure filesystem
.
The parameters that can be changed are the same as the ones listed in the hive agent add filesystem
section.
All parameters are optional except --name
, which is required to specify the existing hive agent that you wish to configure.
#
Examplehive agent configure filesystem --name fsAgent --root-folder /user/dbuser/databases
hive agent configure glue
#
Change the configuration of an existing AWS Glue hive agent using hive agent configure glue
.
The parameters that can be changed are the same as the ones listed in the hive agent add glue
section.
All parameters are optional except --name
, which is required to specify the existing hive agent that you wish to configure.
#
Examplehive agent configure glue --name glueAgent --aws-region us-east-2
hive agent configure hive
#
Change the configuration of an existing Apache hive agent using hive agent configure hive
.
The parameters that can be changed are the same as the ones listed in the hive agent add hive
section.
All parameters are optional except --name
, which is required to specify the existing hive agent that you wish to configure.
#
Examplehive agent configure hive --name sourceAgent --kerberos-keytab /opt/keytabs/hive.keytab --kerberos-principal hive/myhostname.example.com@REALM.COM
hive agent configure databricks
#
Change the configuration of an existing Databricks agent using hive agent configure databricks
.
The parameters that can be changed are the same as the ones listed in the hive agent add databricks
section.
All parameters are optional except --name
, which is required to specify the existing hive agent that you wish to configure.
#
Examplehive agent configure databricks --name databricksAgent --access-token myexamplefg123456789t6fnew7dfdtoken4
hive agent delete
#
Delete the specified hive agent with hive agent delete
.
SYNOPSYS hive agent delete [--name] string
#
Examplehive agent delete --name azureAgent
hive agent list
#
List configured hive agents with hive agent list
.
SYNOPSYS hive agent list [--detailed]
#
Examplehive agent list --detailed
hive agent show
#
Show the configuration of a hive agent with hive agent show
.
SYNOPSYS hive agent show [--name] string
#
Examplehive agent show --name azureAgent
hive agent types
#
Print a list of supported hive agent types with hive agent types
.
SYNOPSYS hive agent types
#
Examplehive agent types
#
Hive Rule commandshive rule add
,hive rule create
#
Create a hive migration rule that is used to define which databases and tables are migrated.
info
Specify these rules when starting a new migration to control which databases and tables are migrated.
SYNOPSYS hive rule add [--database-pattern] string [--table-pattern] string [[--name] string]
ALSO KNOWN AS hive rule create
#
Mandatory Parameters--database-pattern
Specify a Hive DDL pattern that will match the database names you want to migrate.--table-pattern
Specify a Hive DDL pattern that will match the table names you want to migrate.
tip
You can use a single asterisk (*
) if you want to match all databases and/or all tables within the metastore/database.
#
Optional Parameters--name
The name for the hive rule.
#
Examplehive rule add --name test_databases --database-pattern test* --table-pattern *
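As a further sketch, a rule that matches tables beginning with sales across every database could use patterns such as these (the rule name and patterns are illustrative):
hive rule add --name sales_tables --database-pattern * --table-pattern sales*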
hive rule configure
#
Change the parameters of an existing hive rule.
The parameters that can be changed are the same as the ones listed in the hive rule add
,hive rule create
section.
All parameters are optional except --name
, which is required to specify the existing hive rule that you wish to configure.
#
Examplehive rule configure --name test_databases --database-pattern test_db*
hive rule delete
#
Delete a hive rule.
SYNOPSYS hive rule delete [--name] string
#
Examplehive rule delete --name test_databases
hive rule list
#
List all hive rules.
SYNOPSYS hive rule list
#
Examplehive rule list
hive rule show
#
Show details of a hive rule.
SYNOPSYS hive rule show [--name] string
#
Examplehive rule show --name test_databases
#
Hive Migration Commandshive migration add
#
Create a new hive migration to initiate metadata migration from your source metastore.
info
Create hive rules before initiating a hive migration to specify which databases and tables are migrated.
SYNOPSYS hive migration add [--source] string [--target] string [[--rule-names] list] [[--name] string] [--auto-start] [--once]
#
Mandatory Parameters--source
The name of the hive agent for the source of migration.--target
The name of the hive agent for the target of migration.
#
Optional Parameters--rule-names
The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example:rule1,rule2,rule3
).--name
The name to identify the migration with.--auto-start
Specify this parameter to start the migration immediately after creation.--once
Specify this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
#
Examplehive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs,user_dbs --name hive_migration --auto-start
note
Auto-completion of the --rule-names
parameter will not work correctly if it is added at the end of the hive migration parameters. See the troubleshooting guide for workarounds.
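As an additional sketch, a one-time (non-live) migration can be created by adding the --once parameter to the same command shape (the names reuse those from the example above and are illustrative):
hive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs --name hive_migration_once --auto-start --once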
hive migration delete
#
Delete a hive migration.
note
A hive migration must be in a stopped state before it can be deleted. You can achieve this by using the --force-stop
parameter with this command.
SYNOPSYS hive migration delete [--name] string [--force-stop]
#
Examplehive migration delete --name hive_migration --force-stop
hive migration list
#
List all hive migrations.
SYNOPSYS hive migration list
#
Examplehive migration list
hive migration show
#
Display information about a hive migration.
SYNOPSYS hive migration show [--name] string
#
Examplehive migration show --name hive_migration
note
Specify the --detailed
parameter to include more detailed information in the output.
hive migration start
#
Start a hive migration or a list of hive migrations (comma-separated).
note
Specify the --once
parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
SYNOPSYS hive migration start [--name] list [--once]
#
Examplehive migration start --name hive_migration1,hive_migration2
hive migration start all
#
Start all hive migrations.
note
Specify the --once
parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
SYNOPSYS hive migration start all [--once]
#
Examplehive migration start all --once
hive migration status
#
Show the status of a hive migration or a list of hive migrations (comma-separated).
SYNOPSYS hive migration status [--name] list
#
Examplehive migration status --name hive_migration1,hive_migration2
hive migration status all
#
Show the status of all hive migrations.
SYNOPSYS hive migration status all
#
Examplehive migration status all
hive migration stop
#
Stop a running hive migration or a list of running hive migrations (comma-separated).
SYNOPSYS hive migration stop [--name] list
#
Examplehive migration stop --name hive_migration1,hive_migration2
hive migration stop all
#
Stop all running hive migrations.
SYNOPSYS hive migration stop all
#
Examplehive migration stop all
#
Hive Show Commandshive show conf
#
Show the value of a configuration property from a specific agent.
SYNOPSYS hive show conf [--parameter] string [[--agent-name] string]
#
Hive show conf parameters--agent-name
The name of the agent.--parameter
The configuration parameter/property that you want to show the value of.
#
Examplehive show conf --agent-name sourceAgent --parameter hive.metastore.uris
hive show database
#
Show details about a database from a specified agent.
SYNOPSYS hive show database [--database] string [[--agent-name] string]
#
Hive show database parameters--agent-name
The name of the agent.--database
The database name. If not specified, the default will bedefault
.
#
Examplehive show database --agent-name sourceAgent --database mydb01
hive show databases
#
Show a list of databases from a specified agent.
SYNOPSYS hive show databases [[--like] string] [[--agent-name] string]
#
Hive show databases parameters--agent-name
The name of the agent.--like
The Hive DDL pattern to use to match the database names (for example:testdb*
will match any database name that begins with "testdb").
#
Examplehive show databases --agent-name sourceAgent --like testdb*
hive show indexes
#
Show a list of indexes for a database/table from a specified agent.
SYNOPSYS hive show indexes [--database] string [--table] string [[--agent-name] string]
#
Hive show indexes parameters--agent-name
The name of the agent.--database
The database name.--table
The table name.
#
Examplehive show indexes --agent-name sourceAgent --database mydb01 --table mytbl01
hive show partitions
#
Show a list of partitions for a database/table from a specified agent.
SYNOPSYS hive show partitions [--database] string [--table] string [[--agent-name] string]
#
Hive show partitions parameters--agent-name
The name of the agent.--database
The database name.--table
The table name.
#
Examplehive show partitions --agent-name sourceAgent --database mydb01 --table mytbl01
hive show table
#
Show details about a table from a specified agent.
SYNOPSYS hive show table [--database] string [--table] string [[--agent-name] string]
#
Hive show table parameters--agent-name
The name of the agent.--database
The database name where the table is located.--table
The table name.
#
Examplehive show table --agent-name sourceAgent --database mydb01 --table mytbl01
hive show tables
#
Show a list of tables for a database from a specified agent.
SYNOPSYS hive show tables [[--like] string] [[--database] string] [[--agent-name] string]
#
Hive show tables parameters--agent-name
The name of the agent.--like
The Hive DDL pattern to use to match the table names (for example:testtbl*
will match any table name that begins with "testtbl").
#
Examplehive show tables --agent-name sourceAgent --database mydb01 --like testtbl*
#
Notification Commandsnotification email addresses add
#
Add email addresses to the subscription list for email notifications.
SYNOPSYS notification email addresses add [--addresses] set
#
Mandatory Parameters--addresses
A comma-separated list of email addresses to be added.
#
Examplenotification email addresses add --addresses myemail@company.org,personalemail@gmail.com
notification email addresses remove
#
Remove email addresses from the subscription list for email notifications.
SYNOPSYS notification email addresses remove [--addresses] set
#
Mandatory Parameters--addresses
A comma-separated list of email addresses to be removed. Use auto-completion to quickly select from subscribed emails.
#
Examplenotification email addresses remove --addresses myemail@company.org,personalemail@gmail.com
notification email smtp set
#
Configure the details of an SMTP server for LiveData Migrator to connect to.
SYNOPSYS notification email smtp set [--host] string [--port] integer [--security] security-enum [--email] string [[--login] string] [[--password] string]
#
Mandatory Parameters--host
The host address of the SMTP server.--port
The port to connect to the SMTP server. Many SMTP servers use port 25.--security
The type of security the server uses. Can be eithertls
ornone
.--email
The email address for LiveData Migrator to use with emails sent through the SMTP server. This address will be the sender of all configured email notifications.
#
Optional Parameters--login
The username to authenticate with the SMTP server.--password
The password to authenticate with the SMTP server login. Required if a login is provided.
#
Examplenotification email smtp set --host my.internal.host --port 587 --security TLS --email livedatamigrator@wandisco.com --login myusername --password mypassword
notification email smtp show
#
Display the details of the SMTP server LiveData Migrator is configured to use.
SYNOPSYS notification email smtp show
notification email subscriptions show
#
Show a list of currently subscribed emails and notifications.
SYNOPSYS notification email subscriptions show
notification email types add
#
Add notification types to the email notification subscription list.
See the output from the command notification email types show
for a list of all currently available notification types.
SYNOPSYS notification email types add [--types] set
#
Mandatory Parameters--types
A comma-separated list of notification types to subscribe to.
#
Examplenotification email types add MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED
notification email types remove
#
Remove notification types from the email notification subscription list.
SYNOPSYS notification email types remove [--types] set
#
Mandatory Parameters--types
A comma-separated list of notification types to unsubscribe from.
#
Examplenotification email types remove MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED
migration path status
#
View all actions scheduled on a source filesystem in the specified path.
SYNOPSYS migration path status [--source-path] string [--source] string
#
Mandatory Parameters--source-path
The path on the filesystem to review actions for. Supply a full directory.--source
The filesystem ID of the source system the path is in.
#
Examplemigration path status --source-path /root/mypath/ --source mySource
notification email types show
#
Return a list of all available notification types to subscribe to.
SYNOPSYS notification email types show
notification latest
#
Display the latest notification LiveData Migrator presented and additional details about the notification.
SYNOPSYS notification latest
notification show
#
Show the details of a specific notification. Use tab autocompletion to cycle through the list of notifications received along with their type, timestamp and UUID.
SYNOPSYS notification show [--notification-id] string
#
Mandatory Parameters--notification-id
The UUID of the notification to be shown.
#
Examplenotification show --notification-id urn:uuid:6a1f2047-8445-460d-b27c-ec5c0496b727
#
License Commandslicense show
#
Show the details of the active license.
SYNOPSYS license show [--full]
license upload
#
Upload a new license by submitting its location on the local filesystem.
SYNOPSYS license upload [--path] string
#
Examplelicense upload --path /user/hdfs/license.key
#
Connect Commandsconnect livemigrator
#
Connect to the LiveData Migrator service on your LiveData Migrator host with this command.
note
This is a manual method of connecting to the LiveData Migrator service as the livedata-migrator
command (shown in CLI - Log in) will attempt to establish this connection automatically.
SYNOPSYS connect livemigrator [--host] string [--ssl] [[--port] int] [[--timeout] integer] [[--user] string]
#
Mandatory Parameters--host
The hostname or IP address for the LiveData Migrator host.
#
Optional Parameters--ssl
Specify this parameter if you want to establish a TLS connection to LiveData Migrator. Enable Server TLS on the LiveData Migrator service before using this parameter.--port
The LiveData Migrator port to connect on (default is18080
).--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).--user
The username to use for authenticating to the LiveData Migrator service. Used only when the LiveData Migrator instance has basic authentication enabled. You will still be prompted to provide the user password.
#
Exampleconnect livemigrator --host localhost --port 18080
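When Server TLS and basic authentication are enabled on the LiveData Migrator service, the connection might instead look like this (the hostname and username are illustrative):
connect livemigrator --host ldm.example.com --port 18080 --ssl --user admin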
connect hivemigrator
#
Connect to the HiveMigrator service on your LiveData Migrator host with this command.
note
This is a manual method of connecting to the HiveMigrator service as the livedata-migrator
command (shown in CLI - Log in section) will attempt to establish this connection automatically.
SYNOPSYS connect hivemigrator [--host] string [--ssl] [[--port] int] [[--timeout] long] [[--user] string]
#
Mandatory Parameters--host
The hostname or IP address for the LiveData Migrator host that contains the HiveMigrator service.
#
Optional Parameters--ssl
Specify this parameter if you want to establish a TLS connection to HiveMigrator.--port
The HiveMigrator service port to connect on (default is6780
).--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000ms).--user
The username to use for authenticating to the Hive Migrator service. Used only when Hive Migrator has basic authentication enabled. You will still be prompted to provide the user password.
#
Exampleconnect hivemigrator --host localhost --port 6780
#
Built-in Commandsclear
#
Clear the interactive action prompt screen output with the clear
command. You can also type <Ctrl-L>
to achieve the same, even while typing another command.
SYNOPSYS clear
echo
#
Prints whatever text you write to the console. This can be used to sanity check a command before running it (for example: echo migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles
).
SYNOPSYS echo [--message] string
exit
or quit
#
Entering either exit
or quit
will stop operation of LiveData Migrator when it is run from the command line. All processing will cease, and you will be returned to your system shell.
If your LiveData Migrator command line is connected to a LiveData Migrator system service, this command will end your interactive session with that service, which will remain in operation to continue processing Live migrations.
If this command is encountered during non-interactive processing of input (such as when you pipe input to an instance as part of another shell script) no further commands contained in that input will be processed.
SYNOPSYS exit
ALSO KNOWN AS quit
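As a sketch of the non-interactive case described above, commands can be supplied on standard input, with exit ending the run (this assumes the livedata-migrator command is available on your PATH, as described under External Commands below):
printf 'help\nexit\n' | livedata-migrator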
help
#
Use the help
command to get details of all commands available from the action prompt.
SYNOPSYS help [[-C] string]
For longer commands, you can use backslashes (\
) to indicate continuation, or use quotation marks ("
) to enclose the full command. When using quotation marks, you can press Tab on your keyboard to make LiveData Migrator automatically suggest the remainder of your typed command.
See the examples below for reference.
#
Examplehelp connect
NAME connect - Connect to LiveData Migrator and HiveMigrator.
SYNOPSYS connect [[--host] string] [--ssl] [[--lm2port] int] [[--hvm-port] int] [[--timeout] integer] [[--user] string]
help hive\ migration\ add
NAME hive migration add - Create new migration.
SYNOPSYS hive migration add [--source] string [--target] string [[--name] string] [--auto-start] [--once] [--rule-names] list
help "filesystem add local"
NAME filesystem add local - Add a Local filesystem via HCFS API.
SYNOPSYS filesystem add local [--file-system-id] string [[--fs-root] string] [--source] [--scan-only] [[--properties-files] list] [[--properties] string]
history
#
Enter history
at the action prompt to list all previously entered commands.
Entering history --file <filename>
will save up to the 500 most recently entered commands in text form to the specified file. Use this to record commands that you have executed.
SYNOPSYS history [[--file] file]
#
Optional Parameters--file
The name of the file in which to save the history of commands.
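For instance, to capture the current session's commands to a file (the file name is illustrative):
history --file ldm-command-history.txt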
script
#
Load and execute commands from a text file using the script --file <filename>
command. This file should have one command per line, and each will be executed as though it were entered directly at the action prompt, in that sequence.
Use scripts outside of the LiveData Migrator CLI by referencing the script when running the livedata-migrator
command (see examples).
SYNOPSYS script [--file] file
#
Mandatory Parameters--file
The name of the file containing script commands.
Example contents of myScript (one command per line):
hive agent check --name sourceAgent
hive agent check --name azureAgent
#
Examplesinfo
These examples assume that myScript
is inside the working directory.
script --file myScript
livedata-migrator --script=./myScript
stacktrace
#
Use the stacktrace
command to get full technical information about the source of an error during LiveData Migrator operation.
SYNOPSYS stacktrace
#
Action Prompt FeaturesThe action prompt provides many features to guide you during operation.
Feature | How to use it |
---|---|
Review available commands | Commands that cannot be used without creating other resources first are tagged with * in the output of the help command. |
Command completion | Hit the <tab> key at any time to get assistance or to complete partially-entered commands. |
Cancel input | Type <Ctrl-C> before entering a command to return to an empty action prompt. |
Syntax indication | Invalid commands are highlighted as you type. |
Clear the display | Type <Ctrl-L> at any time. |
Previous commands | Navigate previous commands using the up and down arrows, and use standard emacs shortcuts. |
Interactive or scripted operation | You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See script for more information and examples. |
#
System Service CommandsThe service scripts can be used to control operation of each individual service at any time.
#
LiveData Migratorservice livedata-migrator start|stop|force-reload|restart|status
#
HiveMigratorservice hivemigrator start|stop|force-reload|restart|status
#
LiveData UIservice livedata-ui start|stop|force-reload|restart|status
#
Log CommandsThe following commands will only affect logging of the CLI terminal, and will not affect other components of LiveData Migrator:
log off
log info
log debug
log trace
#
External CommandsUse these commands outside of the LiveData Migrator CLI.
livedata-migrator
#
Launch LiveData Migrator and its connected services.
#
Optional Parameters--version
List the versions of all LiveData Migrator components without starting LiveData Migrator. Includes LiveData Migrator, LiveData UI, LiveData Migrator CLI, HiveMigrator and the HiveMigrator Azure Libraries.
#
Example Output
# livedata-migrator --version
livedata-migrator 1.12.0-1462
livedata-ui 6.6.1-1914
livedata-migrator-cli 1.3.0-209
hivemigrator 1.3.0-514
hivemigrator-azure-hdi 1.3.0-514