Frequently Asked Questions

Find answers to the most common questions about LiveData Migrator and LiveData Platform for Azure.

LiveData Migrator for Azure

What are the supported operating systems for LiveData Migrator?

LiveData Migrator supports a number of Linux-based operating systems:

  • Ubuntu 16 and 18
  • CentOS 6 and 7
  • Red Hat Enterprise Linux 6 and 7
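As a quick pre-install check, the sketch below compares a host's os-release entries against the list above. It is a hypothetical helper, not part of LiveData Migrator, and it assumes the standard os-release ID values (ubuntu, centos, rhel). CentOS 6 and Red Hat Enterprise Linux 6 do not ship /etc/os-release, so on those hosts check /etc/redhat-release instead.

```python
# Hypothetical pre-install check; not part of LiveData Migrator.
# Supported (os-release ID, major version) pairs taken from the list above.
SUPPORTED = {
    ("ubuntu", "16"), ("ubuntu", "18"),
    ("centos", "6"), ("centos", "7"),
    ("rhel", "6"), ("rhel", "7"),
}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


def is_supported(os_release_text: str) -> bool:
    """True if ID and the major part of VERSION_ID match a supported pair."""
    fields = parse_os_release(os_release_text)
    distro = fields.get("ID", "").lower()
    major = fields.get("VERSION_ID", "").split(".")[0]
    return (distro, major) in SUPPORTED
```

On a host that has the file, pass it the contents of /etc/os-release; for example, an Ubuntu 18.04 host reports ID=ubuntu and VERSION_ID="18.04", which the check accepts.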

Where should I install LiveData Migrator for production use?

LiveData Migrator must be installed on an edge node in your Hadoop cluster. The edge node should have Java 1.8 and the Hadoop clients (for example, the HDFS, Hive, and Kerberos clients) installed, but no co-located or competing services. We recommend dedicating the node's resources to running LiveData Migrator.
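A rough way to confirm that the client tools are present on a candidate edge node is to look them up on the PATH. The command names below (java, hdfs, hive, kinit) are assumptions about a typical installation; adjust them to the clients your cluster uses. This sketch is not a LiveData Migrator tool.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence

# Assumed client commands for a typical edge node; adjust as needed.
REQUIRED_COMMANDS = ("java", "hdfs", "hive", "kinit")


def find_missing_commands(commands: Sequence[str] = REQUIRED_COMMANDS) -> List[str]:
    """Return the commands that cannot be found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def java_version_banner() -> Optional[str]:
    """Return the `java -version` banner, or None if java is not on the PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr.strip()


def looks_like_java_8(banner: str) -> bool:
    """Java 1.8 reports a version string such as "1.8.0_292"."""
    return '"1.8.' in banner
```

On the node, `find_missing_commands()` returns an empty list when every command is found, and `looks_like_java_8(java_version_banner() or "")` checks the installed Java version.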

Does LiveData Migrator support Kerberos authentication?

If your Hadoop cluster has Kerberos enabled, ensure that the edge node has a valid keytab containing a suitable principal for the HDFS superuser.

Example of HDFS principal
hdfs@MYREALM.COM

If you want to migrate Hive metadata from your Hadoop cluster, the edge node must also have a keytab containing a suitable principal for the Hive service.

Example of Hive service principal
hive/myLiveDataMigratorHost@MYREALM.COM
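Both examples follow the standard Kerberos principal layout, primary[/instance]@REALM, where service principals usually carry a host name as the instance. The helper below is a hypothetical sanity check for principal strings before you configure them; it does not read keytabs. To see which principals a keytab actually contains, use the MIT Kerberos `klist -kt <keytab>` command.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs" or "hive"
    instance: Optional[str]  # usually a host name for service principals
    realm: str               # Kerberos realm, e.g. "MYREALM.COM"


def parse_principal(principal: str) -> Principal:
    """Split a principal of the form primary[/instance]@REALM."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not of the form primary[/instance]@REALM: {principal!r}")
    primary, slash, instance = name.partition("/")
    if not primary or (slash and not instance):
        raise ValueError(f"empty primary or instance in: {principal!r}")
    return Principal(primary, instance if slash else None, realm)


hdfs = parse_principal("hdfs@MYREALM.COM")
hive = parse_principal("hive/myLiveDataMigratorHost@MYREALM.COM")
print(hdfs)           # Principal(primary='hdfs', instance=None, realm='MYREALM.COM')
print(hive.instance)  # myLiveDataMigratorHost
```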

How can I test LiveData Migrator?

If you want to test LiveData Migrator before installing it in a production environment, use our HDFS Trial Sandbox as the source filesystem. Your ADLS Gen2 storage account and container will be the target filesystem.

The Trial Sandbox option can be selected when creating the LiveData Migrator resource through the Azure Portal.

How do I control what is or isn't migrated?

When migrating data, you provide a path on your source filesystem (HDFS). Only the files and subdirectories within this path are migrated to your target filesystem (ADLS Gen2 container).

Example source filesystem path
/my/migration/path

You can also exclude specific files and directories within this path by creating exclusion templates. Exclusion templates prevent files and directories from being migrated based on their size, last modified date, and name.
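To illustrate how a source path and exclusions combine, the sketch below walks a local directory tree and skips anything matched by size, last modified date, or name. The `Exclusions` class and its fields are hypothetical; LiveData Migrator defines exclusion templates through its own interface, and this code does not talk to HDFS.

```python
import fnmatch
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, Optional, Tuple


@dataclass
class Exclusions:
    """Hypothetical exclusion rules using the three criteria described above."""
    max_size_bytes: Optional[int] = None        # skip files larger than this
    modified_before: Optional[datetime] = None  # skip files last modified earlier (tz-aware)
    name_patterns: Tuple[str, ...] = ()         # skip names matching any glob pattern

    def name_excluded(self, name: str) -> bool:
        return any(fnmatch.fnmatch(name, p) for p in self.name_patterns)

    def file_excluded(self, name: str, size: int, mtime: datetime) -> bool:
        if self.max_size_bytes is not None and size > self.max_size_bytes:
            return True
        if self.modified_before is not None and mtime < self.modified_before:
            return True
        return self.name_excluded(name)


def files_to_migrate(source_path: str, rules: Exclusions) -> Iterator[str]:
    """Yield every file under source_path that no exclusion rule matches."""
    for dirpath, dirnames, filenames in os.walk(source_path):
        # Pruning dirnames in place stops os.walk from descending into
        # directories whose names are excluded.
        dirnames[:] = [d for d in dirnames if not rules.name_excluded(d)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
            if not rules.file_excluded(name, st.st_size, mtime):
                yield path
```

For example, `Exclusions(max_size_bytes=10 * 1024**3, name_patterns=("*.tmp", "_temporary"))` would skip files over 10 GiB, temporary files, and any `_temporary` directory. Pass `modified_before` as a timezone-aware datetime, because file times are compared in UTC.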

LiveData Platform for Azure

Replication

What limitations are there with replication rules?

It is not possible to rename or move files between two existing replication rules.

For example, if you have two replication rules such as /repl1/dir1 and /repl2/dir1, moving /repl1/dir1/fileA to /repl2/dir1/ is not supported.

If your workflow requires moving files between two replicated locations, ensure both locations are part of the same replication rule.

For example, if /repl1 is a single replication rule, moving /repl1/dir1/fileA to /repl1/dir2/ is supported.
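The constraint can be expressed as a check: find the replication rule that contains the source file and the one that contains the destination directory, and allow the move only when both are the same rule. This is a hypothetical sketch, not a LiveData Platform API. It assumes the most specific rule owns a path when rules are nested, and the second example assumes /repl1 is one rule, since /repl1/dir2 does not fall under /repl1/dir1.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def owning_rule(path: str, rules: Iterable[str]) -> Optional[str]:
    """Return the most specific replication rule containing `path`, or None."""
    p = PurePosixPath(path)
    matches = [r for r in rules
               if PurePosixPath(r) == p or PurePosixPath(r) in p.parents]
    # Prefer the deepest rule when rule paths are nested.
    return max(matches, key=lambda r: len(PurePosixPath(r).parts), default=None)


def move_is_supported(src_file: str, dest_dir: str, rules: Iterable[str]) -> bool:
    """A move is supported only if source and destination share one rule."""
    rules = list(rules)  # the rules are iterated twice
    src_rule = owning_rule(src_file, rules)
    return src_rule is not None and src_rule == owning_rule(dest_dir, rules)


# The two examples from the text:
print(move_is_supported("/repl1/dir1/fileA", "/repl2/dir1/", ["/repl1/dir1", "/repl2/dir1"]))  # False
print(move_is_supported("/repl1/dir1/fileA", "/repl1/dir2/", ["/repl1", "/repl2"]))  # True
```

Comparing path components with `PurePosixPath` rather than string prefixes avoids treating /repl10 as if it were inside /repl1.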

Can I replicate all data from Hadoop applications?

If an application writes data to a Hadoop Compatible File System (HCFS) such as HDFS, that data can be replicated. However, some applications rely on metadata to organize and use the stored HCFS data.

Apache Hive is an example of this as it uses metadata to organize data sets into databases, tables, partitions, etc. This metadata is stored within its Hive Metastore.

As such, you need to replicate both the HCFS data and the Hive metadata to make use of the data at a target storage location. The Fusion Plugin for Live Hive, combined with HCFS data replication, enables this for Apache Hive.

Are there data or file replication limits?

There is no limit on the amount of data or the number of files you can replicate, as long as your server and storage capacity allow it.

However, you must also consider the limits set by your license.

Zones

How many zones can I have?

When you install LiveData Platform for Azure, you create a zone for each filesystem or storage that you want to replicate to or from. There is no technical upper limit on the number of zones; you can have as many as your use case requires.

As the number of zones increases, there may be a small performance impact, because reaching consensus for transactions across the zones may take fractionally longer.

You can create replication rules that replicate between specific zones, regardless of the number of zones you have in your group.

How do I size/spec the Fusion server(s) for a new zone?

For an Azure zone, LiveData Platform for Azure currently provides three VM size options: Small, Medium, and Large. The right choice depends on the number of files and the average file size.

See the Sizing article for more information on choosing the right size for you.

Migrations

How do I monitor the status of a migration?

You can list the status of migrations on a specified replication rule from the command line. See the Monitoring Migrations section for guidance.

Costs

What costs would LiveData Platform incur on my Azure subscription during the trial?

The first 25 TB of data migration is free. We'll bill you for anything over this allowance.
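As a worked example of the allowance, the billable volume is whatever exceeds the first 25 TB. The per-TB price is not stated here, so the sketch takes it as a placeholder parameter; check current pricing for the actual rate. It also assumes volumes are measured in the same TB units as the allowance.

```python
FREE_ALLOWANCE_TB = 25.0  # free migration allowance stated above


def billable_tb(migrated_tb: float) -> float:
    """Volume migrated beyond the free allowance, in TB (never negative)."""
    return max(0.0, migrated_tb - FREE_ALLOWANCE_TB)


def estimated_charge(migrated_tb: float, price_per_tb: float) -> float:
    """Estimated charge; price_per_tb is a placeholder you must supply."""
    return billable_tb(migrated_tb) * price_per_tb


print(billable_tb(10))  # 0.0: within the free allowance
print(billable_tb(40))  # 15.0: only the amount above 25 TB is billed
```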

How can costs be minimized during/after the trial period? Is there any option to turn down compute when data is not being replicated from on-prem into Azure Data Lake Store?

Cost is calculated from the number of transactions, so periods of low activity incur minimal costs and periods with no activity incur none.

Networking

What are the network requirements for LiveData Migrator or LiveData Platform for Azure?

Before you start installing LiveData Platform for Azure, you need to set up your network. See the Network Requirements page to learn how to set up your virtual network, review the port requirements, and more.

If you are having problems with networking, you can find solutions in the troubleshooting guide.