Prerequisites
Make sure your environment meets the following prerequisites before you install and use LiveData Migrator for Azure.
If you set a bandwidth limit, the minimum value is 1024 KB/s.
Machine specification
16 CPUs, 48 GB RAM (minimum 4 CPUs, 32 GB RAM)
- For Hadoop sources, install on an existing edge node. For busy clusters, it may be necessary to dedicate an edge node, use a higher spec machine, or deploy data transfer agents to spread the load across the cluster.
200 GB storage (minimum 100 GB)
- SSD-based storage is recommended.
2 Gbps minimum network capacity
- Your network bandwidth must be able to cope with transferring data and ongoing changes from your source filesystem.
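To get a rough feel for whether a link can cope, the arithmetic can be sketched as below. This is an illustrative estimate only, not part of the product: the function names and the 70% usable-bandwidth figure are assumptions, and real throughput also depends on CPU, storage, and other bottlenecks.

```python
def estimate_transfer_hours(data_tb: float, link_gbps: float,
                            utilization: float = 0.7) -> float:
    """Estimate hours needed to move data_tb terabytes over a link.

    utilization is the assumed fraction of the link that the migration
    can actually use (0.7 is a placeholder, not a measured value).
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid input")
    data_bits = data_tb * 1e12 * 8            # decimal terabytes to bits
    usable_bps = link_gbps * 1e9 * utilization
    return data_bits / usable_bps / 3600


def required_gbps(daily_change_tb: float, utilization: float = 0.7) -> float:
    """Sustained link speed needed just to keep up with daily changes."""
    if daily_change_tb < 0 or not 0 < utilization <= 1:
        raise ValueError("invalid input")
    bits_per_day = daily_change_tb * 1e12 * 8
    return bits_per_day / 86400 / 1e9 / utilization
```

For example, `estimate_transfer_hours(100, 2)` estimates the initial transfer time for 100 TB over the 2 Gbps minimum, and `required_gbps(10)` estimates the sustained speed needed to keep up with 10 TB of changes per day.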
Example specification for a deployment of data transfer agents with Data Migrator
For a 10 Gbps network capacity, have 8 CPUs and 16 GB RAM available. You can host more agents on smaller machines if you prefer. If your network capacity is higher than 10 Gbps, you can increase the specification of your agent host to take advantage of the available bandwidth. Beyond a certain point, increasing network capacity won't necessarily increase data throughput, because other bottlenecks may occur.
In general, the RAM and CPU should be enough to saturate your available bandwidth for data migrations. If both source and target are HDFS filesystems that don't use encryption and have cheap checksums, you can lower these specifications. If the machine has more network capacity, for example 25 Gbps, increase the CPUs and RAM to support it.
If you need more guidance for your specific use case, please contact WANdisco Support.
Supported operating systems
LiveData Migrator for Azure supports the following:
- Ubuntu 16 and 18
- CentOS 7
- Red Hat Enterprise Linux 6 and 7
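The list above can be checked against a host with a short script. The following is a minimal sketch that assumes the host provides an `/etc/os-release` file with `ID` and `VERSION_ID` fields. Older releases such as Red Hat Enterprise Linux 6 may not ship this file, so treat a negative result there as inconclusive. The function names are illustrative.

```python
# Supported distributions and major versions, taken from the list above.
SUPPORTED = {
    "ubuntu": {"16", "18"},
    "centos": {"7"},
    "rhel": {"6", "7"},
}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")  # drop optional quotes
    return fields


def is_supported_os(os_release_text: str) -> bool:
    """Return True if ID and the major part of VERSION_ID are in SUPPORTED."""
    fields = parse_os_release(os_release_text)
    distro = fields.get("ID", "").lower()
    major = fields.get("VERSION_ID", "").split(".")[0]
    return major in SUPPORTED.get(distro, set())
```

On a host, pass in the file contents, for example `is_supported_os(open("/etc/os-release").read())`.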
Supported filesystem versions
Filesystems supported as sources:
- Hortonworks Data Platform (HDP) 2.6.3, 2.6.5, 3.1.0, 3.1.5
- Cloudera Distribution Hadoop (CDH) 5.13, 5.14, 5.15, 5.16, 6.2, 6.3
- Cloudera Data Platform (CDP) 7.1.4, 7.1.6, 7.1.7
- Arenadata Hadoop (ADH) 2.14
HDFS versions later than Hadoop 2.6 should also work. If you need additional support, create a support ticket.
Filesystems supported as targets:
- ADLS Gen2
LiveData Migrator for Azure does not support ADLS Gen1 or Azure Blob Storage as target filesystems.
Supported metastores for metadata migrations:
- Azure HDI 4.0 Hive External Metastore
- Azure SQL Database
- Databricks cluster
- Snowflake warehouse
Required Azure CLI versions:
- 2.26.0 or higher
See the Microsoft documentation for instructions on how to install the Azure CLI.
We recommend you use the highest available version of the Azure CLI at all times.
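The minimum CLI version can be checked programmatically. The sketch below assumes the `az` executable is on the `PATH` and reads the `azure-cli` field from the JSON printed by `az version`. The helper names are illustrative. Versions are compared as integer tuples because a plain string comparison would rank "2.9.1" above "2.26.0".

```python
import json
import subprocess

MINIMUM_CLI = (2, 26, 0)  # from the requirement above


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '2.26.0' into (2, 26, 0)."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def cli_version_from_json(az_version_output: str) -> tuple:
    """Extract the CLI version from the JSON output of `az version`."""
    return parse_version(json.loads(az_version_output)["azure-cli"])


def installed_cli_version() -> tuple:
    """Run `az version` and return the installed CLI version as a tuple."""
    result = subprocess.run(
        ["az", "version", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return cli_version_from_json(result.stdout)


def meets_minimum(version: tuple) -> bool:
    """Return True if the version is 2.26.0 or higher."""
    return version >= MINIMUM_CLI
```

On a machine with the Azure CLI installed, `meets_minimum(installed_cli_version())` reports whether the requirement is met.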
Azure HDInsight version requirements:
- HDI 4.0
Databricks JDBC driver requirements:
- Databricks JDBC driver 2.6.25 or higher
HDI Enterprise Security Package (ESP) is not supported.
Source environment
Provide your own Hadoop cluster or use the HDFS Sandbox to test LiveData Migrator for Azure. You can also use a Hortonworks Data Platform (HDP) sandbox.
See Getting Started with the Sandbox to learn how to use the HDFS Sandbox with LiveData Migrator for Azure.
To use LiveData Migrator with your own Hadoop cluster, you need the following:
- Root access to the Hadoop cluster.
- All of the network requirements.
  - For example, ports 5671 and 443 must be open for outbound traffic.
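Outbound connectivity on these ports can be tested from the edge node with a plain TCP connection attempt. The sketch below is a generic check and makes no claim about which endpoints the product contacts: the target host is a placeholder that you would replace with an endpoint from the network requirements. A successful TCP connection shows only that the port is reachable, not that the service behind it is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def check_outbound(host: str, ports=(5671, 443), timeout: float = 5.0) -> dict:
    """Map each port to whether an outbound connection could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, `check_outbound("endpoint.example.com")` returns a dictionary of results for ports 5671 and 443, where `endpoint.example.com` is a hypothetical host name.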
The edge node on your on-premises cluster needs the following:
- Hadoop client libraries installed.
- Hadoop client available within the systemd environment.
- Java 1.8 or higher.
- If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the HDFS superuser must be available on the edge node.
- If you want to migrate Hive metadata from your Hadoop cluster, the edge node must also have a keytab containing a suitable principal for the Hive service.
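The Java requirement in the list above can be verified by parsing the output of `java -version`, which prints to standard error. The following is a minimal sketch with illustrative function names. It handles both the legacy `1.x` numbering (where `1.8` means Java 8) and the newer scheme used from Java 9 onward.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering: "1.8.0_292" is Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def java_is_supported(version_output: str, minimum_major: int = 8) -> bool:
    """Return True if the reported Java version is 1.8 (8) or higher."""
    return java_major_version(version_output) >= minimum_major
```

For example, pass the text captured from the standard error stream of `java -version` to `java_is_supported`.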
Azure resources
Make sure your selected region is supported and consistent across all your resources.
- An Azure subscription.
- A resource group to contain the Azure zone resources.
- Owner or Contributor level access to the resource group.
- An ADLS Gen2 storage account and container with hierarchical namespace enabled.
Resources for LiveData Migrator metadata migrations
To migrate metadata, you need the following:
- An Azure SQL Database in the same Azure resource group as LiveData Migrator.
Supported regions
The following regions are supported:
- Australia East
- Canada Central*
- Canada East*
- East Asia
- East Japan
- East US 2
- East US
- North Europe
- Southeast Asia
- South Central US
- UK South
- West Europe
- West Central US
- West Japan
- West US
- West US 2
- West US 3
Regions marked with an asterisk (*) are a preview feature and under development.
If you can't see or select some of the above regions in the Azure Portal, you're most likely using an old resource provider. Re-register the resource provider to fix this.
Next steps
As an additional prerequisite to use LiveData Migrator for Azure, register the LiveData resource provider.