Prerequisites

Make sure your environment has the following prerequisites to successfully install and use LiveData Migrator for Azure.

note

The minimum speed for the bandwidth limit is 1024 KB/s.
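To get a feel for what this minimum means in practice, the arithmetic below estimates an ideal transfer time at that rate. It is only a sketch: it assumes 1 KB = 1024 bytes, a hypothetical 1 TiB dataset, and a perfectly sustained rate with no protocol overhead or ongoing changes at the source.

```python
def transfer_seconds(data_bytes, rate_bytes_per_s):
    """Ideal time to move data_bytes at a sustained rate, ignoring overhead."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return data_bytes / rate_bytes_per_s


KIB = 1024
TIB = 1024 ** 4

# Minimum bandwidth limit from the note, assuming 1 KB = 1024 bytes.
min_limit = 1024 * KIB  # bytes per second

# Hypothetical dataset size, chosen only for illustration.
dataset = 1 * TIB

days = transfer_seconds(dataset, min_limit) / 86_400
print(f"1 TiB at 1024 KB/s takes about {days:.1f} days")  # about 12.1 days
```

Real migrations will take longer than this lower bound, so treat the figure as a rough planning aid only.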

Machine specification

  • 16 CPUs, 48 GB RAM (minimum 4 CPUs, 32 GB RAM)

    • For Hadoop sources, install on an existing edge node. For busy clusters, it may be necessary to dedicate an edge node, use a higher spec machine, or deploy data transfer agents to spread the load across the cluster.
  • 200 GB of disk space (minimum 100 GB)

    • SSD-based storage is recommended.
  • 2 Gbps minimum network capacity

    • Your network bandwidth must be able to cope with transferring data and ongoing changes from your source filesystem.
  • Example specification for a deployment of [data transfer agents] with Data Migrator

    For a 10 Gbps network capacity, have 8 CPUs and 16 GB RAM available. You may host more agents on smaller machines, if preferred. If your network capacity is higher than 10 Gbps, you can increase the specifications of your agent host to take advantage of the available bandwidth. Beyond a certain point, more network capacity won’t necessarily increase your data throughput, because other bottlenecks may occur.

    In general, provide enough RAM and CPU to saturate your available bandwidth for data migrations. If both source and target are HDFS filesystems that don’t use encryption and have cheap checksums, you can lower these specifications. If the machine has, for example, 25 Gbps network capacity, increase the CPUs and RAM to support it.

    If you need more guidance for your specific use case, please contact WANdisco Support.
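The minimum machine specification above can be checked with a short script before installing. The following is a minimal sketch, not an official preflight tool: the function names are hypothetical, it assumes decimal gigabytes, it compares total (not free) disk size, and `local_spec` relies on Linux `sysconf` values.

```python
import os
import shutil

GB = 10 ** 9  # decimal gigabytes (assumption: the page does not say GB vs GiB)

# Minimum values from the list above.
MIN_CPUS = 4
MIN_RAM_GB = 32
MIN_DISK_GB = 100


def shortfalls(cpus, ram_gb, disk_gb):
    """Return one message per resource that is below the stated minimum."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPUs: {cpus} < {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def local_spec(path="/"):
    """Read CPU count, physical RAM, and total disk size (Linux only)."""
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cpus, ram_bytes / GB, disk_bytes / GB
```

On the candidate host, `print(shortfalls(*local_spec()) or "meets the minimum specification")` reports any gaps. The operating system usually reports slightly less RAM than the nominal figure, so a borderline result deserves a manual check.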

Supported operating systems

LiveData Migrator for Azure supports the following:

  • Ubuntu 16 and 18
  • CentOS 7
  • Red Hat Enterprise Linux 6 and 7
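One way to compare a host against this list is to read `/etc/os-release`. The sketch below assumes the standard `ID` and `VERSION_ID` fields of that file and matches on major version only; the function names are hypothetical. Older releases such as Red Hat Enterprise Linux 6 may not ship this file, so a missing file does not prove the host is unsupported.

```python
# Distribution IDs (as used in /etc/os-release) mapped to supported major versions.
SUPPORTED = {
    "ubuntu": {"16", "18"},
    "centos": {"7"},
    "rhel": {"6", "7"},
}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(text):
    """True if the ID and major VERSION_ID match the supported list."""
    fields = parse_os_release(text)
    os_id = fields.get("ID", "").lower()
    major = fields.get("VERSION_ID", "").split(".")[0]
    return major in SUPPORTED.get(os_id, set())
```

For example, `is_supported(open("/etc/os-release").read())` checks the current machine.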

Supported filesystem versions

Filesystems supported as sources:

  • Hortonworks Data Platform (HDP) 2.6.3, 2.6.5, 3.1.0, 3.1.5
  • Cloudera Distribution Hadoop (CDH) 5.13, 5.14, 5.15, 5.16, 6.2, 6.3
  • Cloudera Data Platform (CDP) 7.1.4, 7.1.6, 7.1.7
  • Arenadata Hadoop (ADH) 2.14

note

All HDFS versions above Hadoop 2.6 should work. However, if you need additional support, create a support ticket.

Filesystems supported as targets:

  • ADLS Gen2

note

LiveData Migrator for Azure does not support ADLS Gen1 or Azure Blob Storage as target filesystems.

Supported metastores for metadata migrations:

  • Azure HDI 4.0 Hive External Metastore
  • Azure SQL Database
  • Databricks cluster
  • Snowflake warehouse

Required Azure CLI versions:

  • 2.26.0 or higher

See the Microsoft documentation for instructions on how to install the Azure CLI.

note

We recommend you use the highest available version of the Azure CLI at all times.
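The Azure CLI version requirement can be verified programmatically. This sketch assumes the `az` executable is on the `PATH` and relies on `az version` printing JSON that contains an `azure-cli` key; the helper names are hypothetical.

```python
import json
import subprocess

MIN_AZ = (2, 26, 0)  # minimum Azure CLI version stated above


def parse_version(text):
    """Turn '2.26.0' into (2, 26, 0), padding missing parts with zeros."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(version, minimum=MIN_AZ):
    """True if the dotted version string is at least the minimum."""
    return parse_version(version) >= minimum


def installed_az_version():
    """Ask the Azure CLI for its own version string."""
    result = subprocess.run(
        ["az", "version", "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["azure-cli"]
```

Calling `meets_minimum(installed_az_version())` on the host then returns `True` or `False`. Versions are compared as integer tuples rather than strings so that, for example, 2.9.1 is correctly treated as older than 2.26.0.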

Azure HDInsight version requirements:

  • HDI 4.0

Databricks JDBC driver requirements:

  • Databricks JDBC driver 2.6.25 or higher

HDI Enterprise Security Package (ESP) is not supported.

Source environment

Provide your own Hadoop cluster or use the HDFS Sandbox to test LiveData Migrator for Azure. You can also use a Hortonworks Data Platform (HDP) sandbox.

info

See Getting Started with the Sandbox to learn how to use the HDFS Sandbox with LiveData Migrator for Azure.

To use LiveData Migrator with your own Hadoop cluster, you need the following:

  • Root access to the Hadoop cluster.
  • A network that meets all of the network requirements.
    • For example, ports 5671 and 443 must be open for outbound traffic.
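A quick way to test outbound access on those ports is to attempt a TCP connection from the edge node. This is a sketch only: this page does not name the endpoints to reach, so the `host` argument is a placeholder you must supply from the network requirements, and the function names are hypothetical. A successful TCP connection also does not prove that a proxy or firewall will allow the actual protocol traffic.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_outbound(host, ports=(5671, 443)):
    """Map each port to whether an outbound TCP connection succeeded."""
    return {port: can_connect(host, port) for port in ports}
```

For example, `check_outbound("endpoint.example.com")` returns a dictionary such as `{5671: False, 443: True}`, where the host name is purely illustrative.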

The edge node on your on-premises cluster needs the following:

  • Hadoop client libraries installed.
  • Hadoop client available within the systemd environment.
  • Java 1.8 or higher.
  • If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the HDFS superuser must be available on the edge node.
    • If you want to migrate Hive metadata from your Hadoop cluster, the edge node must also have a keytab containing a suitable principal for the Hive service.
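Some of the edge node requirements above can be checked with a script. The sketch below covers only the Java version and the presence of a `hadoop` client on the `PATH`; it does not verify the systemd environment or keytabs. It assumes `java -version` writes a line such as `openjdk version "1.8.0_292"` to standard error, and the function names are hypothetical.

```python
import re
import shutil
import subprocess


def java_feature_version(version_output):
    """Extract the Java feature version (8, 11, 17, ...) or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_292" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def edge_node_report():
    """Collect a few basic facts about the local edge node."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
        java = java_feature_version(result.stderr + result.stdout)
    except FileNotFoundError:
        java = None
    return {
        "java_ok": java is not None and java >= 8,
        "hadoop_client_on_path": shutil.which("hadoop") is not None,
    }
```

Run `edge_node_report()` as the user that will run the service, because the `PATH` seen by an interactive shell can differ from the one a systemd service sees.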

Azure resources

Make sure your selected region is supported and consistent across all your resources.

Resources for LiveData Migrator metadata migrations

To migrate metadata, you will require the following:

Supported regions

The following regions are supported:

  • Australia East
  • Canada Central*
  • Canada East*
  • East Asia
  • Japan East
  • East US 2
  • East US
  • North Europe
  • Southeast Asia
  • South Central US
  • UK South
  • West Europe
  • West Central US
  • Japan West
  • West US
  • West US 2
  • West US 3

Regions marked with an asterisk (*) are preview features and are still under development.
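The requirement that the region is both supported and consistent across resources can be expressed as a small check. This is an illustrative sketch: the resource names are hypothetical, and the `SUPPORTED_SAMPLE` set holds only a few entries from the list above, so extend it with the full list before relying on it.

```python
# A few entries from the supported list above (not the complete list).
SUPPORTED_SAMPLE = {"East US", "East US 2", "UK South", "West Europe"}


def region_problems(resources, supported):
    """Return messages for unsupported or inconsistent resource regions.

    resources maps a resource name to its region display name.
    """
    allowed = {" ".join(r.split()).lower() for r in supported}
    normalized = {
        name: " ".join(region.split()).lower()
        for name, region in resources.items()
    }
    problems = []
    for name in sorted(normalized):
        if normalized[name] not in allowed:
            problems.append(
                f"{name}: region '{resources[name]}' is not in the supported list"
            )
    if len(set(normalized.values())) > 1:
        problems.append("resources are spread across more than one region")
    return problems
```

For example, `region_problems({"storage": "UK South", "migrator": "West Europe"}, SUPPORTED_SAMPLE)` reports that the resources are spread across more than one region.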

note

If you can't see or select some of the above regions in the Azure Portal, you're most likely using an old resource provider. Re-register the resource provider to fix this.
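Re-registration can also be done through the Azure CLI. The sketch below wraps the `az provider register` and `az provider show` commands; the provider namespace is not given on this page, so it is passed in as a parameter, and the function names are hypothetical. It assumes `az` is installed and logged in to the right subscription.

```python
import json
import subprocess


def register_command(namespace):
    """Build the Azure CLI command that registers a resource provider."""
    return ["az", "provider", "register", "--namespace", namespace]


def show_command(namespace):
    """Build the Azure CLI command that shows a provider as JSON."""
    return ["az", "provider", "show", "--namespace", namespace, "--output", "json"]


def registration_state(show_output):
    """Read the registrationState field from 'az provider show' JSON output."""
    return json.loads(show_output).get("registrationState", "Unknown")


def reregister(namespace):
    """Register the provider, then return its current registration state."""
    subprocess.run(register_command(namespace), check=True)
    shown = subprocess.run(
        show_command(namespace), check=True, capture_output=True, text=True
    )
    return registration_state(shown.stdout)
```

Registration is asynchronous, so the state returned immediately afterwards may still be `Registering`; check again after a few minutes.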

Next steps

As an additional prerequisite to use LiveData Migrator for Azure, register the LiveData resource provider.