LiveData Migrator for Azure is an easy way of migrating your on-premises Hadoop data to Azure. You can set it up quickly in your environment on an edge node, or try it out with our HDFS Trial Sandbox solution that mimics a source filesystem.
The HDFS Trial Sandbox option is available when you create the LiveData Migrator resource in the Azure Portal.
Before you start, make sure you have all the necessary prerequisites. LiveData Migrator supports the following operating systems:
- Ubuntu 16 and 18
- CentOS 6 and 7
- Red Hat Enterprise Linux 6 and 7
The edge node on your on-premises cluster needs the following:
- Hadoop client libraries installed.
- Hadoop client available within the systemd environment.
- System memory must exceed: (4 x (CPU threads) x 8MB x (pull threads) x 3)
- Example: (4 x (16 CPU threads) x 8MB x (50 pull threads) x 3) = 76.8 GB of RAM
- Java 1.8+
- If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the HDFS superuser must be available on the edge node.
- If you want to migrate Hive metadata from your Hadoop cluster, the edge node must also have a keytab containing a suitable principal for the Hive service.
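As a quick sanity check, the memory guideline above can be evaluated in the shell. The thread counts below are example values matching the example in the list; substitute your own:

```shell
#!/bin/sh
# Example values: adjust to match your edge node and planned migration settings.
CPU_THREADS=16      # number of CPU threads on the edge node
PULL_THREADS=50     # number of pull threads configured for migrations

# Minimum memory in MB: 4 x (CPU threads) x 8MB x (pull threads) x 3
MIN_MEM_MB=$((4 * CPU_THREADS * 8 * PULL_THREADS * 3))
echo "Minimum system memory: ${MIN_MEM_MB} MB"
# prints: Minimum system memory: 76800 MB (= 76.8 GB)
```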
You can create the LiveData Migrator resource in the Azure Portal or with the Azure CLI.
If you haven't already done so, register the WANdisco resource provider.
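If you work from the Azure CLI, registration can be done with `az provider register`. The `Wandisco.Fusion` namespace below is our assumption for the WANdisco provider namespace; verify it for your subscription before running:

```shell
# Register the WANdisco resource provider for your subscription.
# The namespace is an assumption; confirm it for your environment.
az provider register --namespace Wandisco.Fusion

# Check the registration state (should eventually show "Registered").
az provider show --namespace Wandisco.Fusion --query registrationState
```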
Sign in to the Azure Portal and check your intended subscription name is correct.
Go to the Marketplace, or select Add New Resource: search for "LiveData Migrator for Azure".
Select the LiveData Migrator subscription, then select Subscribe (if in the marketplace), or Create (if adding a resource).
Complete the Basic details to create the LiveData Migrator resource:
- Choose to use an existing resource group, or create a new one as part of your setup process.
- Select a supported region in the Instance details.
- Provide a name for the migrator resource.
- Select Yes if you want to use the Hadoop test cluster in the HDFS Trial Sandbox as your source environment for testing, or No if you are using your own Hadoop cluster.
- If Yes is selected, provide the Cluster Admin Username and Password (with confirmation) that you will use to sign in to the test cluster.
Select Review + create once the details have been provided.
If prompted, select the I agree to the terms of service checkbox to consent to give the WANdisco resource provider access to your subscription. This will register the resource provider.
Select Create after reviewing the summary.
Create the migrator resource from the Azure CLI, supplying the following parameters:
- `-g`: The name of the resource group in which your migrator resource will be created.
- `--migrator-name` (required): The name to use for your migrator.
- `-l`: The Azure region in which to create the migrator resource. See the Supported regions list before entering your chosen Azure region.
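Putting those parameters together, a create command might look like the sketch below. The `az livedata migrator create` subcommand spelling and the example resource names are assumptions; check the CLI help for the LiveData Migrator extension in your environment:

```shell
# Hypothetical example: create a migrator resource named "mymigrator"
# in resource group "myResourceGroup" in the East US region.
# The subcommand name is an assumption; verify with your installed extension.
az livedata migrator create \
  -g myResourceGroup \
  --migrator-name mymigrator \
  -l eastus
```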
If you're having problems creating a migrator resource, you can find solutions in the troubleshooting guide.
We recommend you make the following configuration changes to your HDFS cluster environment to prepare for data migrations.
You can adjust several properties on the HDFS NameNode to prevent data migrations from stalling due to an excess of notifications, or from operating too slowly.
Configure these properties in the hdfs-site.xml for the cluster. Where you make the change will vary depending on your distribution:
- In Ambari, use the Custom hdfs-site or Advanced hdfs-site sections.
- In Cloudera Manager, use the Filesystem Checkpoint Transaction Threshold setting, the NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml, and the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml.
The HDFS Client entries are required for the LiveData Migrator host to register and confirm the configuration.
After configuring HDFS properties, you must restart all cluster services that rely on HDFS configuration for their function (including the HDFS service) for the changes to apply.
This value, set with the dfs.namenode.inotify.max.events.per.rpc property, dictates the maximum number of events the NameNode can send to an inotify client in a single RPC response. By default, it is set to 1000, which should consume no more than 1MB of memory.
You may increase this value to allow inotify clients (such as LiveData Migrator) to receive larger batches of event notifications in a single RPC, at the cost of higher memory use.
We recommend setting this value to 100000 for production use. If you do, ensure your NameNode can allocate at least an additional 100MB of memory from its maximum heap capacity to deliver these larger batches of events.
This value, set with the dfs.namenode.checkpoint.txns property, determines the number of namespace transactions that will trigger a checkpoint, updating the filesystem metadata. When this threshold is reached, the checkpoint is triggered regardless of whether the dfs.namenode.checkpoint.period has expired.
The default value is 1000000, but we recommend increasing it to 10000000 for production use.
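Taken together, the recommended production values would appear in hdfs-site.xml as entries like these (the property names are the standard HDFS keys for the settings described above; confirm them against your distribution's documentation):

```xml
<!-- Larger inotify event batches per RPC (recommended production value) -->
<property>
  <name>dfs.namenode.inotify.max.events.per.rpc</name>
  <value>100000</value>
</property>

<!-- Higher checkpoint transaction threshold (recommended production value) -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>10000000</value>
</property>
```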
Once your environment is ready and you've created a LiveData Migrator resource, you're ready to download and install LiveData Migrator.