Configure your HDFS cluster

LiveData Migrator for Azure reads events from an HDFS cluster's NameNode to track changes to data on the filesystem. The NameNode properties on this page affect how quickly LiveData Migrator for Azure can process changes and recover from network or storage device failures during a migration.

To optimize migration performance, use the recommended values on this page in your HDFS cluster's hdfs-default.xml configuration file.

Select a property below for more information.

| Property | Description | Default value | Recommendation | Impact |
| --- | --- | --- | --- | --- |
| Number of extra edits retained | The number of transactions the NameNode retains. LiveData Migrator for Azure reads these transactions to track filesystem activity. | 1,000,000 | Increase this value to 25,000,000 to minimize the risk of losing necessary edit logs during a migration. | This setting doesn't impact cluster performance, but you need a few gigabytes of extra storage for the edits. |
| Maximum inotify events from each RPC | The maximum number of events the NameNode can send to inotify clients (including LiveData Migrator for Azure) in one Remote Procedure Call (RPC) response. | 1,000 | Increase this value to 100,000 to let migrations process more events with every RPC, increasing your data migrations' maximum data transfer rate. | The increased number of RPC events increases NameNode memory consumption by 1 MB. |
| NameNode checkpoint transactions | The number of transactions after which the NameNode will create a checkpoint, splitting the filesystem load by letting it read multiple, smaller checkpoints of events. | 10,000 | Use the default value. | No change. |
| Maximum extra edits segments retained | The maximum number of edit checkpoints, which contain transactions retained by the number of extra edits, that the NameNode will maintain. You typically won't need to change this. | 1,000,000 | Use the default value. | No change. |
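
As a rough illustration, the two recommended changes above could be rendered as Hadoop-style property entries with a short Python script. The key dfs.namenode.num.extra.edits.retained appears on this page. The key dfs.namenode.inotify.max.events.per.rpc is taken from Hadoop's own default configuration rather than from this page, so treat it as an assumption and check it against your Hadoop version. The helper name render_properties is hypothetical.

```python
from xml.sax.saxutils import escape

# Recommended values from this page. The second key name comes from Hadoop's
# default configuration, not from this page, so it is an assumption here.
RECOMMENDED = {
    "dfs.namenode.num.extra.edits.retained": 25_000_000,
    "dfs.namenode.inotify.max.events.per.rpc": 100_000,
}


def render_properties(values):
    """Return the given settings as <property> entries inside <configuration>."""
    lines = ["<configuration>"]
    for name, value in values.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_properties(RECOMMENDED))
```

The script only prints the fragment. Merging it into your cluster's configuration file and restarting or refreshing the NameNode is left to your usual configuration management process.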

Configuration properties explained

Number of extra edits retained

This is the number of additional edits (also called "events") on the system that the NameNode records in edit log files on the disk.

The NameNode creates a log of every file edit and creates checkpoints periodically to prevent these records from stacking up indefinitely. After each checkpoint, the NameNode stashes its edits as a checkpoint file and deletes the original edit logs. However, it keeps the most recent edits in a log file, up to the number of edits specified in this property.

LiveData Migrator for Azure reads edits to keep track of filesystem activity for replication on the target filesystem during a migration. If LiveData Migrator for Azure can't access the expected edit logs past its current point in a migration, the migration will fail and will return the exception org.apache.hadoop.hdfs.inotify.MissingEventsException on each of its org.apache.hadoop.hdfs.DFSInotifyEventInputStream calls.

If LiveData Migrator for Azure loses access to HDFS for a long time during a migration, it may try to resume reading from edits that the NameNode has already deleted, and fail.
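
The failure mode described above can be sketched with a toy model in Python. This is not how the NameNode or LiveData Migrator for Azure is implemented: the class ToyNameNode, the exception MissingEvents, and the idea that old edits disappear the moment they fall outside the window are all simplifying assumptions made for illustration (a real NameNode purges edits around checkpoints).

```python
class MissingEvents(Exception):
    """Hypothetical stand-in for Hadoop's MissingEventsException."""


class ToyNameNode:
    """Toy model that keeps only the most recent `extra_edits_retained` edits."""

    def __init__(self, extra_edits_retained):
        self.extra_edits_retained = extra_edits_retained
        self.last_txid = 0  # ID of the newest recorded transaction

    def write(self, count):
        """Record `count` new edits."""
        self.last_txid += count

    def oldest_available_txid(self):
        # Anything older than the retention window is treated as purged.
        return max(1, self.last_txid - self.extra_edits_retained + 1)

    def read_from(self, txid):
        """Return the transaction IDs a client can read when resuming at `txid`."""
        oldest = self.oldest_available_txid()
        if txid < oldest:
            raise MissingEvents(f"txid {txid} was purged; oldest available is {oldest}")
        return range(txid, self.last_txid + 1)


if __name__ == "__main__":
    namenode = ToyNameNode(extra_edits_retained=1_000)
    namenode.write(500)
    next_txid = len(namenode.read_from(1)) + 1  # client has read edits 1..500

    namenode.write(2_000)  # client is offline while 2,000 more edits arrive
    try:
        namenode.read_from(next_txid)
    except MissingEvents as err:
        print("migration would fail:", err)
```

In this model, a retention window larger than the number of edits written during the outage lets the client resume where it stopped, which is the reasoning behind raising the number of extra edits retained.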

The recommended value is suitable for most large-scale use. If you expect extremely high data edit rates or lengthy outages during migrations, increase this property's value further.
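
To judge whether the recommended value is enough for your cluster, a back-of-the-envelope calculation can help. The sketch below assumes a steady edit rate, which real clusters rarely have, and the edit rates and the safety factor used in it are hypothetical figures chosen only for illustration. The function names are also hypothetical.

```python
import math


def outage_window_hours(edits_retained, edits_per_second):
    """Hours of edits covered by the retention window at a steady edit rate."""
    return edits_retained / edits_per_second / 3600


def edits_needed(outage_hours, edits_per_second, safety_factor=1.5):
    """Edits to retain so an outage of `outage_hours` stays inside the window."""
    return math.ceil(outage_hours * 3600 * edits_per_second * safety_factor)


if __name__ == "__main__":
    rate = 500  # assumed edits per second; measure this on your own cluster
    print(f"default 1,000,000 edits: {outage_window_hours(1_000_000, rate):.2f} h")
    print(f"recommended 25,000,000 edits: {outage_window_hours(25_000_000, rate):.2f} h")
    print(f"edits needed for a 24 h outage: {edits_needed(24, rate):,}")
```

At the assumed 500 edits per second, the default covers roughly half an hour and the recommended value roughly 14 hours, so a cluster that must survive a day-long outage at that rate would need a larger setting.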

Maximum inotify events from each RPC

This is the maximum number of events the NameNode can send to LiveData Migrator for Azure and other inotify clients in a single Remote Procedure Call (RPC) response.

LiveData Migrator for Azure sends RPCs to read events on the filesystem, which it uses to detect data changes that need to be migrated. The NameNode returns at most as many events per response as this property's value. On filesystems with lots of activity, the default maximum of 1,000 means the NameNode sends events more slowly than they happen on the filesystem, which causes migrations to progress much more slowly than filesystem changes.
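
The effect of the per-RPC limit can be estimated with simple arithmetic: the highest event rate a client can read is the limit multiplied by the number of RPCs it completes per second. The sketch below is an assumption-laden estimate, not a measurement. The RPC rate and edit rate are hypothetical, and the function names are made up for this example.

```python
def max_events_per_second(max_events_per_rpc, rpcs_per_second):
    """Upper bound on events a client can read per second."""
    return max_events_per_rpc * rpcs_per_second


def backlog_after(seconds, edit_rate, max_events_per_rpc, rpcs_per_second):
    """Unread events accumulated after `seconds`; never negative."""
    drain_rate = max_events_per_second(max_events_per_rpc, rpcs_per_second)
    return max(0, (edit_rate - drain_rate) * seconds)


if __name__ == "__main__":
    rpcs = 10            # assumed RPC round trips per second
    edit_rate = 50_000   # assumed filesystem events per second
    for limit in (1_000, 100_000):
        ceiling = max_events_per_second(limit, rpcs)
        backlog = backlog_after(3600, edit_rate, limit, rpcs)
        print(f"limit {limit:>7,}: ceiling {ceiling:,}/s, backlog after 1 h {backlog:,}")
```

Under these assumed rates, the default limit leaves the client falling further behind every second, while the recommended limit gives it enough headroom to keep up.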

Maximum extra edits segments retained

This is the number of files containing logged edits that the NameNode will retain on the filesystem at any given time.

Each of these edit log files contains a number of edits equal to dfs.namenode.num.extra.edits.retained. To preserve the edit logs for LiveData Migrator for Azure, increase the number of edits retained instead, and keep this property at its default value.

NameNode checkpoint transactions

This is the number of transactions (events) after which the NameNode will create a checkpoint. Checkpointing at this interval splits the filesystem load by letting the NameNode read multiple, smaller checkpoints of events instead of a single, oversized checkpoint, which could harm performance. In most cases, no modification is necessary.

Learn more

See the Hadoop documentation for more information about each of these NameNode configuration properties.