Version: 1.18.1 (latest)

Optimize your HDFS NameNode configuration

Optimal configuration recommendations

LiveData Migrator tracks events and transactions on your Hadoop Distributed File System (HDFS) NameNode to migrate data. Some default NameNode configuration settings can slow down migrations with heavier data loads, so we recommend setting the following properties in the hdfs-site.xml file on your HDFS cluster:

Maximum events per Remote Procedure Call

Recommendation

Set dfs.namenode.inotify.max.events.per.rpc to 100000 in your hdfs-site.xml file.
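
As a rough illustration, the property could appear in hdfs-site.xml as shown here. This is a minimal sketch that assumes you edit the file directly; on clusters administered through a cluster management tool, the equivalent setting is normally made through that tool instead. The entry belongs inside the file's existing configuration element, alongside your other properties.

```xml
<configuration>
  <!-- Other existing properties stay as they are. -->

  <!-- Maximum number of inotify events returned per RPC response (default: 1000). -->
  <property>
    <name>dfs.namenode.inotify.max.events.per.rpc</name>
    <value>100000</value>
  </property>
</configuration>
```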

Explanation

This is the maximum number of events the NameNode can send to LiveData Migrator and other inotify clients in a single Remote Procedure Call (RPC) response. LiveData Migrator sends RPCs to read events on the filesystem, which it uses to detect data changes that need to be migrated. On filesystems with a lot of activity, the default maximum of 1000 may mean the NameNode sends events more slowly than they occur on the filesystem, so migrations fall behind the changes being made.

This value is 1000 by default. Increasing it to 100000 lets migrations process more events with each RPC, increasing the rate of data migration.

Impact

Changing dfs.namenode.inotify.max.events.per.rpc to 100000 consumes only an additional 1 MB of NameNode memory.

Number of extra edits retained

Recommendation

Set dfs.namenode.num.extra.edits.retained to 25000000 in your hdfs-site.xml file.
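
For reference, a sketch of the corresponding hdfs-site.xml entry, again assuming the file is edited by hand rather than through a cluster management tool:

```xml
<configuration>
  <!-- Other existing properties stay as they are. -->

  <!-- Number of extra transactions the NameNode retains (default: 1000000). -->
  <property>
    <name>dfs.namenode.num.extra.edits.retained</name>
    <value>25000000</value>
  </property>
</configuration>
```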

Explanation

This is the maximum number of transactions the NameNode retains. Every time a new transaction is made past this number, the oldest one is deleted.

LiveData Migrator uses these transactions to detect and migrate changes to the filesystem. If the next queued transaction to read is deleted while a migration is underway or paused, the migration won't be able to continue. This usually happens only on a very transaction-heavy filesystem when LiveData Migrator is manually stopped or being updated: when it restarts, it tries to continue migrations from events that have already been deleted.

This value is 1000000 by default. Increasing it to 25000000 eliminates the risk of transaction loss with no impact on performance.

Impact

Changing dfs.namenode.num.extra.edits.retained to 25000000 does not affect cluster performance. The extra retained transactions consume only additional storage space, on the order of several gigabytes.

HDFS properties and what they do

Below is a glossary of the NameNode configuration properties that are most relevant to LiveData Migrator:

| Property | Description | Default value | Recommended value |
| --- | --- | --- | --- |
| dfs.namenode.inotify.max.events.per.rpc | The maximum number of events the NameNode can send to inotify clients (including LiveData Migrator) in one Remote Procedure Call (RPC) response. | 1000 | 100000 |
| dfs.namenode.num.extra.edits.retained | The number of transactions the NameNode retains. LiveData Migrator reads these transactions to track filesystem activity. | 1000000 | 25000000 |
| dfs.namenode.checkpoint.txns | The number of transactions after which the NameNode will create a checkpoint, splitting the filesystem load by letting it read multiple, smaller checkpoints of events instead of a single, oversized checkpoint which could harm performance. In most cases, no modification is necessary. | 1000000 | 1000000 |
| dfs.namenode.max.extra.edits.segments.retained | The maximum number of extra edit checkpoints, which contain transactions retained by the number of extra edits, that the NameNode will maintain. In most cases, no modification is necessary. | 10000 | 10000 |

Note: Restart all cluster services that rely on HDFS configuration (including the HDFS service) to apply your configuration changes.

Next steps

If you made these recommended configuration changes in advance of installing LiveData Migrator, you can now set up your network and then download and install LiveData Migrator.