Version: 2.0

Configure your HDFS cluster

Data Migrator reads events from an HDFS cluster's NameNode to track changes to data on the filesystem. The NameNode properties on this page affect how quickly Data Migrator can process changes and recover from network or storage device failures during a migration.

To optimize migration performance, follow these steps:

  1. Navigate to your cluster manager.
  2. Set the properties with the appropriate values detailed in the table below.
  3. Restart the recommended services.
  4. Restart Data Migrator.
  5. Navigate to the WANdisco® UI, then remove and add the HDFS source again.

Select a property below for more information.

| Property | Description | Default | Recommendation | Impact |
|---|---|---|---|---|
| dfs.namenode.num.extra.edits.retained | The number of transactions the NameNode retains. Data Migrator reads these transactions to track filesystem activity. | 1,000,000 | Increase this value to 25,000,000 to minimize the risk of losing necessary edit logs during a migration. | This setting doesn't impact cluster performance, but you need a few gigabytes of extra storage for the edits. |
| dfs.namenode.inotify.max.events.per.rpc | The maximum number of events the NameNode can send to inotify clients (including Data Migrator) in one Remote Procedure Call (RPC) response. | 1,000 | Increase this value to 100,000 to let migrations process more events with every RPC, increasing your data migrations' maximum data transfer rate. | The increased number of RPC events increases NameNode memory consumption by 1 MB. |
| dfs.namenode.max.extra.edits.segments.retained | The number of files containing logged edits that the NameNode retains on the filesystem at any given time. | 10,000 | Use the default value. | No change. |
| dfs.namenode.checkpoint.txns | The number of transactions (events) after which the NameNode creates a checkpoint, splitting the filesystem load by letting it read multiple, smaller checkpoints of events. | 1,000,000 | Use the default value. | No change. |
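If your cluster is configured by editing hdfs-site.xml directly rather than through a cluster manager (an assumption; this page only describes the cluster manager route), the two recommended changes from the table above could be expressed as property elements. The following is a minimal sketch that renders them; the helper name `render_properties` is hypothetical, and the two properties left at their defaults are omitted.

```python
import xml.etree.ElementTree as ET

# Recommended values taken from the table above.
# Properties whose recommendation is "Use the default value" are left out.
RECOMMENDED = {
    "dfs.namenode.num.extra.edits.retained": 25_000_000,
    "dfs.namenode.inotify.max.events.per.rpc": 100_000,
}


def render_properties(values):
    """Return the given settings as hdfs-site.xml style <property> elements."""
    root = ET.Element("configuration")
    for name, value in values.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(RECOMMENDED))
```

Whichever way you apply the values, the restart steps listed earlier on this page still apply.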

Configuration properties explained

Number of extra edits retained

This is the number of additional edits (also called "events") on the system that the NameNode records in edit log files on the disk.

The NameNode creates a log of every file edit and creates checkpoints periodically to prevent these records from stacking up indefinitely. After each checkpoint, the NameNode stashes its edits as a checkpoint file and deletes the original edit logs. However, it keeps the most recent edits in a log file, up to the number of edits specified in this property.

Data Migrator reads edits to keep track of filesystem activity so that it can replicate that activity on the target filesystem during a migration. If Data Migrator can't access the edit logs it expects past its current point in a migration, the migration fails and returns the exception org.apache.hadoop.hdfs.inotify.MissingEventsException on each of its org.apache.hadoop.hdfs.DFSInotifyEventInputStream calls.

If Data Migrator loses access to HDFS for a long time during a migration, the edits it needs to resume from may already have been deleted, and the migration fails.

The recommended value is suitable for most large-scale use. If you expect extremely high data edit rates or lengthy outages during migrations, increase this property's value.
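To judge whether the recommended value fits your workload, you can estimate how much filesystem activity the retained edits cover. The sketch below is a rough calculation only: the function names and the edit rate of 1,000 edits per second are hypothetical, and the model ignores any edits the NameNode keeps for other reasons (such as edits not yet checkpointed).

```python
def retention_window_seconds(retained_edits, edits_per_second):
    """Approximate seconds of activity covered by the retained extra edits."""
    if edits_per_second <= 0:
        raise ValueError("edits_per_second must be positive")
    return retained_edits / edits_per_second


def survives_outage(retained_edits, edits_per_second, outage_seconds):
    """True if an outage of this length fits inside the retention window."""
    window = retention_window_seconds(retained_edits, edits_per_second)
    return outage_seconds <= window


if __name__ == "__main__":
    rate = 1_000  # hypothetical sustained edit rate, in edits per second
    for retained in (1_000_000, 25_000_000):  # default and recommended values
        window = retention_window_seconds(retained, rate)
        print(f"{retained:>10,} edits cover about {window / 60:.1f} minutes")
```

Under these assumptions, the default of 1,000,000 edits covers well under an hour of activity, while 25,000,000 covers several hours. Substitute your own measured edit rate and expected outage length.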

Maximum inotify events from each RPC

This is the maximum number of events the NameNode can send to Data Migrator and other inotify clients in a single Remote Procedure Call (RPC) response.

Data Migrator sends RPCs to read events on the filesystem, which it uses to detect data changes that need to be migrated. Each RPC response contains at most as many events as this property's value. On filesystems with lots of activity, the default maximum of 1,000 means the NameNode sends events more slowly than they occur on the filesystem, which causes migrations to fall behind filesystem changes.
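The effect of this limit can be illustrated with simple arithmetic: the ceiling on events delivered per second is the per-RPC maximum multiplied by the number of RPCs completed per second. The sketch below assumes a hypothetical rate of 5 RPCs per second and 20,000 filesystem events per second; neither figure comes from Data Migrator, and the function names are illustrative.

```python
def max_events_per_second(max_events_per_rpc, rpcs_per_second):
    """Upper bound on events the client can receive per second."""
    return max_events_per_rpc * rpcs_per_second


def backlog_after(seconds, event_rate, max_events_per_rpc, rpcs_per_second):
    """Unread events after `seconds` when events arrive at `event_rate` per second."""
    drain_rate = max_events_per_second(max_events_per_rpc, rpcs_per_second)
    return max(0, (event_rate - drain_rate) * seconds)


if __name__ == "__main__":
    rpcs = 5          # hypothetical RPCs completed per second
    events = 20_000   # hypothetical filesystem events per second
    for limit in (1_000, 100_000):  # default and recommended values
        backlog = backlog_after(60, events, limit, rpcs)
        print(f"limit {limit:>7,}: backlog after 60 s = {backlog:,} events")
```

With these example numbers the default limit leaves a growing backlog, while the recommended limit keeps up with the event rate.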

Maximum extra edits segments retained

This is the number of files containing logged edits that the NameNode retains on the filesystem at any given time.

The edit log is rolled periodically, at the interval defined by dfs.ha.log-roll.period. The default interval is 120 seconds, which means each rolled edit log segment contains as many edits as occurred during a period of 120 seconds.

To preserve the edit logs for Data Migrator, increase the number of extra edits retained (dfs.namenode.num.extra.edits.retained) instead, and keep this property at its default value.
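As a sanity check that the default segment limit leaves room for the retained edits, you can estimate how many segments those edits occupy. This sketch assumes segments roll only on the 120-second interval described above and that the edit rate is constant; the function names and example rates are hypothetical.

```python
import math


def segments_needed(retained_edits, edits_per_second, roll_period_seconds=120):
    """Approximate number of rolled segments needed to hold the retained edits."""
    edits_per_segment = edits_per_second * roll_period_seconds
    if edits_per_segment <= 0:
        raise ValueError("edit rate and roll period must be positive")
    return math.ceil(retained_edits / edits_per_segment)


def exceeds_segment_limit(retained_edits, edits_per_second,
                          max_segments=10_000, roll_period_seconds=120):
    """True if the retained edits would span more segments than the limit."""
    needed = segments_needed(retained_edits, edits_per_second, roll_period_seconds)
    return needed > max_segments


if __name__ == "__main__":
    for rate in (1_000, 10):  # hypothetical busy and quiet clusters
        needed = segments_needed(25_000_000, rate)
        print(f"{rate:>5} edits/s: about {needed:,} segments for 25,000,000 edits")
```

On a busy cluster, 25,000,000 edits fit in far fewer than 10,000 segments under these assumptions. Only at very low edit rates would the estimate exceed the limit, and in that case the retained segments already span a long period of time.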

NameNode checkpoint transactions

This is the number of transactions (events) after which the NameNode creates a checkpoint. Checkpointing at this interval splits the filesystem load: the NameNode reads multiple, smaller checkpoints of events instead of a single, oversized checkpoint, which could negatively affect performance. In most cases, no modification is necessary.

Learn more

See the Hadoop documentation for more information about each of these NameNode configuration properties. See this Knowledge base article for additional information on events, actions and queues in Data Migrator.