Version: 2.4.3 (latest)

Configure data transfer agent properties

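This page describes how to configure the following:

  • Data transfer agent application properties
  • Security properties
  • Data Migrator application properties
  • Secure communication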

Data transfer agent - application properties

| Property | Description | Default value |
|---|---|---|
| dataagent.port | The port on which the data transfer agent runs. | 1433 |
| dataagent.callback-period-ms | The interval, in milliseconds, at which the data transfer agent sends callbacks to Data Migrator reporting the progress of the file transfer. | 1000 |
| dataagent.fs-timeout-sec | The time, in seconds, after which an unused filesystem becomes eligible for removal from the data transfer agent's filesystem cache. The agent uses this cache to reuse filesystem representations across file transfers. | 600 |
| dataagent.cache-cleanup-period-sec | The interval, in seconds, at which the cache cleanup job runs. If the job finds a cached filesystem that hasn't been used for dataagent.fs-timeout-sec seconds or longer, it removes the filesystem from the cache. | 1800 |
| dataagent.grpc-pool-max-threads | The maximum number of threads that the gRPC connection can use. This is also the maximum number of files that a given data transfer agent can transfer at the same time. | 150 |
| dataagent.grpc-pool-keep-alive-time-sec | The time, in seconds, after which unused threads can be removed from the gRPC pool. The number of threads in the gRPC pool is managed automatically. We don't recommend setting this value lower than 10. | 60 |
| dataagent.grpc-callback-thread-count | The number of threads that send progress callbacks from the data transfer agent to the server. You can increase this value if you're transferring a very large number of small files. | 5 |
| dataagent.grpc-max-inbound-message-size-kb | The maximum size, in kilobytes, of a message that the data transfer agent can receive. Don't set this value lower than the default. | 4096 |
| dataagent.thread-dump-dir | The directory that contains thread dump files. | ${log.dir:./logs}/threads |
| dataagent.thread-dump-period-sec | The interval, in seconds, at which a new thread dump is created. We don't recommend setting this value lower than 60 seconds (one minute). | 3600 |
| dataagent.thread-dump-number-files | The maximum number of thread dump files in the thread dump directory. When this limit is exceeded, older files are deleted. | 24 |
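The interaction between dataagent.fs-timeout-sec and dataagent.cache-cleanup-period-sec can be modeled with a small Python sketch. The FilesystemCache class and the max_idle_before_eviction helper are hypothetical and only illustrate the behavior described in the table; they aren't the data transfer agent's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FS_TIMEOUT_SEC = 600             # dataagent.fs-timeout-sec (default)
CACHE_CLEANUP_PERIOD_SEC = 1800  # dataagent.cache-cleanup-period-sec (default)


@dataclass
class FilesystemCache:
    """Hypothetical model of the agent's filesystem cache, for illustration only."""
    fs_timeout_sec: float = FS_TIMEOUT_SEC
    # Maps a filesystem key (for example, a URI) to the time it was last used.
    last_used: Dict[str, float] = field(default_factory=dict)

    def use(self, fs_key: str, now: float) -> None:
        # Creating or reusing a cached filesystem records the time of use.
        self.last_used[fs_key] = now

    def cleanup(self, now: float) -> List[str]:
        # Evict every filesystem idle for fs_timeout_sec or longer; return the evicted keys.
        expired = [k for k, t in self.last_used.items() if now - t >= self.fs_timeout_sec]
        for k in expired:
            del self.last_used[k]
        return expired


def max_idle_before_eviction(fs_timeout_sec: float = FS_TIMEOUT_SEC,
                             cleanup_period_sec: float = CACHE_CLEANUP_PERIOD_SEC) -> float:
    # A filesystem that becomes eligible just after a cleanup run waits for the
    # next run, so its idle time before eviction is bounded by timeout + period.
    return fs_timeout_sec + cleanup_period_sec
```

For example, if one filesystem is last used at t = 0 and another at t = 1500, a cleanup run at t = 1800 evicts only the first one, because the second has been idle for just 300 seconds. Because the cleanup job only runs periodically, a filesystem can stay cached for up to about dataagent.fs-timeout-sec + dataagent.cache-cleanup-period-sec seconds after its last use (2400 seconds with the defaults).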

Security properties

Security properties are generated when you install data transfer agents. Don't change the property values.

| Property | Description |
|---|---|
| dataagent.grpc.security.client-secret | The secret key of the installed data transfer agent. |
| dataagent.grpc.security.keystore | The path to the Java keystore file that contains the secrets. |
| dataagent.grpc.security.keystore-password | The password used to access the Java keystore. |

Data Migrator - application properties

The following application properties allow Data Migrator to communicate with the data transfer agents.

| Property | Description | Default value |
|---|---|---|
| hdfs.fs.delegationtoken.renew.period.sec | The interval, in seconds, at which HDFS delegation tokens are renewed. Set this value lower than dfs.namenode.delegation.token.renew-interval from the configuration of the HDFS source or target, after converting that property from milliseconds to seconds. Delegation tokens are used to submit file transfer tasks to the data transfer agents for Kerberos-enabled HDFS filesystems. | 3600 |
| hdfs.fs.delegationtoken.refresh.factor | A value from 0 to 1 that sets the fraction of a token's overall lifetime that must remain for the token to be used for new file transfers. Once less than this fraction remains, the token isn't used for new file transfers and a new token is issued. With the default value of 0.85, a token is replaced after 15% of its overall lifetime has passed. This matters when migrating extremely large files: if the total lifetime of a token is 7 days and each file takes 5 days to transfer, every new file transfer must start with a token that has enough lifetime remaining. We don't recommend setting this value lower than 0.15. | 0.85 |
| hdfs.fs.delegationtoken.cleanup.period.sec | The interval, in seconds, at which the delegation token cache cleanup job runs. If the job finds a cached delegation token that no data transfer agent is using and that has passed the replacement threshold set by hdfs.fs.delegationtoken.refresh.factor, it deletes the token from the cache. | 60 |
| dataagent.healthcheck.timeout.sec | The time, in seconds, that a data transfer agent has to respond before it's marked as unhealthy. | 60 |
| dataagent.healthcheck.period.sec | The interval, in seconds, at which Data Migrator checks the health status of data transfer agents. | 60 |
| dataagent.healthcheck.thread.count | The number of threads used to check the health status of data transfer agents. You can increase this value if you're using many data transfer agents or if there are frequent connection issues between Data Migrator and the data transfer agent servers. | 5 |
| dataagent.grpc.thread.count.max | The maximum number of threads used for the gRPC connection. Generally, set this to the same value as pull.threads. | 100 |
| dataagent.grpc.thread.keepalive.time.sec | The time, in seconds, after which unused threads can be removed from the gRPC pool. The number of threads in the gRPC pool is managed automatically. We don't recommend setting this value lower than 10. | 60 |
| dataagent.loadbalancer.timeout.sec | The maximum time, in seconds, that Data Migrator spends searching for the next available data transfer agent to submit a file transfer to. If Data Migrator can't find an active data transfer agent within this period, it throws an exception. We don't recommend setting this value lower than 5. | 60 |
| dataagent.transfer.attempts.max | The maximum number of attempts to send a file transfer task to the next data transfer agent when a data transfer fails. This property is independent of migration.file.max.retries, and the two settings multiply. For example, if you have 10 data transfer agents, migration.file.max.retries is set to 5, and dataagent.transfer.attempts.max is set to 7, Data Migrator tries up to 7 data transfer agents on each of the 5 file retries, for up to 5 × 7 = 35 attempts in total. The actual number of attempts also depends on the exceptions that occur. | 5 |
| dataagent.stats.collect.period.sec | The interval, in seconds, at which Data Migrator gathers and updates statistics about data transfer agents (for example, the number of bytes and files migrated). | 5 |
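Several relationships in the table above reduce to simple arithmetic. The following Python sketch uses hypothetical helper functions that aren't part of Data Migrator. It checks the renew period against the HDFS setting, applies the refresh factor as described above, computes the maximum number of transfer attempts, and estimates the worst-case time to mark an agent unhealthy, assuming that health checks run at fixed intervals.

```python
# Defaults taken from the Data Migrator properties table above.
RENEW_PERIOD_SEC = 3600        # hdfs.fs.delegationtoken.renew.period.sec
REFRESH_FACTOR = 0.85          # hdfs.fs.delegationtoken.refresh.factor
HEALTHCHECK_PERIOD_SEC = 60    # dataagent.healthcheck.period.sec
HEALTHCHECK_TIMEOUT_SEC = 60   # dataagent.healthcheck.timeout.sec


def renew_period_is_valid(renew_period_sec: float, namenode_renew_interval_ms: int) -> bool:
    """Check that the renew period is below dfs.namenode.delegation.token.renew-interval."""
    # The HDFS property is in milliseconds, so convert it to seconds first.
    return renew_period_sec < namenode_renew_interval_ms / 1000


def usable_for_new_transfer(elapsed_sec: float, lifetime_sec: float,
                            refresh_factor: float = REFRESH_FACTOR) -> bool:
    """Reuse a token only while at least refresh_factor of its lifetime remains."""
    remaining_fraction = (lifetime_sec - elapsed_sec) / lifetime_sec
    # Whether the boundary is inclusive is an assumption; the docs don't say.
    return remaining_fraction >= refresh_factor


def max_transfer_attempts(file_max_retries: int, agent_attempts_max: int) -> int:
    """Upper bound on submissions: each file retry can try up to agent_attempts_max agents."""
    return file_max_retries * agent_attempts_max


def worst_case_unhealthy_detection_sec(period_sec: float = HEALTHCHECK_PERIOD_SEC,
                                       timeout_sec: float = HEALTHCHECK_TIMEOUT_SEC) -> float:
    """Assumes fixed-interval checks: an agent that fails just after a check is
    probed at the next check, which then waits for the full timeout."""
    return period_sec + timeout_sec
```

For example, Hadoop's default dfs.namenode.delegation.token.renew-interval is 86400000 ms (one day), so the default renew period of 3600 seconds passes the check. With a 7-day token lifetime and the default refresh factor, each new transfer starts with at least 0.85 × 7 = 5.95 days of token lifetime remaining, which covers a 5-day file transfer.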

Secure communication

S3 target filesystems

To enable secure communication between data transfer agents and an S3 target filesystem, use delegation tokens. Three types of token give agents access to S3 buckets:

  • Session token
    This token expires and you can’t renew it. It's useful for services that run for a short time.

  • Role token
    This token is tied to a specific AWS account role and is valid only for a short time.

  • (Recommended) Full delegation token
    This token contains the AWS access and secret keys needed to access a bucket. It doesn't expire.

For more information about delegation tokens, see Apache Hadoop - Working with Delegation Tokens.
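In Apache Hadoop's S3A connector, the three token types above correspond to delegation token binding classes, which are selected with the fs.s3a.delegation.token.binding property. The Python sketch below uses a hypothetical helper, s3a_token_properties, to map each token type to that Hadoop property. Whether Data Migrator reads this property for its S3 targets is an assumption; check your Data Migrator filesystem configuration before relying on it.

```python
from typing import Dict

# S3A delegation token binding classes from Apache Hadoop.
# Assumption: Data Migrator passes this Hadoop property through to its S3A client.
S3A_TOKEN_BINDINGS = {
    "session": "org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding",
    "role": "org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding",
    "full": "org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding",
}

# Whether each token type expires, as described in the list above.
TOKEN_EXPIRES = {"session": True, "role": True, "full": False}


def s3a_token_properties(token_type: str) -> Dict[str, str]:
    """Hypothetical helper: return the Hadoop property that selects a token type."""
    if token_type not in S3A_TOKEN_BINDINGS:
        raise ValueError(f"unknown token type: {token_type!r}")
    return {"fs.s3a.delegation.token.binding": S3A_TOKEN_BINDINGS[token_type]}
```

Selecting "full" returns the FullCredentialsTokenBinding class, which matches the recommended full delegation token that doesn't expire.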