Metadata Verifications and Repairs
Verification repairs are a preview feature and must be enabled before you can use them.
You can run multiple metadata verifications in parallel. Once verifications have completed, you can also run multiple repairs in parallel. By default, one verification and five repairs can run concurrently. You can change these defaults through the API to match the resources available in your environment. To generate curl commands or find more information, see the API documentation.
Setting Concurrency for Verifications
Memory Management: Before increasing the concurrent verification limit, ensure the HiveMigrator heap size is increased to support the higher memory demand.
Metastore Impact: A higher number of verification threads can increase the load on the Hive Metastore. To maintain high availability for other clients, limit the number of verification threads to a level that balances verification speed with Metastore responsiveness.
To modify the verification concurrency limits, send a POST request to the /verification/config endpoint.
Configuration Parameters
CONCURRENT_VERIFICATIONS
Description: The maximum number of verification jobs allowed to run at once.
Guidance: Supports running verifications for multiple migrations in parallel. Requires increasing the HiveMigrator heap size to support the added load.
VERIFICATION_THREADS
Description: The number of worker threads assigned to each verification job.
Guidance: Controls the processing speed of each individual verification. Higher values increase the load on the Hive Metastore.
Curl command example
curl -X 'POST' \
  'http://localhost:6780/verification/config' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "configs": [
    {
      "configKey": "CONCURRENT_VERIFICATIONS",
      "value": 4
    },
    {
      "configKey": "VERIFICATION_THREADS",
      "value": 8
    }
  ]
}'
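If you tune these limits repeatedly, hand-editing the JSON body is easy to get wrong. The following is a minimal shell sketch, not part of the product, that builds the same request body from two values. The helper name build_verification_config and the rule that values must be positive integers without leading zeros are assumptions made for this illustration; the range the API actually accepts is not stated here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): build the JSON body for
# the /verification/config endpoint from two integer values.
build_verification_config() {
  concurrent="$1"   # value for CONCURRENT_VERIFICATIONS
  threads="$2"      # value for VERIFICATION_THREADS

  # Assumption: only positive integers without leading zeros are valid.
  # Reject anything else before it reaches the API.
  for v in "$concurrent" "$threads"; do
    case "$v" in
      ''|0*|*[!0-9]*)
        echo "values must be positive integers" >&2
        return 1
        ;;
    esac
  done

  printf '{"configs":[{"configKey":"CONCURRENT_VERIFICATIONS","value":%s},{"configKey":"VERIFICATION_THREADS","value":%s}]}\n' \
    "$concurrent" "$threads"
}

# Dry run: print the body that would be sent.
build_verification_config 4 8

# To send it, pass the output to curl's -d option, for example:
#   curl -X POST 'http://localhost:6780/verification/config' \
#     -H 'accept: application/json' \
#     -H 'Content-Type: application/json' \
#     -d "$(build_verification_config 4 8)"
```

Validating locally keeps a mistyped value from changing the live configuration. Remember to increase the HiveMigrator heap size before raising CONCURRENT_VERIFICATIONS.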
Setting Concurrency for Repairs
To modify the repair concurrency limits, send a POST request to the /verification/config endpoint.
Configuration Parameters
CONCURRENT_VERIFICATION_REPAIRS
Description: The maximum number of simultaneous repair tasks allowed across all completed verifications.
Guidance: Verification repair jobs use the same thread pool as hivemigrator.migrationWorkerThreads. Increasing the number of concurrent repairs may reduce overall migration throughput because these processes share the same resources.
Curl command example
curl -X 'POST' \
  'http://localhost:6780/verification/config' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "configs": [
    {
      "configKey": "CONCURRENT_VERIFICATION_REPAIRS",
      "value": 2
    }
  ]
}'
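After a period of tuning, you may want to return to the documented defaults of one concurrent verification and five concurrent repairs. The sketch below is a dry run under stated assumptions: it only prints the curl commands so they can be reviewed before being run. API_BASE and print_config_request are illustrative names, not product settings, and the repair key is spelled as in the curl example above. VERIFICATION_THREADS is left out because its default is not stated here.

```shell
#!/bin/sh
# Dry-run sketch: print the requests that would restore the documented
# defaults (one concurrent verification, five concurrent repairs).
# API_BASE and print_config_request are illustrative names only.
API_BASE="${API_BASE:-http://localhost:6780}"

print_config_request() {
  key="$1"     # configuration key, for example CONCURRENT_VERIFICATIONS
  value="$2"   # integer value to assign to that key
  body=$(printf '{"configs":[{"configKey":"%s","value":%s}]}' "$key" "$value")
  # Printed rather than executed so the command can be checked first.
  printf "curl -X POST '%s/verification/config' -H 'accept: application/json' -H 'Content-Type: application/json' -d '%s'\n" \
    "$API_BASE" "$body"
}

print_config_request CONCURRENT_VERIFICATIONS 1
print_config_request CONCURRENT_VERIFICATION_REPAIRS 5
```

Each printed line is a complete command that can be pasted into a terminal once it has been checked.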