Version: 3.4 (latest)

Metadata Verifications and Repairs

note

Verification repairs are a preview feature and must be enabled before you can use them.

You can run multiple metadata verifications in parallel. Once verifications have completed, you can also run multiple repairs in parallel. By default, one verification and five repairs can run concurrently. You can change these defaults through the API to match the resources available in your environment and its requirements. To generate curl commands or learn more, see the API documentation.

Setting Concurrency for Verifications

caution

Memory Management: Before increasing the concurrent verification limit, increase the HiveMigrator heap size to support the higher memory demand.
Metastore Impact: A higher number of verification threads can increase the load on the Hive Metastore. To maintain high availability for other clients, limit the number of verification threads to a level that balances verification speed with Metastore responsiveness.

To modify the verification concurrency limits, send a POST request to the /verification/config endpoint.

Configuration Parameters

CONCURRENT_VERIFICATIONS

Description: The maximum number of verification jobs allowed to run at once.
Guidance: Supports running verifications for multiple migrations in parallel. Requires increasing the HiveMigrator heap size to support the added load.

VERIFICATION_THREADS

Description: The number of worker threads assigned to each verification job.
Guidance: Controls the processing speed of each individual verification. Higher values increase the load on the Hive Metastore.

Curl command example

curl -X 'POST' \
  'http://localhost:6780/verification/config' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "configs": [
      {
        "configKey": "CONCURRENT_VERIFICATIONS",
        "value": 4
      },
      {
        "configKey": "VERIFICATION_THREADS",
        "value": 8
      }
    ]
  }'
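
The request above can also be wrapped in a small script so that the values are checked before they are sent. The following is a minimal sketch, not part of the product: the function names and the HIVEMIGRATOR_URL environment variable are hypothetical, and the base URL simply defaults to the one used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body for POST /verification/config.
# $1 = CONCURRENT_VERIFICATIONS, $2 = VERIFICATION_THREADS.
build_verification_payload() {
  concurrent="$1"
  threads="$2"
  for v in "$concurrent" "$threads"; do
    case "$v" in
      # Reject empty values, non-digits, and anything starting with 0.
      ''|*[!0-9]*|0*)
        echo "error: '$v' is not a positive integer" >&2
        return 1
        ;;
    esac
  done
  printf '{"configs":[{"configKey":"CONCURRENT_VERIFICATIONS","value":%s},{"configKey":"VERIFICATION_THREADS","value":%s}]}\n' \
    "$concurrent" "$threads"
}

# Hypothetical wrapper: validate the values, then send the request.
# HIVEMIGRATOR_URL is an assumed variable name, not a product setting.
set_verification_concurrency() {
  payload=$(build_verification_payload "$1" "$2") || return 1
  curl -X POST "${HIVEMIGRATOR_URL:-http://localhost:6780}/verification/config" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

After sourcing the script, running set_verification_concurrency 4 8 would send the same body as the curl example. Validating locally keeps a typo such as an empty or non-numeric value from reaching the API.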

Setting Concurrency for Repairs

To modify the repair concurrency limit, send a POST request to the same /verification/config endpoint.

Configuration Parameters

CONCURRENT_VERIFICATION_REPAIRS

Description: The maximum number of simultaneous repair tasks allowed across all completed verifications.
Guidance: Verification repair jobs utilize the same thread pool as hivemigrator.migrationWorkerThreads. Increasing the number of concurrent repairs may reduce overall migration throughput as these processes share the same resources.

Curl command example

curl -X 'POST' \
  'http://localhost:6780/verification/config' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "configs": [
      {
        "configKey": "CONCURRENT_VERIFICATION_REPAIRS",
        "value": 2
      }
    ]
  }'
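
If you later want to return to the defaults stated at the top of this page (one concurrent verification and five concurrent repairs), a sketch along the following lines may help. It assumes that a single request can carry both keys in its configs array, which this page does not confirm; if that does not hold, send one request per key. The function names and the HIVEMIGRATOR_URL variable are hypothetical, and the repair key is spelled as in the curl example above.

```shell
#!/bin/sh
# Defaults stated on this page: 1 concurrent verification, 5 concurrent repairs.
DEFAULT_CONCURRENT_VERIFICATIONS=1
DEFAULT_CONCURRENT_REPAIRS=5

# Hypothetical helper: print a JSON body that resets both limits.
default_concurrency_payload() {
  printf '{"configs":[{"configKey":"CONCURRENT_VERIFICATIONS","value":%s},{"configKey":"CONCURRENT_VERIFICATION_REPAIRS","value":%s}]}\n' \
    "$DEFAULT_CONCURRENT_VERIFICATIONS" "$DEFAULT_CONCURRENT_REPAIRS"
}

# Hypothetical wrapper. Assumption: both keys are accepted in one request.
# HIVEMIGRATOR_URL is an assumed variable name, not a product setting.
reset_concurrency_defaults() {
  curl -X POST "${HIVEMIGRATOR_URL:-http://localhost:6780}/verification/config" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(default_concurrency_payload)"
}
```

The default for VERIFICATION_THREADS is not given on this page, so the sketch leaves that key untouched.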