Version: 3.2 (latest)

Release notes

Product Version   LiveData UI   LiveData Migrator   Hive Migrator   CLI
3.2.0             14.10.3       3.2.2               2.6.9           2.1.3

Release Highlights

Databricks Ingest Performance

In this release, we've significantly enhanced Data Migrator's ability to migrate and convert Hive-formatted tables to Delta format on Databricks by introducing cross-partition batching for COPY INTO commands.

Previously, when converting partitioned Hive tables into Delta format, Data Migrator would process each partition independently. This meant issuing a separate COPY INTO command for every single partition, which resulted in an excessive number of operations and inefficient use of resources on Databricks.

Now, instead of processing partitions in isolation, Data Migrator batches files across multiple partitions into fewer, larger COPY INTO operations. Grouping partitions this way significantly reduces the number of commands issued and lets Databricks handle ingestion workloads more efficiently. The performance gain is greatest for datasets with a large number of small or unevenly populated partitions. The result is faster overall migration times and better scalability.
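To illustrate the idea of cross-partition batching, here is a minimal Python sketch. It is not Data Migrator's implementation: the function name, the `max_files_per_batch` limit, and the partition layout are all hypothetical, and the real batching criteria are not described in these notes. It only shows how files from several partitions can be grouped so that fewer ingest commands are needed than with one command per partition.

```python
from typing import Dict, Iterator, List


def batch_files_across_partitions(
    partitions: Dict[str, List[str]],
    max_files_per_batch: int,
) -> Iterator[List[str]]:
    """Yield batches of file paths; a batch may span several partitions.

    Hypothetical helper: the size limit is an assumed batching rule.
    """
    if max_files_per_batch < 1:
        raise ValueError("max_files_per_batch must be at least 1")
    batch: List[str] = []
    # Walk partitions in a stable order and fill each batch to the limit.
    for partition in sorted(partitions):
        for path in partitions[partition]:
            batch.append(path)
            if len(batch) == max_files_per_batch:
                yield batch
                batch = []
    # Emit whatever is left over as a final, smaller batch.
    if batch:
        yield batch


# Hypothetical layout: several partitions, each holding only a few files.
partitions = {
    "dt=2024-01-01": ["dt=2024-01-01/a.parquet"],
    "dt=2024-01-02": ["dt=2024-01-02/b.parquet", "dt=2024-01-02/c.parquet"],
    "dt=2024-01-03": ["dt=2024-01-03/d.parquet"],
}
batches = list(batch_files_across_partitions(partitions, max_files_per_batch=3))

per_partition_commands = len(partitions)  # one command per partition
batched_commands = len(batches)           # one command per batch
```

With this sample layout, per-partition processing would need three commands, while batching with a limit of three files per batch needs two.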

More information on Databricks prerequisites can be found here.

Parallel Scan

Scanning files and directories is central to the data migration process. Scanning is most challenging on large filesystems with high latency and significant client activity. To meet these challenges, we have re-engineered our scanning approach and introduced Parallel Scan.

In this release we have made improvements to Parallel Scan, which can be enabled as a beta feature. Parallel Scan can improve scanning performance, reduce migration times, and enhance the management of your system resources through the introduction of a Global Scan Pool for Target Match migrations. Any improvements will depend on your particular environment, the number of migrations, and the structure of your data in terms of files and directories. We will be targeting additional improvements around tuning capabilities in a future release.

The Parallel Scan beta feature can be enabled for testing and evaluation. More information can be found here. Please consult with support for additional guidance.
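As a rough illustration of what a shared scan pool means in general, the following Python sketch scans several directory trees using a single thread pool, so the total number of scanning threads is bounded no matter how many trees are scanned at once. It is an assumption-laden sketch, not the Global Scan Pool itself: the function names and the `max_workers` setting are hypothetical, and it walks a local filesystem with the standard library only.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Dict, Iterable, List, Tuple


def list_directory(path: str) -> Tuple[List[str], List[str]]:
    """List one directory and return (file paths, subdirectory paths)."""
    files: List[str] = []
    subdirs: List[str] = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files.append(entry.path)
    return files, subdirs


def scan_with_shared_pool(
    roots: Iterable[str], max_workers: int = 8
) -> Dict[str, List[str]]:
    """Scan every root with one shared pool; return {root: sorted file paths}."""
    roots = list(roots)
    results: Dict[str, List[str]] = {root: [] for root in roots}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each in-flight directory listing back to the root it belongs to.
        pending = {pool.submit(list_directory, root): root for root in roots}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                root = pending.pop(future)
                files, subdirs = future.result()
                results[root].extend(files)
                # Subdirectories from any root compete for the same workers.
                for subdir in subdirs:
                    pending[pool.submit(list_directory, subdir)] = root
    return {root: sorted(files) for root, files in results.items()}
```

Because each directory listing is a separate task, a deep tree under one root does not hold up the scan of the other roots. As noted above, any real gain depends on the environment and the structure of the data.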

Recheck and Repair options for Verification now available from UI

The verification repair and recheck options are now available in the UI. It is important to understand how these features work. A verification repair adds a pending region to be rescanned, provided the migration is using Target Match, and the verification itself will likely complete before the repair has taken effect. Once the repair has taken effect, you can run a verification recheck to review any inconsistencies that the repair may have resolved. This is therefore a two-stage process: verification repair, then verification recheck. A verification recheck can also be performed at any time, provided the last verification report contains at least one inconsistency.

For more information on verification repair see here.

Note

The verification report generation process does not wait for the action triggered by the repair to complete. In most cases, run the recheck only after the verification report has been generated and the changes made by the repair have taken effect.
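The two-stage ordering can be sketched in Python as follows. This is only an illustration of the sequence. The three callables are hypothetical placeholders for whatever you use to trigger the repair, decide that it has taken effect, and trigger the recheck. They do not correspond to any documented Data Migrator API, and the polling interval and limit are arbitrary assumptions.

```python
import time
from typing import Callable


def repair_then_recheck(
    start_repair: Callable[[], None],
    repair_has_taken_effect: Callable[[], bool],
    start_recheck: Callable[[], None],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> bool:
    """Run a repair, wait until it has taken effect, then run a recheck.

    Returns True if the recheck was started, False if the wait timed out.
    All three callables are hypothetical and supplied by the caller.
    """
    # Stage 1: trigger the repair. The verification report does not wait for it.
    start_repair()
    for _ in range(max_polls):
        if repair_has_taken_effect():
            # Stage 2: recheck only after the repair's changes are in place.
            start_recheck()
            return True
        time.sleep(poll_seconds)
    return False
```

Starting the recheck before the repair has taken effect would simply report the same inconsistencies again, which is why the sketch waits between the two stages.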

Hive Remote Agents - Multi-Filesystem Support

This release introduces multi-filesystem support for Hive remote agents. On S3-based environments, you can now use a single Hive remote agent to serve multiple S3 target filesystems. Once the remote agent is installed and configured on your target, you can choose which filesystem a dataset is associated with for each metadata migration by using the default fs override parameter.

Other Improvements

UI Performance

The UI H2 database has been upgraded in this release, and the upgrade process has specific space requirements - see Storage Requirements for the UI database upgrade in Release 3.2. During the upgrade, free space equal to at least ten times the size of your current UI H2 database must be available on your filesystem for the H2 database upgrade to complete successfully. This upgrade will greatly enhance the performance and maintainability of the UI going forward.
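The ten-times rule is simple arithmetic, and a pre-upgrade check could look like the Python sketch below. It rests on assumptions not stated in these notes: that the database is a single file whose path you supply, and that the free space must be on the same filesystem as that file. The function names are hypothetical.

```python
import os
import shutil

# From the release note: free space of at least ten times the current database size.
REQUIRED_FACTOR = 10


def required_free_bytes(db_size_bytes: int, factor: int = REQUIRED_FACTOR) -> int:
    """Free space needed for the upgrade, given the current database size."""
    return db_size_bytes * factor


def has_enough_space(
    db_size_bytes: int, free_bytes: int, factor: int = REQUIRED_FACTOR
) -> bool:
    """True if the available free space meets the ten-times requirement."""
    return free_bytes >= required_free_bytes(db_size_bytes, factor)


def check_upgrade_space(db_file: str) -> bool:
    """Check a database file against the filesystem it lives on.

    Assumes a single database file and that the upgrade needs its space
    on the same filesystem; both are assumptions of this sketch.
    """
    db_size = os.path.getsize(db_file)
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(db_file))).free
    return has_enough_space(db_size, free)
```

For example, a 2 GiB database would need at least 20 GiB free before the upgrade starts.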

UI Improvements

We have consolidated the Metrics and Diagnostics pages for a migration into a single Metrics page for simplicity and ease of use. Within this page we have added a full history for the migration file size distribution and also introduced a separate page called Active Transfers which displays the active file transfers per migration.

Several areas of the user experience have also been improved. You can now create metadata rules from the metadata migrations page. Resetting or deleting a migration now warns that the associated verification reports will be deleted, and bulk reset gives the same warning that any associated verification reports will be removed.

CLI Enhancements

In this release, we have taken the opportunity to enhance the CLI in several key areas. The following commands transform the information available to users and provide an insight into the progress of your migrations:

  • migration stats: now in a much improved format, including detailed information on the migration and filesystem scanner
  • migration list: interactive table with options to sort, filter, and see migration stats for a selected migration
  • file transfers: new command introduced to improve the feedback to users on the transfer of files during a migration

In addition to these improvements to the CLI commands, we have fixed tab completion for the migration verification show command and enhanced the information it displays, so that it is much easier to distinguish between verification reports.

More information on the CLI commands used to manage migrations can be found here. Further information on all of the CLI commands is available here.

Resolved Issues

Data Migrator Core

LM2-8419 Introduce parallelism into Scanning (Global Scan Pool)

LM2-8593 Can't update LDAP config because license is invalid

LM2-8607 Removal of Legacy verification reports

LM2-8640 REST API - New metrics and logging for Parallel Scan

LM2-8665 Target match stats counted in non-target match migration

LM2-8674 Unable to create GCS FS

LM2-8680 ORC files failing to be enriched for Iceberg Migrations

LM2-8682 Attempting to create s3a source when one already exists can fail

LM2-8695 Parallel Scan file status cache does not get updated

LM2-8720 Handle case when root of Source Filesystem is unavailable

LM2-8733 Invalid caching of rename pending region child paths

LM2-8734 Manage multiple CurrentRegions for parallel scan

LM2-8738 AWS SDK Deprecation WARN message in logs after adding S3 filesystem

LM2-8745 Recurring parallel scan migration stats

LM2-8747 "Internal Server Error", while adding AWS endpoint

LM2-8754 Repair failing due to non Target Match for Parallel scan labelled Migration

LM2-8756 DanglingPathAction not included in the TRANSFERRING set

LM2-8760 Hadoop client library update for aws

LM2-8763 Cancel verification api throws exception

LM2-8767 Two Way scan should not abandon window if source path does not exist

LM2-8776 Exception adding S3 compatible storage due to missing region

LM2-8791 DanglingPathActions should attempt to clean up parent directory structure

LM2-8810 Parallel Scan - Global ThreadPool Allocation

LM2-8820 Log Errors : waiting for leases but the lease queue is empty

LM2-8823 Scanner should count invalid regions towards the Iteration Limit

LM2-8824 Optimize scanning specific method so that it scales

LM2-8826 Parallel scan deletion of currently scanning region parent

LM2-8828 LDM fails to start if S3 target is inaccessible

LM2-8829 Removing unused source FS interrupts connection to unrelated source FS

LM2-8830 CRC Files generated on localFS target

LM2-8832 Validation only works on localFS with opening /

LM2-8833 Files missing from Target incorrectly reported

LM2-8870 Live Migrator "Requested array size exceeds VM limit" error

Hive Migrator

HVM-5244 Logging from the calls for the token-exchange call can become heavy

HVM-5249 Insert timestamp not working to Unity Databricks

HVM-5257 Renaming a partitioned table and inserting data duplicates table contents

HVM-5281 Slow API calls when dealing with remote agent config

HVM-5292 OpenAPI definitions are broken

HVM-5317 Ability to add key/value pairs as additional properties for REST catalog

HVM-5318 Iceberg double slash in the manifestList path of the metadata JSON

HVM-5365 Remote Hive Agents require further SDK implementation/Investigation

HVM-5368 Databricks alter table can force unnecessary drop/create

HVM-5383 Checksum comparison between HDP 2.6.5 and HDP 3.1.5 client

HVM-5384 Enable Kerberos Support in Iceberg Hive Agent

HVM-5385 Reset migration removes targetLocation value

HVM-5388 Databricks : Cluster usage vs threadcount limit

HVM-5392 Path mappings evaluated for agent's filesystem

HVM-5393 Efficient Data Event Handling

HVM-5394 Multithreaded File Scanner on migration restart

HVM-5395 Batch Copy-into stats

HVM-5397 HiveAgent alter table fallback to drop/create does not batch partitions

HVM-5405 Remote agent dropCreateTable does not re-add partitions

HVM-5406 Remote agent does internal restart to update defaultFsOverride value

UI

ONEUI-7349 Update 'Migrator' to 'Instance'

ONEUI-8073 UI can cut off the right hand side of the screen

ONEUI-8248 Upgrade PMD to 7.x.x

ONEUI-8282 Migration file size distribution - Full History

ONEUI-8287 Combine Metrics and diagnostics page into one

ONEUI-8290 Active file transfers per migration

ONEUI-8312 Display HVM Remote Server Versions on the UI

ONEUI-8317 LDMFailedPathsSyncDataProvider dumps migrations to logs

ONEUI-8353 OAuth Server URI should be specifically describing the Token Endpoint

ONEUI-8354 Remove info level logging in the LMV2MigrationTableSummaryResource

ONEUI-8360 Discrepancies with the license usage

ONEUI-8364 Checksum action policy cannot be selected when S3 is a source or target

ONEUI-8369 Enable recheck for verification report

ONEUI-8377 Warning for delete of verification reports on migration reset/delete

ONEUI-8378 Unable to delete FS if instance has any running migrations

ONEUI-8384 Add ability to create metadata rule in metadata migrations page

ONEUI-8386 Enable repair option for verification via UI

ONEUI-8387 H2 database upgrade should force compaction after ingest from dump

ONEUI-8389 H2 upgrade service improvement to manage file lock

ONEUI-8390 Update and simplify the AWS Auth options on AWS S3 Filesystems

ONEUI-8398 Bulk reset action shows migrations which can't be reset

ONEUI-8403 Include migration-ids in bulk reset with verifications alert text

ONEUI-8404 Migration overview can report 'Active for Invalid Date'

ONEUI-8409 Warning level logs displaying as Errors/Failures in the UI notifications

ONEUI-8410 Retries screen keeps collapsing itself

ONEUI-8416 Active File Transfers screen pagination issue

ONEUI-8417 Filters on Bulk Actions screens not filtering results

ONEUI-8428 Table page doesn't update when data removed

ONEUI-8432 Values persist in UI after verifications have been deleted

ONEUI-8433 Unable to bulk reset migrations on upgrade

CLI

LDMC-567 Migration Stats enhancements

LDMC-568 LDM File Transfers enhancements

LDMC-619 Warning for delete of verification reports on migration reset/delete

LDMC-635 Tab auto complete issues with migration verification show command

LDMC-642 More info for migration verification show command on tab completion

LDMC-647 Must supply a value for --fs-root when creating a local filesystem