Version: 3.1.1 (latest)

Iceberg Agent Overview

Data Migrator supports migrating your source Hive metadata to a target Iceberg-supported catalog. Review the following prerequisites and functionality before configuring your agent.

Prerequisites

  • Connection to an Apache Iceberg Hive Catalog on watsonx.data or a REST Catalog.
  • For Hive Catalog on watsonx.data, the target filesystem must be S3 compatible.
  • If your migration includes column addition operations, ensure hive.metastore.disallow.incompatible.col.type.changes is set to false in your target Hive Metastore configuration file (either hive-site.xml or metastore-site.xml).
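As a quick pre-flight check for the last prerequisite, the following is a minimal sketch that reads the text of a Hadoop-style configuration file and reports whether the property is explicitly set to false. The function name and the sample file content are illustrative assumptions and are not part of Data Migrator; the only thing taken from the prerequisite itself is the property name.

```python
import xml.etree.ElementTree as ET

PROPERTY = "hive.metastore.disallow.incompatible.col.type.changes"


def incompatible_col_changes_allowed(site_xml: str) -> bool:
    """Return True only if PROPERTY is explicitly set to false.

    site_xml is the text of a hive-site.xml or metastore-site.xml file.
    A missing property returns False, so the caller is prompted to set it
    explicitly instead of relying on a default.
    """
    root = ET.fromstring(site_xml)
    allowed = False
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY:
            value = (prop.findtext("value") or "").strip().lower()
            # If the property is repeated, the last definition is used here.
            allowed = value == "false"
    return allowed


# Hypothetical file content, for illustration only.
SAMPLE = """
<configuration>
  <property>
    <name>hive.metastore.disallow.incompatible.col.type.changes</name>
    <value>false</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(incompatible_col_changes_allowed(SAMPLE))
```

In practice you would pass the contents of the actual configuration file from your target Hive Metastore host instead of SAMPLE.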

Limitations

The following source table formats are supported:

  • Parquet.
  • ORC Hive.

Transaction support: Full ACID transactions are not currently supported. Insert-only transactions are supported.
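The format and transaction limits above can be sketched as a pre-flight check on table metadata. This is a minimal sketch, not Data Migrator's own logic: it assumes you have already obtained each table's input format class and table properties (for example from DESCRIBE FORMATTED output), that the format can be recognised from the standard Hive input format class names, and that insert-only tables are marked with the Hive table property transactional_properties set to insert_only. The function name migration_blockers is hypothetical.

```python
# Standard Hive input format classes for the two supported formats.
ORC_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
PARQUET_INPUT_FORMAT = (
    "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
)
SUPPORTED_INPUT_FORMATS = {ORC_INPUT_FORMAT, PARQUET_INPUT_FORMAT}


def migration_blockers(input_format: str, tbl_properties: dict) -> list:
    """Return a list of reasons a table falls outside the stated limits.

    An empty list means the table passes these two checks only; it does
    not guarantee that the table can be migrated.
    """
    blockers = []
    if input_format not in SUPPORTED_INPUT_FORMATS:
        blockers.append(f"unsupported format: {input_format}")

    # Normalise keys and values so 'TRUE' and 'true' compare equal.
    props = {str(k).lower(): str(v).lower() for k, v in tbl_properties.items()}
    is_transactional = props.get("transactional") == "true"
    is_insert_only = props.get("transactional_properties") == "insert_only"
    if is_transactional and not is_insert_only:
        blockers.append("full ACID transactional table")
    return blockers


if __name__ == "__main__":
    print(migration_blockers(ORC_INPUT_FORMAT, {"transactional": "true"}))
```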

Historical metadata retention limit:

  • The default and recommended maximum number of previous metadata versions to retain is 200. Increasing this value beyond 200 may cause errors and undesired behaviour.

Hive Compaction:

  • When Hive compaction is used, Data Migrator removes the files replaced by compaction from the target. Time travel queries then no longer work correctly on the Iceberg target, because the old files no longer exist and so cannot be included in a manifest list for an earlier snapshot.
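The effect of compaction on time travel can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not how Data Migrator or Iceberg is implemented: each snapshot is represented only by the set of data files it needs, and the file names and the function readable_snapshots are hypothetical.

```python
def readable_snapshots(snapshots: dict, files_on_target: set) -> dict:
    """Map each snapshot id to True if every file it needs still exists."""
    return {
        snapshot_id: needed <= files_on_target  # subset test
        for snapshot_id, needed in snapshots.items()
    }


# Snapshot 1 was written before compaction and needs the original files.
# Snapshot 2 was written after compaction and needs only the compacted file.
snapshots = {
    1: {"part-0", "part-1"},
    2: {"compacted-0"},
}

# After compaction, the original files have been removed from the target.
files_on_target = {"compacted-0"}

if __name__ == "__main__":
    print(readable_snapshots(snapshots, files_on_target))
    # {1: False, 2: True}
```

Snapshot 1 is no longer readable because its files are gone, which is why a time travel query against it fails.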

Unsupported migration functionality

  • ORC files generated by Hive versions earlier than 2.0.0
  • Hive 3.x ACID transactional tables
  • Hive constraints
  • Indexes
  • Functions
  • Views
  • Materialized views
  • Schema evolution involving column renames or data type changes, either in the past or while migrating. (Schema evolution that adds, drops, or reorders columns is supported if it is supported on the source.)
  • TBLPROPERTIES (not migrated from Hive to Iceberg)
  • Target snapshot expiry and garbage collection (not migrated by Hivemigrator; configure these on the target if required)

info

For information about drop-create rename operations, see the related Known issue.

caution

Resetting an Iceberg migration causes all tables to be migrated again.

Supported partition column types

  • boolean
  • integer
  • bigint
  • float
  • double
  • string* (converts to varchar)
  • binary
  • decimal
  • date

* STRING type columns and partitions are migrated to Iceberg, but are converted to VARCHAR type.
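The supported types above can be turned into a simple check before you start a migration. The following is a minimal sketch, assuming you have the partition columns of a table as (name, type) pairs. The function names are hypothetical, and treating int as an alias of integer is an assumption of this sketch, not something stated by the list above.

```python
SUPPORTED_PARTITION_TYPES = {
    "boolean", "integer", "bigint", "float", "double",
    "string", "binary", "decimal", "date",
}

# Assumption: Hive's short name "int" refers to the listed "integer" type.
ALIASES = {"int": "integer"}


def base_type(hive_type: str) -> str:
    """Strip parameters such as (10,2) and normalise case and aliases."""
    name = hive_type.strip().lower().split("(", 1)[0].strip()
    return ALIASES.get(name, name)


def unsupported_partition_columns(partition_columns: list) -> list:
    """Return the names of partition columns whose type is not listed."""
    return [
        name
        for name, hive_type in partition_columns
        if base_type(hive_type) not in SUPPORTED_PARTITION_TYPES
    ]


if __name__ == "__main__":
    columns = [
        ("dt", "date"),
        ("amount", "DECIMAL(10,2)"),
        ("ts", "timestamp"),
        ("region", "string"),
    ]
    print(unsupported_partition_columns(columns))
    # ['ts']
```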

Next steps

Configure your Iceberg Metadata Agent for Hive Catalog or Iceberg Metadata Agent for REST Catalog.