Version: 1.21.0


Ready to start? Check the following knowledge and system prerequisites before you use Data Migrator.

Read the release notes to get the latest information about the current version of Data Migrator.

System administration

  • Linux operating system installation
  • Disk management
  • Memory monitoring and management
  • Command line administration and manually editing configuration files
  • Service configuration and management


Networking

  • IP address allocation
  • TCP/IP ports, firewall setup, and server certificates (for TLS)

Cloud storage technologies

You need to be proficient with your intended target storage technologies and metadata technologies, such as:

  • AWS
    • For Amazon Web Services, this includes:
      • Knowledge of AWS Marketplace, Amazon Simple Storage Service (Amazon S3), AWS Glue Data Catalog, and the AWS Command Line Interface (AWS CLI).
      • An understanding of storage persistence and related costs.
      • The ability to monitor and troubleshoot AWS services.
    • Amazon S3
    • AWS Glue
  • Azure
    • Azure Data Lake Storage (ADLS) Gen2
    • Azure SQL DB
  • Databricks
  • Google Cloud Platform
    • Google Cloud Storage
    • Dataproc
  • Hadoop
    • Hadoop Distributed File System (HDFS)
    • Hive
  • Snowflake

Activate your data

You need to understand the installation procedures described in this guide for your platform.

If you’re not confident about meeting the requirements, you can discuss a supported installation by contacting WANdisco.


Prerequisites

  • Linux host
  • Java 1.8 64-bit (Latest LTS version).
  • Network connectivity from your Data Migrator host to your target filesystem (for example, an ADLS Gen2 endpoint or an S3 bucket).
  • Port 8081 accessible on your Linux host (to access the UI through a web browser).
  • Ensure suitable ulimit settings for your deployment. See Increasing ulimits for Data Migrator and Hive Migrator.
  • If migrating from Hadoop Distributed File System (HDFS):
    • Hadoop client libraries must be installed on the Linux host.
    • Ability to authenticate as the HDFS superuser (for example, hdfs).
    • If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the HDFS superuser must be available on the Linux host.
  • If you want to migrate metadata to or from Apache Hive:
    • The Hive service must be present on the cluster.
    • SSH/CLI access to the cluster.
    • If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the Hive service user must be available. The host for the keytab will depend on whether you deploy locally, remotely, or both (see the hive agent add hive section for more information).
      • The keytab must be owned by the same user that runs Data Migrator's metadata migration component.
    • Ensure that the Hive metastore database (such as MySQL or PostgreSQL) can be accessed from the Data Migrator host through the JDBC connection URL. For more information, see Configure CDP target for Hive metadata migrations.
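
Some of the checks above can be scripted before you install. The following Python sketch is illustrative only and is not part of Data Migrator. The function names port_is_open and open_file_limits are hypothetical, and the sketch assumes Python 3 on the Linux host. It tests whether a TCP port accepts connections (for example, the host and port of your Hive metastore database, or port 8081 once the UI is running) and reports the open-file limits that apply to the current process.

```python
import resource
import socket
from typing import Tuple


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Port 8081 is the UI port named in the list above; "localhost" is an
    # assumption and should be replaced with the host you want to test.
    print("UI port reachable:", port_is_open("localhost", 8081))
    soft, hard = open_file_limits()
    print(f"Open file limit: soft={soft}, hard={hard}")
```

A successful connection from the host itself does not prove that a firewall allows access from elsewhere, so run the port check from the machine that needs the access (for example, the one running your web browser). Compare the reported limits against the values in Increasing ulimits for Data Migrator and Hive Migrator.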

Machine specification

  • 16 CPUs, 48 GB RAM (minimum 4 CPUs, 32 GB RAM)
    • For Hadoop sources, install on an existing edge node. For busy clusters, it may be necessary to dedicate an edge node, use a higher spec machine, or deploy multiple Data Migrator instances to spread the load across the cluster.
  • 200 GB of storage (minimum 100 GB)
    • SSD-based storage is recommended.
  • 2 Gbps minimum network capacity
    • Your network bandwidth must be able to cope with transferring data and ongoing changes from your source filesystem.
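
As a rough way to compare a host against the minimum figures above, the following Python sketch reads the CPU count, physical memory, and disk size from the operating system. It is a sketch under stated assumptions, not a Data Migrator tool: the names check_minimums, total_ram_gb, and total_disk_gb are hypothetical, the disk check assumes the figure refers to the total size of the filesystem at the given path, and it does not measure network capacity.

```python
import os
import shutil
from typing import List

# Minimum values taken from the machine specification list.
MIN_CPUS = 4
MIN_RAM_GB = 32
MIN_DISK_GB = 100


def total_ram_gb() -> float:
    """Return physical memory in GiB, using Linux sysconf values."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024 ** 3


def total_disk_gb(path: str = "/") -> float:
    """Return the total size in GiB of the filesystem that contains path."""
    return shutil.disk_usage(path).total / 1024 ** 3


def check_minimums(cpus: int, ram_gb: float, disk_gb: float) -> List[str]:
    """Return one message per value that falls below the minimum."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPUs: found {cpus}, minimum is {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: found {ram_gb:.1f} GB, minimum is {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: found {disk_gb:.1f} GB, minimum is {MIN_DISK_GB} GB")
    return problems


if __name__ == "__main__":
    # The path "/" is an assumption; use the filesystem you plan to install on.
    found = check_minimums(os.cpu_count() or 0, total_ram_gb(), total_disk_gb("/"))
    for message in found:
        print(message)
    if not found:
        print("Host meets the minimum specification.")
```

The operating system usually reports slightly less memory than the nominal installed amount, so a host sold as 32 GB can fall just under the threshold here. Treat a narrow miss as a prompt to check manually.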

Production use configuration

We recommend you configure data migration properties on your Hadoop Distributed File System to ensure smooth operation.

Next steps

Once you have all the prerequisites, set up your network and then install Data Migrator. You can also configure your HDFS NameNode for optimal performance.