Version: 1.19.1

Prerequisites

Before you install LiveData Migrator, make sure your environment meets the prerequisites on this page.

Read the release notes to get the latest information about the current version of LiveData Migrator.

Recommended technical knowledge

System administration

  • Linux operating system installation
  • Disk management
  • Memory monitoring and management
  • Command line administration and manually editing configuration files
  • Service configuration and management

Networking

  • IP address allocation
  • TCP/IP ports, firewall setup, and server certificates (for TLS)

Cloud storage technologies

You need to be proficient with your intended target storage technologies, such as:

  • Azure Data Lake Storage (ADLS) Gen2
  • Google Cloud Storage
  • Hadoop Distributed File System (HDFS)
  • S3
    • For Amazon Web Services, this includes:
      • Knowledge of AWS Marketplace, Amazon Simple Storage Service (Amazon S3), AWS Glue Data Catalog, and the AWS Command Line Interface (AWS CLI).
      • Understanding of storage persistence and related costs.
      • Ability to monitor and troubleshoot AWS services.

LiveData

You need to understand the installation procedures described in this guide for your platform.

If you’re not confident about meeting the requirements, you can discuss a supported installation by contacting WANdisco.

Prerequisites

  • Linux host
  • Java 1.8 64-bit.
  • Network connectivity from your LiveData Migrator host to your target filesystem (for example, an ADLS Gen2 endpoint or an S3 bucket).
  • Port 8081 accessible on your Linux host (to access the UI through a web browser).
  • Ensure suitable ulimit settings for your deployment. See Increasing ulimits for Data Migrator and Hive Migrator.
  • If migrating from Hadoop Distributed File System (HDFS):
    • Hadoop client libraries must be installed on the Linux host.
    • Ability to authenticate as the HDFS superuser (for example, hdfs).
    • If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the HDFS superuser must be available on the Linux host.
  • If you want to migrate metadata to or from Apache Hive:
    • The Hive service must be present on the cluster.
    • SSH/CLI access to the cluster.
    • If Kerberos is enabled on your Hadoop cluster, a valid keytab containing a suitable principal for the Hive service user must be available. The host for the keytab will depend on whether you deploy locally, remotely, or both (see the hive agent add hive section for more information).
      • The keytab must be owned by the same user that runs LiveData Migrator's metadata migration component.
    • Ensure that the Hive metastore database (such as MySQL or PostgreSQL) can be accessed from the LiveData Migrator host through the JDBC connection URL. For more information, see Configure CDP target for Hive metadata migrations.
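Some of the items above can be checked with a short script before installation. The following is a minimal sketch, not part of LiveData Migrator: the function names are hypothetical, the Java check is a heuristic that assumes the usual `java -version` output format, and the open-file limit is only one of the ulimit settings you may need to review.

```python
import re
import resource
import socket
from typing import Tuple


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_like_java8_64bit(version_output: str) -> bool:
    """Heuristic: does `java -version` output describe a 1.8, 64-bit runtime?"""
    has_1_8 = re.search(r'version "1\.8\.', version_output) is not None
    return has_1_8 and "64-Bit" in version_output
```

For example, call `port_reachable` with the hostname of your own storage endpoint and port 443 to test connectivity to the target. To test that port 8081 is accessible, run the same function from the machine where the browser runs, pointing at the LiveData Migrator host; running it on the host itself only shows whether something is listening. `java -version` writes to standard error, so capture that stream (for instance with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`) before passing it to `looks_like_java8_64bit`.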

Machine specification

  • 16 CPUs, 48 GB RAM (minimum 4 CPUs, 32 GB RAM)
    • For Hadoop sources, install on an existing edge node. For busy clusters, it may be necessary to dedicate an edge node, use a higher spec machine, or deploy multiple LiveData Migrator instances to spread the load across the cluster.
  • 200 GB of storage (minimum 100 GB)
    • SSD-based storage is recommended.
  • 2 Gbps minimum network capacity
    • Your network bandwidth must be able to cope with transferring data and ongoing changes from your source filesystem.
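As a rough illustration of the figures above, the sketch below compares a Linux host against the stated minimums and estimates a best-case transfer time for a given link speed. It is an assumption-laden example, not a WANdisco tool: the function names are hypothetical, gigabytes are treated as decimal (10^9 bytes), the disk check uses the total size of the filesystem at a placeholder path, and the transfer estimate ignores protocol overhead and ongoing source changes.

```python
import os
import shutil
from typing import List

GB = 10 ** 9  # decimal gigabyte (assumption; the spec does not say GB vs GiB)

# Minimum figures taken from the machine specification.
MIN_CPUS, MIN_RAM_GB, MIN_DISK_GB = 4, 32, 100


def total_ram_bytes() -> int:
    """Total physical memory in bytes, read through sysconf (Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def spec_shortfalls(cpus: int, ram_bytes: int, disk_bytes: int) -> List[str]:
    """Return one message per unmet minimum; an empty list means all are met."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPUs: {cpus} < {MIN_CPUS}")
    if ram_bytes < MIN_RAM_GB * GB:
        problems.append(f"RAM: {ram_bytes / GB:.1f} GB < {MIN_RAM_GB} GB")
    if disk_bytes < MIN_DISK_GB * GB:
        problems.append(f"Disk: {disk_bytes / GB:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def transfer_hours(data_bytes: float, link_gbps: float) -> float:
    """Best-case hours to move data_bytes over a link of link_gbps."""
    bytes_per_second = link_gbps * 1e9 / 8
    return data_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    disk = shutil.disk_usage("/")  # "/" is a placeholder path
    for line in spec_shortfalls(os.cpu_count() or 0, total_ram_bytes(), disk.total):
        print(line)
    print(f"100 TB at 2 Gbps: {transfer_hours(100e12, 2.0):.0f} hours at best")
```

At 2 Gbps a link moves at most 250 MB per second, so 100 TB takes roughly 111 hours even before accounting for changes that arrive at the source during the migration.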

Production use configuration

We recommend you configure data migration properties on your Hadoop Distributed File System to ensure smooth operation.

Next steps

Once you have all the prerequisites, set up your network and then install LiveData Migrator. You can also configure your HDFS NameNode for optimal performance.