Version: 1.19.1

Configure an HDFS source filesystem

You can migrate data from a Hadoop Distributed File System (HDFS) by configuring it as your source filesystem for LiveData Migrator. After you install LiveData Migrator on an HDFS cluster, it automatically configures that HDFS as a source for data migrations. However, if you have Kerberos enabled or if your Hadoop configuration isn't in a default location, you'll have to manually configure your HDFS source using the steps below.

You can also use these steps to set up a custom HDFS source instead of the default.

note

If you deleted the HDFS source that LiveData Migrator detected automatically, and you want to detect it again, run the command filesystem auto-discover-hdfs in the CLI.

Prerequisites#

You need the following:

  • An HDFS cluster running Hadoop 2.6 or above.
  • If Kerberos is enabled on your filesystem, a valid keytab containing a suitable principal for the HDFS superuser must be available on the Linux host.

Configure an HDFS source filesystem with the UI#

  1. Select your LiveData Migrator product from the Products panel.

  2. Select Add Source Filesystem > Hadoop Distributed File System (HDFS).

  3. Enter the following details:

    • Display Name - Enter a name for your source filesystem.
    • Default FS - Enter the default filesystem from your HDFS configuration.
    • Kerberos Configuration
      • Kerberos Principal - Enter a principal that will map to the HDFS superuser using auth_to_local rules, or add the LiveData Migrator user principal to the superuser group on the Hadoop cluster you're using.
        • For example: Create the Kerberos principal ldmuser@REALM.COM (Kerberos realm names are conventionally uppercase). Using auth_to_local rules, ensure the principal maps to the user hdfs, or that the user ldmuser is explicitly added to the superuser group.
      • Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible to the local system user running the LiveData Migrator service (the default is hdfs) and must be accessible from the edge node where LiveData Migrator is installed.
        • For example: Copy the ldmuser.keytab file (where ldmuser is your intended user) containing the Kerberos principal into the /etc/security/keytabs/ directory on the edge node running LiveData Migrator, make its permissions accessible to the HDFS user running LiveData Migrator, and enter the /etc/security/keytabs/ldmuser.keytab path during Kerberos configuration for the filesystem.
    • Advanced Configuration (optional) - Expand to set configuration file paths and property overrides:
      • Configuration Property File Paths - Enter the directory or directories containing your HDFS configuration (such as the core-site.xml and hdfs-site.xml) on your LiveData Migrator host's local filesystem. This is required if you have Kerberos or a High Availability (HA) HDFS.
      • Configuration Property Overrides (Optional) - Enter override properties or additional properties for your HDFS filesystem by adding key/value pairs.
    • Success File - Enter the file name or glob pattern that LiveData Migrator will use to recognize client application success files (for example, _SUCCESS, the marker file Hadoop jobs write on completion) when they are created in migration directories. These files will be migrated last, after all other data in the directory has been successfully migrated. You can use these files to confirm that the directory they're in has finished migrating.
    • Filesystem Options
      • Live Migration - Changes to the underlying source filesystem are detected and handled in real-time. If you deselect this checkbox, this filesystem becomes a static source for one-time migrations.
  4. Select Save.

    You can now migrate data from the HDFS source.
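The keytab placement described in step 3 can be sketched as follows. This is a minimal sketch, not a definitive procedure: the keytab name and the hdfs user come from the examples above, and a temporary directory stands in for /etc/security/keytabs so the sketch runs without root. On a real edge node, use the actual directory, run as root, and chown the keytab to the service user.

```shell
# Sketch of keytab placement; a temp dir stands in for the real
# /etc/security/keytabs directory so this runs without root.
KEYTAB_DIR="$(mktemp -d)"

# Stand-in for the real ldmuser.keytab copied to the edge node.
touch "$KEYTAB_DIR/ldmuser.keytab"

# Restrict the keytab to its owner (the local user running the
# LiveData Migrator service; the default is hdfs).
chmod 600 "$KEYTAB_DIR/ldmuser.keytab"

# Verify the permissions. On a real host, also confirm the principal
# is present with: klist -kt /etc/security/keytabs/ldmuser.keytab
stat -c '%a' "$KEYTAB_DIR/ldmuser.keytab"   # prints 600
```

On a production host you would additionally run `chown hdfs: /etc/security/keytabs/ldmuser.keytab` so the service user owns the file.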

For more information about configuring Kerberos, see the section below. If you have problems configuring Kerberos, see the troubleshooting section.

Configure Kerberos#

If you only have Kerberos on the source filesystem, you can use the parameters above to give LiveData Migrator access to your HDFS. If you want to migrate data from a Kerberos-enabled source to a Kerberos-enabled target filesystem with cross-realm trust, use a Kerberos configuration on the source that has the cross-realm trust set up. Refer to your Hadoop distribution's documentation for guidance on establishing cross-realm trust.

See Configure Kerberos to learn more about configuring Kerberos or to set up a Kerberos-enabled source and Kerberos-enabled target without cross-realm authentication.
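As one illustration of the principal mapping described above, a hadoop.security.auth_to_local rule in core-site.xml can map a dedicated principal to the HDFS superuser. The principal name, realm, and target user below are hypothetical placeholders; adapt them to your cluster:

```xml
<!-- Hypothetical example: map ldmuser@REALM.COM to the hdfs user. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](ldmuser@REALM.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>
```

The `[1:$1@$0]` part formats a single-component principal as `user@REALM`, the parenthesized pattern matches the formatted name, and the `s/.*/hdfs/` substitution rewrites any match to the local user hdfs; `DEFAULT` leaves all other principals unchanged.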

Next steps#

Configure a target filesystem to migrate data to. Then create a migration.