
Configure an HDFS target

You can migrate data to a Hadoop Distributed File System (HDFS) by configuring it as a target filesystem for Data Migrator.

Follow these steps to create an HDFS target:

Configure an HDFS target filesystem

Prerequisites

You need the following:

  • An HDFS cluster running Hadoop 2.6 or above.
  • If Kerberos is enabled on your filesystem, a valid keytab containing a suitable principal for the HDFS superuser must be available on the Data Migrator host machine. See Configure Kerberos. A quick check of the keytab is sketched after this list.
  • Oracle Big Data Services (BDS) - If running with Oracle's Distribution of Apache Hadoop (ODH), Data Migrator must use fully qualified hostnames that DNS can resolve. Add the following configuration property overrides for the target filesystem:
    Configuration property overrides are an option under Advanced Configuration on the Target Filesystem Configuration screen.
    dfs.client.use.datanode.hostname=true
    dfs.datanode.use.datanode.hostname=true
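
If you want to check the keytab prerequisite before adding the target, the following minimal Java sketch attempts a Kerberos login with the Hadoop client libraries. The principal name and keytab path are placeholders, not values taken from this guide; run it on the Data Migrator host with the Hadoop client libraries on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder principal and keytab path - replace with your own values.
            String principal = "hdfs@EXAMPLE.COM";
            String keytab = "/etc/security/keytabs/hdfs.headless.keytab";

            // Tell the Hadoop client that the cluster uses Kerberos.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // The login fails fast if the keytab is missing, unreadable,
            // or does not contain the given principal.
            UserGroupInformation.loginUserFromKeytab(principal, keytab);
            System.out.println("Logged in as " + UserGroupInformation.getLoginUser());
        }
    }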

Configure an HDFS target filesystem with the UI

  1. Connect to the UI.

  2. From the Dashboard, select an instance under Instances.

  3. In the Filesystems & Agents menu, select Filesystems.

  4. Select Add target filesystem.

  5. Enter the following details:

    • Filesystem Type - The type of filesystem target. Select Hadoop Distributed File System (HDFS).
    • Display Name - Enter a name for your target filesystem.
    • Default FS - Enter the fs.defaultFS value from your HDFS configuration. For example, hdfs://nameservice:8020.
    • User - If you're not running Kerberos on your target HDFS cluster, enter the name of the filesystem user you want to migrate data with.
    • Kerberos Configuration - The details of your Kerberos configuration. You can authenticate with Kerberos using multi-realm Kerberos, cross-realm trust or target-only Kerberos. See Configure Kerberos.
      • Kerberos Principal - Enter a principal that will map to an HDFS superuser.
      • Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible from the edge node where Data Migrator is installed.
    • Advanced Configuration
      • Configuration Property File Paths - Enter the directory or directories containing your target filesystem's HDFS configuration (such as core-site.xml and hdfs-site.xml) on your Data Migrator host's local filesystem. This is required if you use Kerberos or a High Availability (HA) HDFS.
        note

        Data Migrator reads core-site.xml and hdfs-site.xml once, during filesystem creation, applying any configuration within paths added under Configuration Property File Paths. After creation, these paths are no longer visible in the UI. You can see all filesystem properties using the API.

      • Configuration Property Overrides (Optional) - Enter override properties or additional properties for your HDFS filesystem by adding key/value pairs.
  6. Select Save. You can now use your HDFS target in data migrations. To check the connection details outside Data Migrator, see the sketch after these steps.
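
To confirm that the Default FS value and any configuration property overrides you entered resolve correctly from the Data Migrator host, you can exercise the same Hadoop client settings directly. The sketch below is not part of Data Migrator; the filesystem address reuses the hdfs://nameservice:8020 example from this page and the override reuses the ODH hostname property from Prerequisites, so replace both with your own values. If your target uses Kerberos, you also need the keytab login shown under Prerequisites.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TargetFsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // The same Default FS value entered in the UI (placeholder).
            conf.set("fs.defaultFS", "hdfs://nameservice:8020");

            // Any configuration property overrides, for example the ODH hostname setting.
            conf.set("dfs.client.use.datanode.hostname", "true");

            // Open the filesystem and list the root directory; a failure here
            // usually points at DNS, the RPC port, or missing configuration.
            try (FileSystem fs = FileSystem.get(conf)) {
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }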

Configure Kerberos

Select the Kerberos use case that applies to your deployment and follow the relevant instructions.

Use Kerberos on the source filesystem only

To set up Kerberos on a source filesystem, enter the Kerberos details for your HDFS source during the HDFS source creation process.

Next steps

If you haven't already, configure a source filesystem from which to migrate data. Then, create a migration to move data to your new HDFS target.