Version: 1.18.1

Configure source filesystems

Configure a source filesystem for each product you want to migrate data from. The number of sources you can add depends on your environment:

  • Hadoop Distributed File System (HDFS) - Add only one source filesystem for each product.
  • S3 sources (IBM Cloud Object Storage, Amazon S3) - Add one or more source filesystems.

Configure source filesystems with the UI

The Filesystems panel shows the source and target filesystems LiveData Migrator can use for data migrations.

Use the Filesystems panel to:

  • View and configure source and target filesystems.
  • Add or remove targets.
  • Configure Amazon S3-compatible targets using the Hadoop S3A configuration available in the UI.
  • Connect to additional LiveData Migrator instances and configure their respective filesystems.

Add a source filesystem

To add a source filesystem from your LiveData Migrator dashboard, select the following:

  1. The relevant instance from the Products panel.
  2. Add source filesystem in the Filesystem Configuration page.

info

If you have Hadoop Distributed File System (HDFS) in your environment, LiveData Migrator automatically detects it as your source filesystem. However, if Kerberos is enabled, or if your Hadoop configuration doesn't contain the configuration file information required for LiveData Migrator to connect to Hadoop, configure the source filesystem manually with additional Kerberos configuration settings.

If you want to configure a new source manually, delete any existing source first, and then manually add a new source.

note

If you deleted the HDFS source that LiveData Migrator detected automatically, and you want to redetect it, go to the CLI and run the command filesystem auto-discover-hdfs.

Configure Hadoop Distributed File System (HDFS) as a source

Configure your source filesystem manually if Kerberos is enabled or if your Hadoop configuration isn't in a default location.

In the Filesystems panel, enter the following information:

  • Filesystem Type - The type of filesystem source. Select Hadoop Distributed File System (HDFS).
  • Display Name - Enter a name for your source filesystem.
  • Default FS - Enter the fs.defaultFS value from your HDFS configuration.
  • Kerberos Configuration
    • Kerberos Principal - Enter a principal that will map to the HDFS super user using auth_to_local rules, or add the LiveData Migrator user principal to the super-user group on the Hadoop cluster you're using.
      • For example: Create the Kerberos principal ldmuser@realm.com. Using auth_to_local rules, ensure the principal maps to the user hdfs, or that the user ldmuser is explicitly added to the super-user group.
    • Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible to the local system user running the LiveData Migrator service (default is hdfs), and must be accessible from the edge node where LiveData Migrator is installed.
      • For example: Copy the ldmuser.keytab file (where ldmuser is your intended user) containing the Kerberos principal into the /etc/security/keytabs/ directory on the edge node running LiveData Migrator. Make the file readable by the hdfs user running LiveData Migrator, then enter the path /etc/security/keytabs/ldmuser.keytab during Kerberos configuration for the filesystem.
  • Additional Configuration
    • Provide a path to files - Enter the directory or directories on your LiveData Migrator host's local filesystem that contain your HDFS configuration files (such as core-site.xml and hdfs-site.xml). This is required if Kerberos is enabled or if HDFS is configured for high availability (HA).
    • Additional Configuration (Optional) - Enter override properties or specify additional properties by adding key/value pairs.
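
Before filling in the fields above, it can help to confirm what your Hadoop configuration directory actually contains. The following is a minimal Python sketch, not part of LiveData Migrator, that reads core-site.xml and hdfs-site.xml from a directory and reports the fs.defaultFS value and whether Kerberos appears to be enabled. The function names and the demo values are hypothetical. It assumes the standard Hadoop property hadoop.security.authentication is set to kerberos on secured clusters, and that the files are plain XML without XInclude references.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_hadoop_properties(conf_dir):
    """Collect name/value pairs from core-site.xml and hdfs-site.xml in conf_dir."""
    props = {}
    for file_name in ("core-site.xml", "hdfs-site.xml"):
        path = Path(conf_dir) / file_name
        if not path.is_file():
            continue  # a missing file is skipped rather than treated as an error
        for prop in ET.parse(str(path)).getroot().iter("property"):
            name = prop.findtext("name")
            if name:
                props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def summarize_source(conf_dir):
    """Return the values relevant to the Default FS and Kerberos fields."""
    props = read_hadoop_properties(conf_dir)
    # Hadoop falls back to "simple" authentication when the property is absent.
    auth = props.get("hadoop.security.authentication", "simple")
    return {
        "default_fs": props.get("fs.defaultFS"),
        "kerberos_enabled": auth.lower() == "kerberos",
    }


# Demo against a throwaway directory; the nameservice name is made up.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "core-site.xml").write_text(
        "<configuration>"
        "<property><name>fs.defaultFS</name>"
        "<value>hdfs://nameservice01</value></property>"
        "<property><name>hadoop.security.authentication</name>"
        "<value>kerberos</value></property>"
        "</configuration>"
    )
    summary = summarize_source(demo_dir)
    print(summary)
```

If default_fs comes back as None, the directory probably isn't the one to enter under Provide a path to files.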

For more information about configuring Kerberos, see the troubleshooting section.
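
The keytab requirement described above (the file must be accessible to the local system user running the LiveData Migrator service) can be checked with a short script. This is a hedged sketch, not a LiveData Migrator tool: check_keytab is a hypothetical helper, and the world-readable warning is an extra hygiene check assumed here rather than a documented requirement. Run it as the user that runs the service so the readability test reflects that user's permissions.

```python
import os
import stat
import tempfile


def check_keytab(path):
    """Return a list of problems found with the keytab at path (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    # os.access tests the permissions of the user running this script.
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Assumption: a keytab holds credentials, so flag world-readable files.
    if mode & stat.S_IROTH:
        problems.append(f"{path} is world-readable (mode {oct(mode)})")
    return problems


# Demo with a temporary stand-in file instead of a real keytab.
with tempfile.NamedTemporaryFile() as demo:
    os.chmod(demo.name, 0o600)
    print(check_keytab(demo.name))
```

For a real check, pass the path you plan to enter in Kerberos Keytab Location, for example /etc/security/keytabs/ldmuser.keytab.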

Configure source filesystems with the CLI

LiveData Migrator migrates data from a single source filesystem. LiveData Migrator automatically detects the Hadoop Distributed File System (HDFS) it's installed on and configures it as the source filesystem. If it doesn't detect the HDFS source automatically, you can validate the source. You can override auto-detection of any HDFS source by manually adding a source filesystem.

note

At this time, Azure Data Lake Storage (ADLS) Gen2 source filesystems can only be used for one-time migrations.

Use the following CLI commands to add source filesystems:

| Command | Action |
| --- | --- |
| filesystem add adls2 oauth | Add an ADLS Gen 2 filesystem resource using a service principal and OAuth credentials |
| filesystem add adls2 sharedKey | Add an ADLS Gen 2 filesystem resource using access key credentials |
| filesystem add gcs | Add a Google Cloud Storage filesystem resource |
| filesystem add hdfs | Add an HDFS resource |
| filesystem add s3a | Add an S3 filesystem resource (choose this when using Amazon S3 and IBM Cloud Object Storage) |
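
If you script your setup, the mapping above can be kept as a small lookup. The following Python sketch is illustrative only: the storage labels used as dictionary keys (such as "amazon-s3" and "ibm-cos") and the function add_command_for are hypothetical names, while the command strings are taken from the table. It returns only the command name, because the parameters each command accepts are not covered in this section.

```python
# Hypothetical storage labels mapped to the CLI commands listed in the table.
ADD_COMMANDS = {
    "adls2-oauth": "filesystem add adls2 oauth",
    "adls2-sharedkey": "filesystem add adls2 sharedKey",
    "gcs": "filesystem add gcs",
    "hdfs": "filesystem add hdfs",
    # Amazon S3 and IBM Cloud Object Storage both use the s3a command.
    "amazon-s3": "filesystem add s3a",
    "ibm-cos": "filesystem add s3a",
}


def add_command_for(storage):
    """Return the CLI command name used to add the given storage type."""
    try:
        return ADD_COMMANDS[storage.lower()]
    except KeyError:
        known = ", ".join(sorted(ADD_COMMANDS))
        raise ValueError(f"unknown storage type {storage!r}; expected one of: {known}")


print(add_command_for("ibm-cos"))
```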

Validate your source filesystem

Verify that the correct source filesystem is registered or delete the existing one (you'll define a new source in the Add File Systems step).

If Kerberos is enabled or your Hadoop configuration does not contain the information needed to connect to the Hadoop file system, use the filesystem auto-discover-source hdfs command to provide your Kerberos credentials and auto-discover your source HDFS configuration.

note

If Kerberos is disabled, and Hadoop configuration is on the host, LiveData Migrator will detect the source filesystem automatically on startup.

Manage your source filesystem

Manage the source filesystem with the following commands:

| Command | Action |
| --- | --- |
| source clear | Delete all sources |
| source del | Delete one source |
| filesystem auto-discover-hdfs | Automatically detect an HDFS source |
| source show | View the source filesystem configuration |
| filesystem auto-discover-source hdfs | Enter your Kerberos credentials to access your source HDFS configuration |

note

To update existing filesystems, first stop all migrations associated with them. After you save updates to your configuration, restart the LiveData Migrator service for the updates to take effect. On most supported Linux distributions, run the command service livedata-migrator restart.