# Configure filesystems
The first step in migrating data is to configure your filesystems. These define where data is migrated from (the source) and to (the target).
- Supported sources are: HDFS, Amazon Simple Storage Service (Amazon S3) and local storage
- Supported targets are: ADLS Gen2, Amazon Simple Storage Service (Amazon S3), Google Cloud Storage, IBM Cloud Object Storage (S3) and HDFS
Configure filesystems with either the UI or the CLI.
## Configure filesystems with the UI

The Filesystem panel shows the filesystems LiveData Migrator uses as either a source or target for data migrations. LiveData Migrator supports one source and one or more targets. Each filesystem displays its associated processes, such as which LiveData Migrator is used to access it.
Use the Filesystem panel to:
- View and configure the source and target filesystems.
- Add or remove targets.
- Add additional LiveData Migrator servers and LiveData Plane servers.
- Configure Amazon S3-compatible targets using the Hadoop S3A configuration exposed in the UI.
- Connect to additional LiveData Migrator or LiveData Plane instances and configure their respective filesystems.
### Configure source filesystem

You can add a new source by performing one of the following actions:
- Configure the Unknown source on the LiveData Migrator dashboard
- Click the add prompt under "Products" on the LiveData Migrator dashboard
- Click the add source prompt under the LiveData Migrator overview page
You can create a source of one of the following three types:
- HDFS
- Amazon S3 bucket
- Local filesystem
info
LiveData Migrator will normally detect the HDFS source filesystem (if available) on startup. The source will not be detected automatically if Kerberos is enabled or your Hadoop configuration does not contain the information needed to connect to the Hadoop file system.
If the automatic detection does not work, configure the HDFS source filesystem manually.
If you want to manually configure a source for LiveData Migrator to use, you must first delete any existing source and add your own.
note
If you have deleted the automatically discovered HDFS source but want to restore it, run `service livedata-migrator restart`. Upon restarting, LiveData Migrator will automatically attempt to discover the HDFS source again.
#### Source HDFS configuration

Configure your source filesystem manually if Kerberos is enabled or your Hadoop configuration is in a non-default location.

In the Filesystem panel, select to configure your Unknown source and provide your source HDFS configuration:

- Filesystem ID - Provide a name for your source filesystem.
- Filesystem Type - The type of filesystem source. Choose HDFS.
- Default FS - Provide the `fs.defaultFS` value from your HDFS configuration.
- Kerberos Configuration
  - Kerberos Principal - Provide a principal that will map to the HDFS super user using `auth_to_local` rules, or add the LiveData Migrator user principal to the super-user group on the Hadoop cluster you're using.
    - For example: Create the Kerberos principal ldmuser@realm.com. Using `auth_to_local` rules, ensure the principal maps to the user `hdfs`, or that the user `ldmuser` is explicitly added to the super-user group.
  - Kerberos Keytab Location - Provide the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible to the local system user running the LiveData Migrator service (default is `hdfs`), and must be accessible from the edge node where LiveData Migrator is installed.
    - For example: Copy the `ldmuser.keytab` file (where `ldmuser` is your intended user) containing the Kerberos principal into the `/etc/security/keytabs/` directory on the edge node running LiveData Migrator, make its permissions accessible to the HDFS user running LiveData Migrator, and enter the `/etc/security/keytabs/ldmuser.keytab` path during Kerberos configuration for the filesystem. A shell sketch of these steps follows this list.
- Additional Configuration
  - Provide a path to files - Provide the directory or directories containing your HDFS configuration (such as the `core-site.xml` and `hdfs-site.xml`) on your LiveData Migrator host's local filesystem.
  - Additional Configuration (Optional) - Override properties or specify additional properties by adding Key/Value pairs.
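A minimal shell sketch of the keytab example above, assuming the `ldmuser.keytab` file, the default `hdfs` service user, and the `/etc/security/keytabs/` path from the example (adjust the user, group, and paths to match your environment):

```
# Copy the keytab onto the edge node running LiveData Migrator (assumed path)
cp ldmuser.keytab /etc/security/keytabs/ldmuser.keytab

# Make it readable by the system user running the LiveData Migrator service
# (default user is hdfs; the hadoop group is an assumption)
chown hdfs:hadoop /etc/security/keytabs/ldmuser.keytab
chmod 600 /etc/security/keytabs/ldmuser.keytab

# Verify the keytab contains the expected principal
klist -kt /etc/security/keytabs/ldmuser.keytab
```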
For more detailed assistance with configuring Kerberos, see the troubleshooting section.
#### Source Amazon S3 bucket configuration

To configure an Amazon S3 bucket source for use with LiveData Migrator, provide the following details:

- Filesystem ID - Provide a name for your source filesystem.
- Filesystem Type - The type of filesystem source. Choose Amazon S3.
- Bucket Name - The reference name of the Amazon S3 bucket you are using.
- Credentials Provider - The Java class name of a credentials provider for authenticating with the S3 endpoint. This is not a required parameter when adding an IBM COS bucket through the UI. The provider options available include:
  - `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` - Use this provider to offer credentials as an access key and secret access key with the `--access-key` and `--secret-key` parameters.
  - `com.amazonaws.auth.InstanceProfileCredentialsProvider` - Use this provider when running LiveData Migrator on an EC2 instance that has been assigned an IAM role with policies that allow it to access the S3 bucket.
  - `com.amazonaws.auth.DefaultAWSCredentialsProviderChain` - A commonly-used credentials provider chain that looks for credentials in this order:
    - Environment variables - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, or `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` (see the example after this list).
    - Java system properties - `aws.accessKeyId` and `aws.secretKey`.
    - Web Identity Token credentials from the environment or container.
    - Credential profiles file at the default location (`~/.aws/credentials`) shared by all AWS SDKs and the AWS CLI.
    - Credentials delivered through the Amazon EC2 container service, if the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable is set and the security manager has permission to access the variable.
    - Instance profile credentials delivered through the Amazon EC2 metadata service.
- Access Key (Optional) - When using the `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` credentials provider, specify the access key with this parameter. This is a required parameter when adding an IBM COS bucket.
- Secret Key (Optional) - When using the `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` credentials provider, specify the secret key using this parameter. This is a required parameter when adding an IBM COS bucket.
- S3A Properties (Optional) - Override properties or specify additional properties by adding Key/Value pairs.
- Migrate Live Events - Enabled by default, this setting will allow LiveData Migrator to automatically migrate changes from this source's data to the target filesystem during a migration. See One-Time Migration for more information.
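As a hedged illustration of the environment variable option above: when using `com.amazonaws.auth.DefaultAWSCredentialsProviderChain`, the credentials can be exported in the environment visible to the LiveData Migrator process (the values below are placeholders):

```
# Placeholder credentials - substitute your own values
export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=example-secret-access-key
```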
note
As an additional step, ensure your account has the necessary SQS permissions to access the bucket as a filesystem. For example, configuring an allow rule for `sqs:*` will allow all organization users configured with SQS to perform the necessary actions with LiveData Migrator.
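One quick way to sanity-check SQS access is with the AWS CLI, assuming it is installed and configured with the same credentials LiveData Migrator will use (the region is a placeholder):

```
# Should return a list of queues (possibly empty) rather than an AccessDenied error
aws sqs list-queues --region us-east-1
```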
#### Local filesystem source configuration

To configure a local filesystem source for use with LiveData Migrator, provide the following details:
- Filesystem ID - Provide a name for your source filesystem.
- Filesystem Type - The type of filesystem source. Choose Local Filesystem.
- Mount Point - The directory within the local filesystem to use as the source filesystem. You can migrate any data contained within the Mount Point directory.
note
If Kerberos is disabled and the Hadoop configuration is on the host, LiveData Migrator will automatically detect the source filesystem on startup.
Hadoop should be installed globally on the filesystem so that LiveData Migrator can access the Hadoop configuration during automatic detection. Alternatively, if you're running LiveData Migrator in a single user's environment, make Hadoop available to the agent running the service on the PATH environment variable:

Systemctl

```
sudo systemctl set-environment PATH=$PATH
```
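To confirm that Hadoop and its configuration are visible in the service environment, a quick check such as the following can help (this assumes the Hadoop client is installed on the host):

```
# Both should resolve if Hadoop is available on the PATH of the service user
which hadoop
hadoop classpath
```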
### Add target filesystems

Select to configure your Target filesystem on the Filesystem panel, then see the links below for the configuration needed for each platform:
- ADLS Gen2 - The configuration needed will depend on the Authentication Type chosen.
- S3 / IBM Cloud Object Storage (S3)
- Google Cloud Storage
- HDFS
### Update target filesystem configurations

Update a target filesystem's configuration in its Filesystem Configuration panel by clicking on it in the Filesystems & Agents list in the LiveData Migrator dashboard. Update the filesystem details as required and click Save to confirm the changes.
The following details can be reconfigured:
| Filesystem type | Details |
| --- | --- |
| Google Cloud Storage | The credentials file type and credentials file path can be changed. The bucket name cannot be changed. |
| ADLS Gen2 | Authentication fields can be changed, but not the authentication type or container name. |
| HDFS | Only configuration files can be changed. |
| Amazon S3 | Authentication fields can be changed, but not the authentication type, privateLink setting, or bucket name. |
| IBM Cloud Object Storage | Authentication fields and configuration files can be changed. The endpoint is not reconfigurable. |
note
You must stop all migrations that use a filesystem before you can update it.
### Delete target filesystems

Delete a target filesystem from its Filesystem Configuration panel by clicking the Delete Filesystem button. Before you can delete a target filesystem, you must first delete any associated migrations.
## Configure filesystems with the CLI

### Validate your source

LiveData Migrator migrates data from a source filesystem. Verify that the correct source filesystem is registered, or delete the existing one (you'll define a new source in the Add file systems step).

If Kerberos is enabled or your Hadoop configuration does not contain the information needed to connect to the Hadoop file system, use the `filesystem auto-discover-source hdfs` command to provide your Kerberos credentials and auto-discover your source HDFS configuration.
note
If Kerberos is disabled and the Hadoop configuration is on the host, LiveData Migrator will automatically detect the source filesystem on startup.
Manage the source filesystem with the following commands:
| Command | Action |
| --- | --- |
| `source clear` | Delete all sources |
| `source del` | Delete a source |
| `source fs show` | Show the source filesystem configuration |
| `filesystem auto-discover-source hdfs` | Provide your Kerberos credentials to access your source HDFS configuration |
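A minimal CLI sketch for this step using the commands above (the filesystem identifier shown is hypothetical, and the exact parameters accepted by `source del` may differ in your CLI version, so check the command help):

```
# Inspect the currently registered source filesystem
source fs show

# If it is not the source you want, remove it (hypothetical identifier)
source del --file-system-id auto-discovered-hdfs

# Re-discover an HDFS source; supply Kerberos credentials via the command's parameters
filesystem auto-discover-source hdfs
```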
### Add file systems

Add file systems to provide LiveData Migrator with the information needed to read content from your source and migrate content to your target.
A range of different file system types are supported as targets, including ADLS Gen 2, HDFS, GCS, and S3A.
note
LiveData Migrator currently supports HDFS as a migration source.
If your source file system was not discovered automatically, or you wish to assign a new source file system, use the `--source` parameter with the `filesystem add hdfs` command to add a suitable HDFS source file system.
You can define multiple target file systems, which you can migrate to at the same time.
| Command | Action |
| --- | --- |
| `filesystem add adls2 oauth` | Add an ADLS Gen 2 filesystem resource using a service principal and OAuth credentials |
| `filesystem add adls2 sharedKey` | Add an ADLS Gen 2 filesystem resource using access key credentials |
| `filesystem add gcs` | Add a Google Cloud Storage filesystem resource |
| `filesystem add hdfs` | Add an HDFS resource |
| `filesystem add s3a` | Add an S3 filesystem resource (choose this when using IBM COS) |
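As a hedged sketch of adding filesystems from the CLI: the commands below use `filesystem add hdfs` with the `--source` parameter and `filesystem add s3a` with the `--access-key` and `--secret-key` parameters described above; all other flag names and values are illustrative assumptions, so check the CLI help for the exact options your version supports:

```
# Add an HDFS source filesystem (identifier and defaultFS value are placeholders)
filesystem add hdfs --source --file-system-id source-hdfs --default-fs hdfs://namenode:8020

# Add an S3 target using simple access/secret key credentials (placeholder values)
filesystem add s3a --file-system-id target-s3 --bucket-name example-bucket \
  --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
  --access-key AKIAXXXXXXXXXXXXXXXX --secret-key example-secret-key
```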
### Manage file systems

| Command | Action |
| --- | --- |
| `filesystem clear` | Delete all target file systems |
| `filesystem del` | Delete a target file system |
| `filesystem list` | List the target file systems |
| `filesystem show` | Get target file system details |
| `filesystem types` | List the types of target file systems available |
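For example, after adding filesystems you might confirm what is registered (the identifier passed to `filesystem show` is a hypothetical placeholder, and the exact parameter name may differ):

```
# List every filesystem LiveData Migrator knows about
filesystem list

# Show the details of a single filesystem (hypothetical identifier)
filesystem show --file-system-id target-s3
```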
## Check path status

Check the status of a path on your source filesystem to view any scheduled work to be performed on it.

### Check path status in the UI

- From the main LiveData Migrator dashboard, click the triple dot button next to one of your filesystems
- In the menu that appears, select Path Status
- Select a source filesystem from the Select a source filesystem dropdown menu
- Enter the full path of a file on the source filesystem
- Click Search
You will be shown information about the file, such as the migration it's associated with, the target and file path it's expected to migrate to, and whether or not any work is scheduled on the file.
### Check path status through the CLI

Use the `migration path status` command to view information about a file path, such as the migration it's associated with, the target and file path it's expected to migrate to, and whether or not any work is scheduled on the file.
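A hedged example of the command above (the parameter name and path are illustrative assumptions; check the command help for the exact syntax):

```
# Show the migration, target path, and scheduled work for one source path
migration path status --path /data/reports/2023/report-0001.csv
```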
## Configure filesystems for one-time migrations

It's possible to create a source filesystem that is not tracked by LiveData Migrator for changes during a migration. Migrations created from this type of source will become one-time migrations by default. Note that it is not necessary to create a filesystem of this type to create a one-time migration.
### Create filesystems for one-time migrations with the UI

To create a source filesystem for a one-time migration, uncheck the Migrate Live Events box when you configure the filesystem. When you create a migration from this filesystem in the UI, the live migration option will be unchecked and cannot be enabled.
### Create filesystems for one-time migrations with the CLI

LiveData Migrator will only perform read tasks on a source filesystem created for one-time migrations. It will not check the source filesystem for modifications to data during transfer. Any migration that uses the source filesystem will automatically become a one-time migration, and will have the `scanOnly` flag applied.

To create a source for one-time migrations, add the `scanOnly` flag during source creation:

```
filesystem add hdfs --source --scanOnly ...
```
note
The account used to connect to a source filesystem intended for one-time migrations only requires read access. Write access is not necessary.
## Next Steps

Once you have your source and target filesystems configured, you're ready to migrate data. If you want to migrate data to a different path on your target filesystem, create path mappings first.
If you want to exclude specific file sizes or file names from your data migrations, define exclusions.