Configure source filesystems
Configure a source filesystem for each product that you want to migrate data from. What you add depends on your environment:
- Hadoop Distributed File System (HDFS) - Add one source filesystem only for each product.
- S3 sources (IBM Cloud Object Storage, Amazon S3) - Add one or more source filesystems.
# Configure source filesystems with the UI
The Filesystems panel shows the source and target filesystems LiveData Migrator can use for data migrations.
Use the Filesystems panel to:
- View and configure source and target filesystems.
- Add or remove targets.
- Configure Amazon S3-compatible targets using the Hadoop S3A configuration available in the UI.
- Connect to additional LiveData Migrator instances and configure their respective filesystems.
# Add a source filesystem
To add a source filesystem from your LiveData Migrator dashboard, select the following:
- The relevant instance from the Products panel.
- Add source filesystem in the Filesystem Configuration page.
info
If you have Hadoop Distributed File System (HDFS) in your environment, LiveData Migrator automatically detects it as your source filesystem. However, if Kerberos is enabled, or if your Hadoop configuration doesn't contain the configuration file information required for LiveData Migrator to connect to Hadoop, configure the source filesystem manually with additional Kerberos configuration settings.
If you want to configure a new source manually, delete any existing source first, and then add the new source.
note
If you deleted the HDFS source that LiveData Migrator detected automatically, and you want to redetect it, go to the CLI and run the command `filesystem auto-discover-hdfs`.
- HDFS
- Amazon S3
- IBM Cloud Object Storage (preview)
- Local Filesystem
- ADLS Gen2 (preview)
# Configure Hadoop Distributed File System (HDFS) as a source
Configure your source filesystem manually if Kerberos is enabled or if your Hadoop configuration isn't in a default location.
In the Filesystems panel, enter the following information:
- Filesystem Type - The type of filesystem source. Select Hadoop Distributed File System (HDFS).
- Display Name - Enter a name for your source filesystem.
- Default FS - Enter the `fs.defaultFS` value from your HDFS configuration.
- Kerberos Configuration
  - Kerberos Principal - Enter a principal that will map to the HDFS super user using `auth_to_local` rules, or add the LiveData Migrator user principal to the super-user group on the Hadoop cluster you're using.
    - For example: Create the Kerberos principal ldmuser@realm.com. Using `auth_to_local` rules, ensure the principal maps to the user `hdfs`, or that the user `ldmuser` is explicitly added to the super-user group (an example rule is sketched below this section).
  - Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible to the local system user running the LiveData Migrator service (default is `hdfs`), and must be accessible from the edge node where LiveData Migrator is installed.
    - For example: Copy the `ldmuser.keytab` file (where `ldmuser` is your intended user) containing the Kerberos principal into the `/etc/security/keytabs/` directory on the edge node running LiveData Migrator, make its permissions accessible to the HDFS user running LiveData Migrator, and enter the `/etc/security/keytabs/ldmuser.keytab` path during Kerberos configuration for the filesystem.
- Additional Configuration
  - Provide a path to files - Enter the directory or directories containing your HDFS configuration (such as `core-site.xml` and `hdfs-site.xml`) on your LiveData Migrator host's local filesystem. This is required if you use Kerberos or a highly available (HA) HDFS.
  - Additional Configuration (Optional) - Enter override properties or specify additional properties by adding key/value pairs.
For more information about configuring Kerberos, see the troubleshooting section.
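As an illustration only, the following `core-site.xml` sketch shows an `auth_to_local` rule that maps a hypothetical `ldmuser@REALM.COM` principal to the `hdfs` user; the realm and user names are placeholders for your own values.

```xml
<!-- Sketch only: REALM.COM and ldmuser are placeholders for your own realm and principal. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](ldmuser@REALM\.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>
```

Alternatively, leave the mapping rules unchanged and add the `ldmuser` account to the group named by `dfs.permissions.superusergroup` on the cluster.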
# Configure an Amazon S3 bucket as a source
To configure an Amazon S3 bucket as a source for LiveData Migrator, enter the following:
- Display Name - A name for your source filesystem.
- Filesystem Type - The type of filesystem source. Select Amazon S3.
- Bucket Name - The reference name of the Amazon S3 bucket you are using.
- Authentication Method - The Java class name of a credentials provider for authenticating with the S3 endpoint. This isn't a required parameter when adding an IBM Cloud Object Storage bucket through the UI.
The Authentication Method options available include:
Access Key and Secret (`org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider`)
Use this provider to enter credentials as an access key and secret access key with the following entries:
- Access Key - Specify the AWS access key. For example, `RANDOMSTRINGACCESSKEY`.
- Secret Key - Specify the secret key that corresponds with your Access Key. For example, `RANDOMSTRINGPASSWORD`.
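For reference, this option corresponds to the following S3A properties; the values shown are placeholders, and you normally supply them through the UI fields above rather than as raw properties.

```properties
fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
fs.s3a.access.key=RANDOMSTRINGACCESSKEY
fs.s3a.secret.key=RANDOMSTRINGPASSWORD
```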
AWS Identity and Access Management (`com.amazonaws.auth.InstanceProfileCredentialsProvider`)
Use this provider if you're running LiveData Migrator on an EC2 instance that has been assigned an IAM role with policies that allow it to access the S3 bucket.
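For a source bucket, the IAM role attached to the instance needs at least list and read access. The following is a minimal policy sketch with a placeholder bucket name; depending on your setup (for example, SQS event queues or target buckets), additional permissions may be required.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::example-source-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-source-bucket/*"
    }
  ]
}
```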
AWS Hierarchical Credential Chain (`com.amazonaws.auth.DefaultAWSCredentialsProviderChain`)
A commonly used credentials provider chain that looks for credentials in this order:
- Environment Variables - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, or `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`.
- Java System Properties - `aws.accessKeyId` and `aws.secretKey`.
- Web Identity Token credentials from the environment or container.
- Credential profiles file at the default location (`~/.aws/credentials`) shared by all AWS SDKs and the AWS CLI.
- Credentials delivered through the Amazon EC2 container service, if the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable is set and the security manager has permission to access the variable.
- Instance profile credentials delivered through the Amazon EC2 metadata service.
Environment Variables (`com.amazonaws.auth.EnvironmentVariableCredentialsProvider`)
Use this provider to enter an access key and a secret access key as either `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, or `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`.
EC2 Instance Metadata Credentials (`com.amazonaws.auth.InstanceProfileCredentialsProvider`)
Use this provider if you need instance profile credentials delivered through the Amazon EC2 metadata service.
Custom Provider Class
Use this if you want to enter your own class for the credentials provider.
JCEKS Keystore (`fs.s3a.security.credential.provider.path`)
Use this if you want to authenticate with your Amazon S3 source filesystem using a JCEKS keystore. Enter a path, for example: `--properties fs.s3a.security.credential.provider.path=jceks://hdfs@nameservice:8020/credentials/aws/aws.jceks --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml`
The keystore needs to contain values for `fs.s3a.access.key` and `fs.s3a.secret.key`.
Because the access key and secret key are stored in the keystore, you don't need to enter them separately once you've saved the path.
If you use a JCEKS file stored on a Hadoop Distributed File System (HDFS), ensure you add that HDFS as a source or target filesystem, otherwise LiveData Migrator won't be able to find the file.
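One way to populate such a keystore is with the standard Hadoop credential provider tooling; the keystore path, nameservice, and key values below are placeholders.

```bash
# Store placeholder S3A credentials in a JCEKS keystore on HDFS
hadoop credential create fs.s3a.access.key -value RANDOMSTRINGACCESSKEY \
  -provider jceks://hdfs@nameservice:8020/credentials/aws/aws.jceks
hadoop credential create fs.s3a.secret.key -value RANDOMSTRINGPASSWORD \
  -provider jceks://hdfs@nameservice:8020/credentials/aws/aws.jceks
```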
Profile Credentials Provider (`com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider`)
Use this if you want to authenticate with your Amazon S3 source filesystem using multiple sets of credentials, such as when targeting multiple AWS S3 buckets, each requiring its own credentials.
- AWS Named Profile - A name for the AWS profile.
- Credentials File Path - The path to the AWS profile credentials file. The default path is `~/.aws/credentials` if not specified. This resolves to the home directory of the user running LiveData Migrator. For example, the path for a system user "wandisco" would be `/home/wandisco/.aws/credentials`.
- S3 Service Endpoint - The endpoint for the target AWS S3 bucket. See `--endpoint` in the S3A parameters.
For more information, see Using the AWS Credentials File and Credential Profiles.
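As an illustration, a credentials file containing a named profile looks like the following; the profile name and keys are placeholders, and you would enter the profile name in the AWS Named Profile field.

```ini
# ~/.aws/credentials (placeholder values)
[migration-profile]
aws_access_key_id = RANDOMSTRINGACCESSKEY
aws_secret_access_key = RANDOMSTRINGPASSWORD
```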
Simple Queue Service (SQS) Endpoints (Optional)
LiveData Migrator listens to the event queue to continually migrate changes from source file paths to the target filesystem(s).
If you add an S3 source, you have three options for the queue:
- Add the source without a queue. LiveData Migrator creates a queue automatically. If you want LiveData Migrator to create its own queue, ensure your account has the necessary permissions to create and manage SQS queues and attach them to S3 buckets.
- Add the source and specify a queue but no endpoint. This allows you to use a queue that exists in a public endpoint. If you define your own queue, the queue must be attached to the S3 bucket (a CLI sketch follows the field list below). For more information about adding queues to buckets, see the AWS documentation.
- Add the source and specify a queue and a service endpoint. The endpoint can be a public or a private endpoint. For more information about public endpoints, see the Amazon SQS endpoints documentation.

- Queue - Enter the name of your SQS queue. This field is mandatory if you enter an SQS endpoint.
- Endpoint - Enter the URL that you want LiveData Migrator to use.
- S3A Properties (Optional) - Override properties or specify additional properties by adding key/value pairs.
- Live Migration - Enabled by default, this setting allows LiveData Migrator to migrate ongoing changes automatically from this source to the target filesystem during a migration. If you deselect the checkbox, or if the source filesystem doesn't allow live migrations to take place, LiveData Migrator uses one-time migration.
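If you choose to create and attach the queue yourself, the following AWS CLI sketch shows the general shape of the setup. The queue name, bucket name, region, and account ID are placeholders, and the queue's access policy must also allow Amazon S3 to send messages (not shown here).

```bash
# Create the queue (placeholder name)
aws sqs create-queue --queue-name ldm-source-events

# Attach the queue to the source bucket so object create/remove events are published to it
aws s3api put-bucket-notification-configuration \
  --bucket example-source-bucket \
  --notification-configuration '{
    "QueueConfigurations": [
      {
        "QueueArn": "arn:aws:sqs:us-east-1:123456789012:ldm-source-events",
        "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
      }
    ]
  }'
```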
# Configure IBM Cloud Object Storage as a source (preview)
To configure an IBM Cloud Object Storage bucket as a source filesystem, select IBM Cloud Object Storage in the Filesystem Type dropdown list.
Enter the following information:
- Filesystem Type - The type of filesystem source. Select IBM Cloud Object Storage.
- Display Name - The name given for your IBM Cloud Object Storage.
- Access Key - The access key for your authentication credentials, associated with the fixed authentication credentials provider `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider`.
note
Although IBM Cloud Object Storage can use other providers (for example InstanceProfileCredentialsProvider, DefaultAWSCredentialsProviderChain), they're only available in the cloud, not for on-premises. As on-premises is currently the expected type of source, these other providers have not been tested and are not currently selectable.
- Secret Key - The secret key used with the `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` credentials provider.
- Bucket Name - The name of your Cloud Object Storage bucket.
- Topic - The name of the Kafka topic to which the notifications will be sent.
- Endpoint - An endpoint for a Kafka broker, in a host/port format.
- Bootstrap Servers - A comma-separated list of host and port pairs that are addresses for Kafka brokers on a "bootstrap" Kafka cluster that Kafka clients use to bootstrap themselves.
- Port - The TCP port used for connection to the IBM Cloud Object Storage bucket. Default is 9092.
note
Migrations from IBM Cloud Object Storage use the Amazon S3 protocol, along with its filesystem classes. The main difference between IBM Cloud Object Storage and Amazon S3 is the messaging service: SQS queues for Amazon S3, and Kafka for IBM Cloud Object Storage.
# Configure notifications for migration
Migrating data from IBM Cloud Object Storage requires that filesystem events are fed into a Kafka-based notification service. Whenever an object is written, overwritten, or deleted using the S3 protocol, a notification is created and stored in a Kafka topic - a message category under which Kafka publishes the notification stream.
# Configure Kafka notifications
Enter the following information in the IBM Cloud Object Storage Manager web interface.
1. Select the Administration tab.
2. In the Notification Service section, select Configure.
3. On the Notification Service Configuration page, select Add Configuration.
4. In the General section, enter the following:
   - Name: A name for the configuration, for example "IBM Cloud Object Storage Notifications".
   - Topic: The name of the Kafka topic to which the notifications will be sent.
   - Hostnames: A list of Kafka node endpoints in host:port format. Larger clusters may support multiple nodes.
   - Type: The type of configuration.
5. (Optional) In the Authentication section, select Enable authentication and enter your Kafka username and password.
6. (Optional) In the Encryption section, select Enable TLS for Apache Kafka network connections. If the Kafka cluster is encrypted using a self-signed TLS certificate, paste the root CA key for your Kafka configuration in the Certificate PEM field.
7. Select Save. A message appears confirming that the notification configuration was created successfully, and the configuration is listed in the Notification Service Configurations table.
8. Select the name of the configuration (defined in step 4) to assign vaults.
9. In the Assignments section, select Change.
10. In the Not Assigned tab, select vaults and select Assign to Configuration. Filter available vaults by selecting or typing a name into the Vault field.
note
Notification configurations can't be assigned to container vaults, mirrored vaults, vault proxies, or vaults that are migrating data. Once a notification configuration is assigned, an associated vault can't be used in a mirror, with a vault proxy, or for data migration.
Only new operations that occur after a vault is assigned to the configuration will trigger notifications.
11. Select Update.
note
For more information, see the Apache Kafka documentation.
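To confirm that notifications are reaching the topic after a vault is assigned, you can watch it with the standard Kafka console consumer; the broker address and topic name below are placeholders for your own values.

```bash
# Watch the notification topic for new events (placeholder broker and topic)
kafka-console-consumer.sh --bootstrap-server kafka-broker:9092 \
  --topic cos-notifications --from-beginning
```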
# Configure a local filesystem as a source
To configure a local filesystem as a source, enter the following information:
- Display Name - Enter a name for your source filesystem.
- Filesystem Type - The type of filesystem source. Select Local Filesystem.
- Mount Point - The directory in the local filesystem to use as the source filesystem. You can migrate any data in the Mount Point directory.
note
Local filesystems don't provide change notifications, so Live Migration isn't enabled for local filesystem sources.
# Configure Azure Data Lake Storage (ADLS) Gen2 as a source (preview)
note
ADLS Gen2 as a source is currently a preview feature and is subject to change.
You can use ADLS Gen2 sources for one-time migrations only, not for live migrations.
In the Filesystem panel, enter the following information:
- Filesystem Type - The type of filesystem source. Select Azure Data Lake Storage (ADLS) Gen2.
- Display Name - Enter a name for your source filesystem.
- Authentication Type - The authentication type to use when connecting to your filesystem. Select either Shared Key or Service Principal (OAuth2).
- You'll be asked to provide the security details of your Azure storage account. These will vary depending on which Authentication Type you select. See below.
- Use Secure Protocol - This checkbox determines whether to use TLS encryption in communication with ADLS Gen2. This is enabled by default.
note
As ADLS Gen2 source filesystems can only be used for one-time migrations, the Migrate live events checkbox isn't enabled.
The Azure storage account details required vary depending on whether you selected Shared Key or Service Principal (OAuth2):
# Shared key
- Account Name - The Microsoft Azure account name that owns the data lake storage.
- Access Key - The access key associated with the Microsoft Azure account.
- Container Name - The ADLS Gen2 container you want to migrate data from.
# Service principal (OAuth2)
- Account Name - The Microsoft Azure account name that owns the data lake storage.
- Container Name - The ADLS Gen2 container you want to migrate data from.
- Client ID - The client ID (also known as application ID) for your Azure service principal.
- Secret - The client secret (also known as application secret) for the Azure service principal.
- Endpoint - The OAuth2 endpoint for the Azure service principal. This will often take the form of `https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token`, where `{tenant}` is the directory (tenant) ID for the Azure service principal. You can specify a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory).
Select Save once you've entered all necessary configuration information.
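If you don't already have a service principal, one common way to create one and obtain the Client ID, Secret, and tenant (directory) ID is the Azure CLI; the name below is a placeholder, and any role assignments your storage account requires aren't shown.

```bash
# Output includes appId (Client ID), password (Secret), and tenant
# (the {tenant} value used in the OAuth2 endpoint URL).
az ad sp create-for-rbac --name ldm-adls2-source
```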
# Configure source filesystems with the CLI
LiveData Migrator migrates data from a single source filesystem. It automatically detects the Hadoop Distributed File System (HDFS) it's installed on and configures it as the source filesystem. If it doesn't detect the HDFS source automatically, you can validate the source (see below). You can also override auto-detection of any HDFS source by manually adding a source filesystem.
note
At this time, Azure Data Lake Storage (ADLS) Gen2 source filesystems can only be used for one-time migrations.
Use the following CLI commands to add source filesystems:
| Command | Action |
| --- | --- |
| `filesystem add adls2 oauth` | Add an ADLS Gen2 filesystem resource using a service principal and OAuth 2.0 credentials |
| `filesystem add adls2 sharedKey` | Add an ADLS Gen2 filesystem resource using access key credentials |
| `filesystem add gcs` | Add a Google Cloud Storage filesystem resource |
| `filesystem add hdfs` | Add an HDFS resource |
| `filesystem add s3a` | Add an S3 filesystem resource (choose this when using Amazon S3 or IBM Cloud Object Storage) |
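As a rough sketch only, adding a Kerberized HDFS source from the CLI looks something like the following. The parameter names and values here are assumptions for illustration; check the CLI's built-in help for the exact options supported by your version.

```bash
# Sketch only - parameter names are assumptions; confirm them with the CLI help
filesystem add hdfs --file-system-id sourcehdfs \
  --default-fs hdfs://nameservice01 \
  --kerberos-principal ldmuser@REALM.COM \
  --kerberos-keytab /etc/security/keytabs/ldmuser.keytab \
  --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
```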
# Validate your source filesystem
Verify that the correct source filesystem is registered, or delete the existing one (you'll define a new source in the Add File Systems step).
If Kerberos is enabled or your Hadoop configuration does not contain the information needed to connect to the Hadoop filesystem, use the `filesystem auto-discover-source hdfs` command to provide your Kerberos credentials and auto-discover your source HDFS configuration.
note
If Kerberos is disabled and the Hadoop configuration is available on the host, LiveData Migrator detects the source filesystem automatically on startup.
# Manage your source filesystem
Manage the source filesystem with the following commands:
| Command | Action |
| --- | --- |
| `source clear` | Delete all sources |
| `source del` | Delete one source |
| `filesystem auto-discover-hdfs` | Automatically detect an HDFS source |
| `source show` | View the source filesystem configuration |
| `filesystem auto-discover-source hdfs` | Enter your Kerberos credentials to access your source HDFS configuration |
note
To update existing filesystems, first stop all migrations associated with them.
After saving updates to your configuration, restart the LiveData Migrator service for your updates to take effect. In most supported Linux distributions, run the command `service livedata-migrator restart`.