Version: 2.2

Configure Kerberos

Data Migrator supports a Kerberos-enabled Hadoop Distributed File System (HDFS) environment as a source filesystem. Data Migrator runs as a service and uses a Kerberos keytab for authentication. It doesn't use credential caches or passwords for the principal.
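
For example, to check that a keytab is usable before you configure Data Migrator, you can list its principals and request a ticket with it. The keytab path and principal below are placeholders; substitute the values for your environment:

Example keytab check
# List the principals and key versions stored in the keytab
klist -kt /etc/security/keytabs/hdfs.headless.keytab
# Request a ticket non-interactively with the keytab (no password prompt)
kinit -kt /etc/security/keytabs/hdfs.headless.keytab <source_principal>
# Confirm a ticket-granting ticket was issued
klist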

Kerberos use cases

Data Migrator supports the following Kerberos use cases:

  • Kerberos is only active on the source filesystem
  • Kerberos is active on both the source and target filesystems, with cross-realm trust enabled
  • Kerberos is only active on the target filesystem
    • Install Data Migrator on the intended target filesystem and set up the source cluster as an HDFS source filesystem.
    • Set up krb5.conf on the source filesystem with the necessary configuration and keytabs to give it access to the target filesystem through Kerberos.
  • Kerberos is active on both the source and target filesystems, but cross-realm trust is not enabled

Kerberos is active on both the source and target filesystems, but cross-realm trust is not enabled

Data Migrator supports migration between clusters that run independent Kerberos implementations.

Prerequisites

The Data Migrator service needs access to:

  • A copy of the Kerberos configuration file (krb5.conf) with both realms defined.
  • Kerberos domain, Key Distribution Center (KDC), and admin server mappings for both realms.
  • A keytab with the principal for the HDFS superuser of the target filesystem.

Network communication must also be available from the Data Migrator service to the KDC and the admin server of the target filesystem.
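
As a quick connectivity check, assuming nc (netcat) is installed and the target realm uses the default ports (88 for the KDC, 749 for the admin server), you can probe both from the Data Migrator host. The <target_kdc_host> placeholder below stands in for your KDC hostname:

Example connectivity check
# Check the KDC port of the target realm (Kerberos default is 88)
nc -vz <target_kdc_host> 88
# Check the admin server (kadmin) port of the target realm (default is 749)
nc -vz <target_kdc_host> 749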

Set up Kerberos without cross-realm trust

  1. Copy the target cluster keytab and sites files, and place them on the source filesystem:

    1. On the source filesystem, create a folder for the keytab and Hadoop sites configuration files:

      mkdir -p /path/keytabs
      mkdir -p /path/sites
    2. Copy the keytab files from the target filesystem to the source:

      For Data Migrator, copy your HDFS keytab.

      scp root@<target_host>:/etc/security/keytabs/hdfs.headless.keytab /path/keytabs/

      For Hivemigrator, copy your Hive service keytab.

      scp root@<target_host>:/etc/security/keytabs/hive.service.keytab /path/keytabs/
    3. Ensure the keytab files have the correct owner and group (for example, your service user):

      For Data Migrator,

      chown hdfs:hadoop /path/keytabs/hdfs.headless.keytab

      For Hivemigrator,

      chown hive:hadoop /path/keytabs/hive.service.keytab
    4. Copy the core-site.xml and hdfs-site.xml files from the target to the source:

      scp root@<target_host>:/etc/hadoop/conf/core-site.xml /path/sites/
      scp root@<target_host>:/etc/hadoop/conf/hdfs-site.xml /path/sites/
  2. Create a copy of the Kerberos configuration file on all Data Migrator instances. For example:

    Copy krb5.conf using superuser privileges:
    mkdir -p /etc/remote
    cp -p /etc/krb5.conf /etc/remote/krb5.conf
  3. Adjust the service variables to use the Kerberos configuration file:

    For Data Migrator, open /etc/wandisco/livedata-migrator/vars.env and add the following LDM_EXTRA_JVM_ARGS:

     LDM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"

    For Hivemigrator, open /etc/wandisco/hivemigrator/vars.sh and add the following HVM_EXTRA_JVM_ARGS:

    HVM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"
  4. In /etc/remote/krb5.conf, add the target realm to the [realms] section, and map the target domain to the target realm in the [domain_realm] section.

    Add the target realm, copied from the target's /etc/krb5.conf:

    Source /etc/remote/krb5.conf
    [realms]
    SOURCE_REALM = {
      kdc = <source-host-domain.com>
      admin_server = <source-host-domain.com>
    }
    TARGET_REALM = {
      kdc = <target-host-domain.com>
      admin_server = <target-host-domain.com>
    }
  5. Insert the following mappings into the [domain_realm] section of /etc/remote/krb5.conf:

    Source /etc/remote/krb5.conf
    [domain_realm]
    .wandisco.hadoop1 = SOURCE_REALM
    wandisco.hadoop1 = SOURCE_REALM
    .host-domain.com = SOURCE_REALM
    host-domain.com = SOURCE_REALM
    .target-host2-domain.com = TARGET_REALM
    target-host2-domain.com = TARGET_REALM

    The explicit mapping for target-host2 prevents hosts in the target domain from being mapped to the source realm when the two domain patterns overlap.

  6. Restart Data Migrator services. For example:

    systemctl restart livedata-migrator
    systemctl restart hivemigrator
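
Before you create the filesystems, you can optionally confirm that the merged Kerberos configuration reaches the target realm. This sketch points kinit at the copied configuration file and authenticates with the copied target keytab; the <target_principal> placeholder stands in for your environment:

Example validation of /etc/remote/krb5.conf
# Use the merged Kerberos configuration for this shell session
export KRB5_CONFIG=/etc/remote/krb5.conf
# Authenticate against the target realm with the copied keytab
kinit -kt /path/keytabs/hdfs.headless.keytab <target_principal>
# The cache should now hold a ticket-granting ticket issued by the target realm
klist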

Create the source and target filesystems

You can create the source and target filesystems in the WANdisco CLI or the WANdisco UI.

Create the filesystems in the CLI

Run the following commands in the CLI:

  1. Use the local keytab location and local Kerberos principal in the creation of the source filesystem:

    filesystem add hdfs --file-system-id sourceHdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal <source_principal> --source --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --user hdfs
  2. Use the downloaded keytab and Kerberos principal from the target cluster in the creation of the target filesystem:

    filesystem add hdfs --file-system-id targetHdfs --kerberos-keytab /path/keytabs/hdfs.headless.keytab --kerberos-principal <target_principal> --properties-files /path/sites/core-site.xml,/path/sites/hdfs-site.xml --user hdfs

    See the Command Reference for more information.

  3. Create a test migration to validate that the Kerberos configuration is working correctly.
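
Independently of the test migration, you can smoke-test access to the target NameNode with the standard HDFS client, assuming it is installed on the Data Migrator host. The <target_principal> placeholder stands in for your environment:

Example read-only check against the target
# Authenticate with the copied target keytab and the merged Kerberos configuration
KRB5_CONFIG=/etc/remote/krb5.conf kinit -kt /path/keytabs/hdfs.headless.keytab <target_principal>
# List the target filesystem root using the copied site files
HADOOP_CONF_DIR=/path/sites hdfs dfs -ls /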

Create the filesystems in the UI

For the source filesystem, auto-discovery picks up the Kerberos configuration details automatically. If you manually add a source filesystem, enter the same Kerberos Configuration as shown in the following steps.

Add a target filesystem in the UI and complete the following steps:

  1. Select the applicable Data Migrator instance from the Products panel.

  2. Add an Apache Hadoop target filesystem in the Filesystem Configuration page.

  3. Under Kerberos Configuration, enter the Kerberos Principal and the Kerberos Keytab Location.

  4. Under Advanced Configuration, enter the paths for core-site.xml and hdfs-site.xml into the Configuration Property Files Paths entry field. The default path is:

    Example path containing cluster configuration
    /etc/hadoop/conf

    If you copied the target site files as described above, enter /path/sites instead.

Configuration steps for multi-realm Kerberos

By default, a Kerberos principal must match against a rule that transforms the principal to a short form, such as a user account name (without '@' or '/'). Otherwise, a principal won't be authorized. The default_realm on both the target and source filesystems must match to prevent the following error:

Example error
javax.security.auth.login.LoginException:
java.lang.IllegalArgumentException: Illegal principal name hdfs@WANDISCO.SOURCE:
org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule:
No rules applied to hdfs@WANDISCO.SOURCE

For example, in the following scenario, the DEFAULT rule won't work because the source default_realm [WANDISCO.SOURCE] can't be applied to the target principal [WANDISCO.TARGET]:

Source principal: hdfs@WANDISCO.SOURCE
Target principal: hdfs@WANDISCO.TARGET

To make sure the default_realm on the target and the source match, apply these steps:

  1. Open the core-site.xml file and modify the hadoop.security.auth_to_local property as follows:

    Example configuration
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>RULE:[1:$1@$0](.*@\QWANDISCO.TARGET\E$)s/@\QWANDISCO.TARGET\E$//
      DEFAULT</value>
    </property>

    Construct the local name using this expression:

    [n:string](regexp)s/pattern/replacement/g

    (.*@\QWANDISCO.TARGET\E$)
    • Matches any principal in the realm WANDISCO.TARGET, so the substitution on the next line is applied to it.
    s/@\QWANDISCO.TARGET\E$//
    • Strips the @WANDISCO.TARGET suffix, so the principal hdfs@WANDISCO.TARGET maps to the local user hdfs.
  2. Save the file. You don't need to restart Data Migrator to apply the changes.

Cross-realm Hadoop now functions as intended because the default_realm configuration matches both realms.
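
To check how a given principal is mapped without restarting any service, recent Hadoop releases include a kerbname helper command; if your distribution lacks it, the underlying class org.apache.hadoop.security.HadoopKerberosName can be invoked the same way. The example uses the principal from this section:

Example rule check
# Print the short name produced by the hadoop.security.auth_to_local rules
hadoop kerbname hdfs@WANDISCO.TARGET
# Expected output: hdfs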

info

For more information, see the Apache documentation: Mapping from Kerberos principals to OS user accounts.

Troubleshooting

See the Kerberos troubleshooting section if you have any problems setting up Kerberos.