2.11 Installing into IBM OpenStack/Swift storage

This section runs through the installation of WD Fusion into an IBM OpenStack environment using Swift storage. Currently this deployment is limited to an active-passive configuration that would be used to ingest data from your on-premises cluster to your Swift storage.

Pre-requisites

Before you begin an installation to an OpenStack Swift cluster, make sure that you have the following directories created and suitably permissioned. Examples:

Important!
For installations to IBM OpenStack/Swift storage, we currently only support Keystone 3.0.
Previous releases supported Keystone 2.0. From WD Fusion 2.9.1 onwards, Keystone 2.0 is no longer supported (FUS-3025).

Overview

The installation process runs through the following steps:

  1. On-premises installation - installing a WD Fusion node on your cluster
  2. Swift storage node installation - the second node can be installed onto a VM situated on OpenStack, or an
  3. Setting up replication - Configure the nodes to ingest data from the on-premises cluster to the OpenStack Swift storage.
  4. Silent Installation - Notes on automating the installation process.
  5. Parallel Repairs - Running initial repairs in parallel.

Known Issues using Fusion with IBM OpenStack/Swift storage

The following issues should be considered before you start an installation.

Install Node for Swift storage

Follow this section to complete the installation by configuring WD Fusion on a server that will place data that is replicated from your on-premises cluster to your OpenStack Swift storage. This second node can also be on-premises or co-located with your OpenStack platform.

Open a web browser and point it at the provided URL, for example:

http://<YOUR-SERVER-ADDRESS>.com:8083/
  1. In the first "Welcome" screen you're asked to choose between Create a new Zone and Add to an existing Zone.
    Make your selection as follows:
    Adding a new WD Fusion cluster
    Select Add Zone.
    Adding additional WD Fusion servers to an existing WD Fusion cluster
    Select Add to an existing Zone.

    WD Fusion Deployment

    Welcome screen.

  2. Run through the installer's detailed Environment checks. For more details about exactly what is checked in this stage, see Environmental Checks in the Appendix.
    WD Fusion Deployment

    Environmental checks.

    On clicking validate the installer will run through a series of checks of your system's hardware and software setup and warn you if any of WD Fusion's prerequisites are not going to be met.

    WD Fusion Deployment

    Example check results.

    Address any failures before you continue the installation. Warnings may be ignored for the purposes of completing the installation, especially if the installation is only for evaluation purposes and not for production. However, when installing for production, you should address all warnings, or at least take note of them and exercise due care if you continue the installation without resolving and revalidating.

  3. Upload the license file.
    WD Fusion Deployment

    Upload your license file.

  4. The conditions of your license agreement will be presented in the top panel, including License Type, Expiry date, Name Node Limit and Data Node Limit.
    WD Fusion Deployment

    Verify license and agree to subscription agreement.

    Click the I agree to the EULA checkbox to continue, then click Next Step.
  5. Enter settings for the WD Fusion server. See WD Fusion Server for more information about what is entered during this step.
    WD Fusion Deployment

    Screen 4 - Server settings

  6. In step 5 the zone information is added.
    WD Fusion Deployment

    Swift Install 1

    Zone Information

    Fully Qualified Domain Name
    The full hostname for the server.
    Node ID
    A unique identifier that will be used by WD Fusion UI to identify the server.
    DConE Port
    TCP port used by WD Fusion for replicated traffic.
    Zone Name
    The name used to identify the zone in which the server operates.

    Swift Information

    WD Fusion Deployment

    Swift Install 1

    Some of the required information can be gathered from the Bluemix UI, in the Service Credentials section:
    WD Fusion Deployment

    User ID
    The unique ID for the Swift user
    Password
    The password for the Swift user
    Swift password changes
    During installation, the Swift password is encrypted for use with WD Fusion. This process doesn't require any further interaction except for the case where the Swift password is changed. If you change your Swift password you need to do the following:
    1. Open a terminal to the WD Fusion node and navigate to /opt/wandisco/fusion/server.
    2. Run the following script:
      ./encrypt-password.sh
        Please enter the password to be encrypted
      Enter your Swift password and press return:
      > password
      eCefUDtgyYczh3wtX2DgKAvXOpWAQr5clfhXSm7lSMZOwLfhG9YdDflfkYIBb7psDg3SlHhY99QsHlmr+OBvNyzawROKTd/nbV5g+EdHtx/J3Ulyq3FPNs2xrulsbpvBb2gcRCeEt+A/4O9K3zb3LzBkiLeM17c4C7fcwcPAF0+6Aaoay3hug/P40tyIvfnVUkJryClkENRxgL6La8UooxaywaSTaac6g9TP9I8yH7vJLOeBv4UBpkm6/LdiwrCgKQ6mlwoXVU4WtxLgs4UKSgoNGnx5t8RbVwlrMLIHf/1MFbkOmsCdij0eLAN8qGRlLuo4B4Ehr0mIoFu3DWKuDw==
      [ec2-user@ip-172-29-0-158 server]$
    3. Place the re-encrypted password in core-site.xml and application.properties.
    Auth URL
    The URL required for authenticating against Swift.
    Swift Container Name
    The name of the Swift storage container that Fusion will be connecting to.
    Project Id
    The Bluemix project ID.
    Domain Name
    The Swift Domain Name.
    Segment Container
    The name of the Segment container. The Segment container is used where large files break Swift's 5GB limit for object size. Objects that exceed 5GB are broken into segments and get stored in here.
    Region
    The Swift Object Storage Region. Not to be confused with the Bluemix region.
    WD Fusion Deployment

    Swift Validation

    Account valid
    The installer checks that the Swift account details are valid. If the validation fails, you should recheck your Swift account credentials.
    Container valid
    The installer confirms that a container with the provided details exists. If the validation fails, check that you have provided the right container name.
    Container readable
    The container is checked to confirm that it can be read. If the validation fails, check the permissions on the container.
    Container writable
    The container is checked to confirm that the container can be written to. If the validation fails, check the permissions on the container.
    WD Fusion Deployment

    Swift Segment Container Validation

    Segment Account valid
    The installer checks that the Swift account details are valid for accessing the segment container. If the validation fails, you should recheck your Swift account credentials.
    Segment Container valid
    The installer confirms that a segment container with the provided details exists. If the validation fails, check that you have provided the right segment container name.
    Segment Container readable
    The container is checked to confirm that it can be read. If the validation fails, check the permissions on the segment container.
    Segment Container writable
    The container is checked to confirm that the container can be written to. If the validation fails, check the permissions on the segment container.
  7. In step 6, enter the authentication credentials that will be used to access the WD Fusion UI. When deploying WD Fusion under a Hadoop management layer such as Cloudera Manager or Ambari, you would use the same credentials as that manager. In this case we're running without a separate manager, so we need to provide our own username and password.
    WD Fusion Deployment

    Security

    Username
    A username that will be used for accessing the WD Fusion UI.
    Password
    The corresponding password for use with the username, when logging into the WD Fusion UI.
  8. The summary screen lists all the configuration that has been entered so far, during the installation. You can check your entries by clicking on each category on the left-side menu. Click Next Step.
    WD Fusion Deployment

    Summary

  9. You can ignore the next step. Click Next Step. This step is reserved for deployments where HDFS clients need to be installed. These are not required when using WD Fusion to replicate data into a cloud storage solution.
    WD Fusion Deployment

    Startup

  10. It's now time to start up the WD Fusion server. Click Start WD Fusion.
    WD Fusion Deployment

    Clients

    The WD Fusion server will now start up.

  11. The final step is Induction. This will connect this second node to your existing "on-premises" node. When adding a node to an existing zone, users will be prompted for zone details at the start of the installer and induction will be handled automatically. Nodes added to a new zone will have the option of being inducted at the end of the install process where the user can add details of the remote node.
    WD Fusion Deployment

    Induction

    Enter the following details then Click Start Induction.

    Fully Qualified Domain Name
    The full address of the existing on-premises node.
    Fusion Server Port
    The TCP port on which the on-premises node is running. Default: 8082
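
The container checks that the installer performs in step 6 (container valid, readable and writable) can be approximated from a terminal before you start the installation. The following is a minimal sketch, not the installer's own logic. It assumes that the OpenStack swift command-line client (python-swiftclient) is installed and that Keystone v3 credentials are already exported through the client's OS_* environment variables. The function names and the probe object name are hypothetical.

```shell
# Sketch only: approximates the installer's container checks using the
# OpenStack "swift" client. Assumes OS_* credentials are already exported.

# Succeeds if the container exists and its metadata can be read.
container_readable() {
  swift stat "$1" >/dev/null 2>&1
}

# Succeeds if a small probe object can be uploaded and then deleted.
container_writable() {
  local container="$1" name="fusion-probe-$$" file status=0
  file="$(mktemp)" || return 1
  swift upload --object-name "$name" "$container" "$file" >/dev/null 2>&1 || status=1
  if [ "$status" -eq 0 ]; then
    swift delete "$container" "$name" >/dev/null 2>&1 || status=1
  fi
  rm -f "$file"
  return "$status"
}

# Print a one-line verdict for the named container.
check_container() {
  if ! container_readable "$1"; then
    echo "$1: not found or not readable"
    return 1
  fi
  if ! container_writable "$1"; then
    echo "$1: not writable"
    return 1
  fi
  echo "$1: ok"
}
```

For example, run check_container with your Swift container name, then again with your segment container name. The probe object is removed after the write test, but a failed delete can leave it behind in the container.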

Setting up replication

It's now time to demonstrate data replication between the on-premises cluster and the IBM OpenStack / Swift storage. First we need to perform a synchronization to ensure that the data stored in both zones is in exactly the same state.

Synchronization

You can synchronize data in both directions. The following guide covers replication from on-premises to the OpenStack/Swift node.

Synchronize from on-premises to the Swift node zone

  1. Log in to the on-premises WD Fusion UI and click on the Replicated Folders tab.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 09.

  2. Click on the Create button to set up a folder on the local system.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 10.

    Navigate the HDFS File Tree (1), on the right-hand side of the New Rule panel, to select your target folder. The selected folder will appear in the Path entry field. Alternatively, you can type or paste the full path to the folder into the Path field.

    Next, select both zones from the Zones list (2). You can leave the default membership in place. This will replicate data between the two zones.

    More about Membership
    Read about Membership in the WD Fusion User Guide - 4. Managing Replication.

    IMPORTANT: files not appearing in the Swift store file tree
    If you upload files to a Swift store using the Swift client, it is possible to exploit Swift's pseudo-file structure, placing a file in a subdirectory that isn't mapped to the file system. While this works internally, folders that exist in this state will not be visible to WD Fusion and so can't be viewed in the WD Fusion Rule file tree or set for replication.

    Workaround
    When uploading files using the Swift client, ensure that you add a trailing slash, e.g.

    swift upload [container name] [directory name]"/"
    Folders that are uploaded in this way will be visible in the File Tree.

    Recommendation: Use platforms like OpenStack or Bluemix instead.

    Click Create to continue.

  3. When you first create the folder you may notice status messages for the folder indicating that the system is preparing the folder for replication. Wait until all pending messages are cleared before moving to the next step.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 11.

  4. Now that the folder is set up it is likely that the file replicas between both zones will be in an inconsistent state, in that you will have files on the local (on-premises) zone that do not yet exist in the Swift store. Click on the Inconsistent link in the Fusion UI to address these.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 12.

    The consistency report will show you the number of inconsistencies that need correction. We will use bulk resolve to do the first replication.

    See the Appendix for more information on improving the performance of your first synchronization, and on resolving individual inconsistencies if you have a small number of files that might conflict between zones - Running initial repairs in parallel.

  5. Click on the dropdown selector entitled Bulk resolve inconsistencies to display the options that determine the synchronization direction. Choose the zone that will be used as the source of the files. Tick the check box Preserve extraneous file so that files in the target zone are not deleted if they don't exist in the source zone. The system will begin the file transfer process.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 13.

  6. We will now verify the file transfers. Log in to the WD Fusion UI on the OpenStack/Swift node. Click on the Replicated Folders tab. In the File Transfers column, click the View link.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 14.

    By checking off the boxes for each status type, you can report on files that are:

    • In progress
    • Incomplete
    • Complete

    No transfers in progress?
    You may not see files in progress if they are very small, as they tend to clear before the UI polls for in-flight transfers.

  7. Congratulations! You have successfully installed, configured, replicated and monitored data transfer with WANdisco Fusion.
    WD Fusion tree

    OpenStack/Swift storage - Fusion installation figure 15.
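
The trailing-slash workaround described in step 2 can be wrapped in a small helper so that the slash is never forgotten. This is a minimal sketch, assuming the OpenStack swift command-line client is installed and authenticated. The function names are hypothetical, and the upload call itself is the one shown in the workaround.

```shell
# Return the directory name with exactly one trailing slash.
with_trailing_slash() {
  local dir="$1"
  # Strip any trailing slashes that are already present.
  while [ "${dir%/}" != "$dir" ]; do
    dir="${dir%/}"
  done
  printf '%s/\n' "$dir"
}

# Upload a directory using the trailing-slash form from the workaround.
swift_upload_dir() {
  local container="$1" dir="$2"
  swift upload "$container" "$(with_trailing_slash "$dir")"
}
```

For example, swift_upload_dir container1 reports runs swift upload container1 reports/ whether or not the directory name was typed with a slash.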

Swift Silent Installation

You can complete an IBM Swift installation using the Silent Installation procedure, putting the necessary configuration in the swift_silent_installer.properties as described in the section that covers Silent Installation.

Swift-specific settings

Environment variables required for Swift deployments:

  • FUSIONUI_MANAGER_TYPE=UNMANAGED_SWIFT
  • FUSIONUI_INTERNALLY_MANAGED_USERNAME
  • FUSIONUI_INTERNALLY_MANAGED_PASSWORD
  • FUSIONUI_FUSION_BACKEND_CHOICE
  • FUSIONUI_USER
  • FUSIONUI_GROUP
  • SILENT_PROPERTIES_PATH

Swift properties in the silent installer properties file:

###############################
# Swift Configuration
###############################

# Swift installation mode
# REQUIRED for Swift installations. Defaults to false
swift.installation.mode=true

# The Swift container name to use
# REQUIRED for Swift installations.
swift.containerName=

# The Swift username to use
# REQUIRED for Swift installations.
swift.username=

# The Swift password to use
# REQUIRED for Swift installations.
swift.password=

# The Swift auth URL to use for authenticating access to the storage
# REQUIRED for Swift installations.
swift.auth.url=

# The Swift tenant name to use
# Optional, for Swift installations.
swift.tenantName=

# The Swift tenant id to use
# Optional, for Swift installations.
swift.tenantId=

Additional settings or specific required values for the silent_installer.properties file:

swift.installation.mode=true
swift.containerName=container1etc
kerberos.enabled=false (or unspecified)
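
The required values listed above can be checked mechanically before the installer is run. The following is a minimal sketch, assuming a plain key=value properties file with no spaces around the equals sign. The function names prop_value and check_swift_props are hypothetical and are not part of the WD Fusion installer.

```shell
# Print the value of KEY from a key=value properties FILE (last match wins).
prop_value() {
  awk -F= -v k="$1" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$2"
}

# Check the Swift-specific values that the silent installation expects.
check_swift_props() {
  local file="$1" status=0
  if [ "$(prop_value swift.installation.mode "$file")" != "true" ]; then
    echo "swift.installation.mode must be true"
    status=1
  fi
  if [ -z "$(prop_value swift.containerName "$file")" ]; then
    echo "swift.containerName must be set"
    status=1
  fi
  case "$(prop_value kerberos.enabled "$file")" in
    ""|false) ;;  # false or unspecified are both acceptable
    *) echo "kerberos.enabled must be false or unspecified"; status=1 ;;
  esac
  return "$status"
}
```

A call such as check_swift_props /tmp/swift_silent.properties prints one line per problem and returns a non-zero status if any are found, which catches slips such as a missing equals sign.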

Example Installation

The following example is run as root, with the installer moved to /tmp.

# If necessary, download the latest installer and make the script executable
chmod +x /tmp/installer.sh
# You can reference an original path to the license directly in the silent properties, but note the requirement for being in a location that is (or can be made) readable for the $FUSIONUI_USER
# The following is partly for convenience in the rest of the script
cp /path/to/valid/license.key /tmp/license.key

# Create a file to encapsulate the required environment variables:
cat <<EOF > /tmp/swift_silent_installer_env.sh
export FUSIONUI_MANAGER_TYPE=UNMANAGED_SWIFT
export FUSIONUI_INTERNALLY_MANAGED_USERNAME=admin
export FUSIONUI_FUSION_BACKEND_CHOICE=
export FUSIONUI_USER=hdfs
export FUSIONUI_GROUP=hdfs
export SILENT_PROPERTIES_PATH=/tmp/swift_silent.properties
export FUSIONUI_INTERNALLY_MANAGED_PASSWORD=admin
EOF

# Create a silent installer properties file - this must be in a location that is (or can be made) readable for the $FUSIONUI_USER.
# The file name must match SILENT_PROPERTIES_PATH above:
cat <<EOF > /tmp/swift_silent.properties
existing.zone.domain=
existing.zone.port=
license.file.path=/tmp/license.key
server.java.heap.max=4
ihc.server.java.heap.max=4
server.latitude=54
server.longitude=-1
fusion.domain=my.s3bucket.fusion.host.name
fusion.server.dcone.port=6444
fusion.server.zone.name=twilight
swift.installation.mode=true
swift.containerName=container-name
induction.skip=false
induction.remote.node=my.other.fusion.host.name
induction.remote.port=8082
EOF

# Source the environment file to populate the environment (the variables are used by the commands that follow):
. /tmp/swift_silent_installer_env.sh

# If necessary (when $FUSIONUI_GROUP is not the same as $FUSIONUI_USER and the group is not already created), create the $FUSIONUI_GROUP (the group that our various servers will be running as):
[[ "$FUSIONUI_GROUP" = "$FUSIONUI_USER" ]] || groupadd "$FUSIONUI_GROUP"

# If necessary, create the $FUSIONUI_USER (the user that our various servers will be running as):
if [[ "$FUSIONUI_GROUP" = "$FUSIONUI_USER" ]]; then
  useradd "$FUSIONUI_USER"
else
  useradd -g "$FUSIONUI_GROUP" "$FUSIONUI_USER"
fi

# The silent properties and the license key *must* be accessible to the created user, as the silent installer is run by that user
chown "$FUSIONUI_USER:$FUSIONUI_GROUP" /tmp/swift_silent.properties /tmp/license.key

# If you want to make any final checks of the environment variables, the following command can help - sorted to make it easier to find variables!
env | sort

# Run installer:
/tmp/installer.sh
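
Before the installer is run, it can be worth confirming that the environment variables listed under Swift-specific settings are actually set in the current shell. The following is a minimal sketch of such a pre-flight check. The function name check_fusion_env is hypothetical, and FUSIONUI_FUSION_BACKEND_CHOICE is deliberately left out of the check because the example sets it to an empty value.

```shell
# Report required variables that are unset or empty, and confirm that the
# manager type is the value required for Swift deployments.
check_fusion_env() {
  local name value status=0
  for name in FUSIONUI_MANAGER_TYPE FUSIONUI_INTERNALLY_MANAGED_USERNAME \
              FUSIONUI_INTERNALLY_MANAGED_PASSWORD FUSIONUI_USER \
              FUSIONUI_GROUP SILENT_PROPERTIES_PATH; do
    # Look up the variable whose name is held in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  if [ -n "${FUSIONUI_MANAGER_TYPE:-}" ] && [ "$FUSIONUI_MANAGER_TYPE" != "UNMANAGED_SWIFT" ]; then
    echo "FUSIONUI_MANAGER_TYPE must be UNMANAGED_SWIFT"
    status=1
  fi
  return "$status"
}
```

After sourcing the environment file, a guard such as check_fusion_env && /tmp/installer.sh starts the installer only when every variable is in place.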

Running initial repairs in parallel

If you have a large folder you can parallelize the initial repair using the Fusion API. This can be accomplished on a single file or a whole directory. Choosing a directory will push all files from the source to the target regardless of existence at the target.

Consider the following directory structure for a fusion replicated folder /home

/home
/home/fileA
/home/fileB
/home/userDir1
/home/userDir2
/home/userDir3

We could run a bulk resolve in the UI against the /home directory, however, to provide parallelism of the repair operations we can use the Fusion API to issue repairs against each folder and the individual files in the /home folder.

REST API Call

"FUSION_NODE:PORT/fusion/fs/repair?path=SYSTEMPATH&recursive=true&src=ZONENAME"

Example - Multiple API Calls using curl

curl -X PUT "FUSION_NODE:8082/fusion/fs/repair?path=/home/userDir1&recursive=true&src=LocalFS"
curl -X PUT "FUSION_NODE:8082/fusion/fs/repair?path=/home/userDir2&recursive=true&src=LocalFS"
curl -X PUT "FUSION_NODE:8082/fusion/fs/repair?path=/home/userDir3&recursive=true&src=LocalFS"
curl -X PUT "FUSION_NODE:8082/fusion/fs/repair?path=/home/fileA&recursive=false&src=LocalFS"
curl -X PUT "FUSION_NODE:8082/fusion/fs/repair?path=/home/fileB&recursive=false&src=LocalFS"

This will spawn simultaneous repairs increasing the performance of the initial synchronization. This is especially helpful when you have small file sizes to better saturate the network.

For files, the recursive parameter is ignored.
You can use the file transfers view in the Fusion UI on the OpenStack-replicating node to monitor the incoming files.
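
The individual curl calls above can also be generated from a list of paths. The following is a minimal sketch that builds the repair URL in the format shown under REST API Call and issues one background request per path. The function names repair_url and repair_all are hypothetical. Because the recursive parameter is ignored for files, the sketch passes recursive=true for every path. Paths containing spaces or other special characters would need URL encoding, which is not handled here.

```shell
# Build the repair URL for one path, in the format shown under REST API Call.
repair_url() {
  local node="$1" path="$2" recursive="$3" src="$4"
  printf '%s/fusion/fs/repair?path=%s&recursive=%s&src=%s\n' \
    "$node" "$path" "$recursive" "$src"
}

# Issue one repair request per path in the background, then wait for all
# of the requests to return.
repair_all() {
  local node="$1" src="$2" path
  shift 2
  for path in "$@"; do
    curl -s -X PUT "$(repair_url "$node" "$path" true "$src")" &
  done
  wait
}
```

For the example directory structure, this could be called as repair_all FUSION_NODE:8082 LocalFS /home/userDir1 /home/userDir2 /home/userDir3 /home/fileA /home/fileB. Note that wait also waits for any other background jobs already running in the same shell.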

Repairing individual folders in the UI

You can use the Fusion Web UI to choose which files to repair when you have a small number of files that exist on both sides and a decision needs to be made as to which one is the source of truth.

  1. In the UI on the Replicated Folders tab click the Inconsistent link in the Consistency column to get to the Consistency Report.
    WD Fusion tree

    LocalFS figure 49.

  2. If the list of files is small you'll be presented with a list. If it is longer than 100 files you will need to click Show All Inconsistencies. Note that you can still bulk resolve these.
    WD Fusion tree

    LocalFS figure 50.

  3. For each file, you can choose the Zone that is the source and click resolve.
    WD Fusion tree

    LocalFS figure 51.

  4. You will be prompted with a confirmation button.
    WD Fusion tree

    LocalFS figure 52.

  5. After clicking resolve, you will see a message Fix Requested. You can check the UI in the target zone file transfers if you want to verify the repair.
    WD Fusion tree

    LocalFS figure 53.