2.7 Orchestrated Installation

In some situations it may be preferable to install WD Fusion using our orchestration script instead of the installer, which automatically handles or ignores some areas of configuration. To complete an installation using the installer, see Running the installer.

Installation requirements

Time requirements

The time required to complete a deployment of WD Fusion depends partly on its size: larger deployments, with more nodes and more complex replication rules, take correspondingly more time to set up. Use the guide below to help you plan your deployment.

  • Run through this document and create a checklist of your requirements (1-2 hours).
  • Complete the WD Fusion server installations (20 minutes per node, or 1 hour for a test deployment).
  • Install WD Fusion UI (30 minutes).
  • Complete client installations and basic tests (1-2 hours).

Of course, this is a guideline to help you plan your deployment. You should think ahead and determine if there are additional steps or requirements introduced by your organization's specific needs.
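As a rough worked example of the figures above, the Bash function below adds up the per-task estimates for a given number of nodes. It is a planning sketch only: it assumes the 20-minute figure applies to every node and ignores the 1-hour test-deployment shortcut, and the function name is hypothetical.

```shell
# Hypothetical helper: sum the planning estimates listed above for N nodes.
# Ranges used: checklist 60-120 min, 20 min per server node, UI 30 min,
# client installs and basic tests 60-120 min.
estimate_deploy_minutes() {
  local nodes=$1
  if [[ ! $nodes =~ ^[0-9]+$ ]]; then
    echo "usage: estimate_deploy_minutes NODE_COUNT" >&2
    return 2
  fi
  local low=$(( 60 + 20 * nodes + 30 + 60 ))
  local high=$(( 120 + 20 * nodes + 30 + 120 ))
  echo "${low}-${high}"
}
```

For example, `estimate_deploy_minutes 3` prints `210-330`, i.e. roughly 3.5 to 5.5 hours before any organization-specific steps.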

Network requirements

See the deployment checklist for a list of the TCP ports that need to be open for WD Fusion.

Kerberos Security

If you are running Kerberos on your cluster you should consider the following requirements:

  • Kerberos is already installed and running on your cluster
  • Fusion-Server is configured for Kerberos as described in Setting up Kerberos.
  • The Fusion UI uses the same keytab and principal that you generated for fusion-server. These steps assume the keytab is at /etc/hadoop/conf/fusion.keytab.

Update Fusion-UI configuration:

  • Copy the data center's /etc/hadoop/conf/hdfs-site.xml into the Fusion UI's lib folder: /opt/wandisco/fusion-ui-server/lib/
  • Copy the data center's /etc/hadoop/conf/core-site.xml into the Fusion UI's lib folder: /opt/wandisco/fusion-ui-server/lib/
  • Add the paths to core-site.xml and hdfs-site.xml to the configuration file:
    client.core.site=/etc/hadoop/conf/core-site.xml
    client.hdfs.site=/etc/hadoop/conf/hdfs-site.xml
    
  • Enable Kerberos in the Fusion UI configuration (/opt/wandisco/fusion-ui-server/properties/ui.properties):
    kerberos.enabled=true
    kerberos.generated.config.path=/opt/wandisco/fusion-ui-server/properties/kerberos.cfg
    kerberos.keytab.path=/etc/hadoop/conf/fusion.keytab
    kerberos.principal=fusion/${hostname}@${krb_realm}
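The Kerberos property changes above can be scripted. The following is a minimal Bash sketch, assuming ui.properties uses plain key=value lines and lives under /opt/wandisco/fusion-ui-server/properties/; the function names set_property and configure_ui_kerberos are hypothetical. It covers only the kerberos.* properties, not copying the site files.

```shell
# Hypothetical helper: set key=value in a properties file, replacing any
# existing line for the same key. Writes back in place so the file keeps
# its owner and permissions.
set_property() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  touch "$file"
  # Keep every line whose key (text before the first '=') differs.
  awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Hypothetical wrapper: apply the Kerberos settings listed above.
# Assumes ui.properties lives under <ui_home>/properties/.
configure_ui_kerberos() {
  local ui_home=${1:-/opt/wandisco/fusion-ui-server}
  local keytab=${2:-/etc/hadoop/conf/fusion.keytab}
  local principal=${3:-}
  local props="$ui_home/properties/ui.properties"
  if [[ -z $principal ]]; then
    echo "usage: configure_ui_kerberos UI_HOME KEYTAB PRINCIPAL" >&2
    return 2
  fi
  set_property "$props" kerberos.enabled true
  set_property "$props" kerberos.generated.config.path "$ui_home/properties/kerberos.cfg"
  set_property "$props" kerberos.keytab.path "$keytab"
  set_property "$props" kerberos.principal "$principal"
}
```

For example (EXAMPLE.COM is a placeholder realm):

    configure_ui_kerberos /opt/wandisco/fusion-ui-server /etc/hadoop/conf/fusion.keytab "fusion/$(hostname -f)@EXAMPLE.COM"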
    

WD Fusion server installation

Have a previous WD Fusion installation?
Before installing the latest version of WD Fusion, you need to run through the WD Fusion cleanup section.
  1. Before you start, make sure that you've worked through the installation requirements. In particular, check that you are running the right version of Java and that passwordless SSH is enabled between your installation machine and every other machine in your cluster. See Java requirements and Passwordless SSH.
  2. Download the WD Fusion archive file and save it onto your 'install' server.
  3. Extract the archive. This creates the following files and directories:
    [root@vmhost08-vm0 orch]# ls -l
    -rw-rw-r-- 1 hdfs hdfs  447 Mar 17 17:00 mydefines.sh.example
    -rwxr-xr-x 1 hdfs hdfs 8473 Mar 20 10:36 orchestrate-fusion.sh
    drwxrwxr-x 4 hdfs hdfs 4096 Mar 21 08:14 rpms		
    

    Edit the mydefines.sh.example file and save it as "mydefines.sh". Enter the hostnames for your NameNode, WD Fusion node and all your DataNodes. Take note of the comments that indicate properties specific to each data center's Hadoop version:

    # License key location
    LICENSEKEY=/path/to/license.key   # You must enter the path to the license key file.
    FUSIONENV=/orch/fusion_env.sh     # Path to the environment variables script. No need to edit this.
    ############## ZONE1 ##############
    # Distribution for ZONE1
    ZONENAME1="Zone1"
    ZONE1=cdh-5.3.0
    # Currently you need to round down manager version
    # e.g. CDH 5.3.1 becomes cdh-5.3.0
    
    # Manager Hosts/Configs for ZONE1
    ZONEMGR1="vmhost08-vm0.host.domain.com"
    ZONECLUSTERNAME1="hdfs"
    ZONEUSER1="admin"
    ZONEPW1="admin"
    ZONEAPI1="v9"
    # CDH5.2 use "v8"
    # CDH5.3.x use "v9"
    # HDP 2.1/2.2 use "v1"
    
    ZONEYARNSERVICENAME1="yarn"
    # "YARN" for HDP
    # "yarn" for CDH
    
    # Fusion Server and IHC for ZONE1
    ZONEFS1="vmhost08-vm1.host.domain.com"
    ZONEIHC1[0]="vmhost08-vm1.host.domain.com"
    ZONEIHC1[1]="vmhost08-vm2.host.domain.com"   # You can add additional IHC servers like this; see below.
    ZONEIHC1[2]="vmhost08-vm3.host.domain.com"
    
    
    # Filesystem Hosts for ZONE1
    ZONENN1="vmhost08-vm0.host.domain.com"
    ZONEDN1[0]="vmhost08-vm1.host.domain.com"
    ZONEDN1[1]="vmhost08-vm2.host.domain.com"
    ZONEDN1[2]="vmhost08-vm3.host.domain.com"
    
    ############## ZONE2 ##############
    # Distribution for ZONE2
    ZONENAME2="Zone2"
    ZONE2=hdp-2.2.0
    
    # Manager Hosts/Configs for ZONE2
    ZONEMGR2="vmhost08-vm0.host.domain.com"
    ZONECLUSTERNAME2="hdp22"
    ZONEUSER2="admin"
    ZONEPW2="admin"
    ZONEAPI2="v1"
    ZONEYARNSERVICENAME2="YARN"
    
    # Fusion Server and IHC Host for ZONE2
    ZONEFS2="vmhost08-vm1.host.domain.com"
    ZONEIHC2[0]="vmhost08-vm1.host.domain.com"
    ZONEIHC2[1]="vmhost08-vm2.host.domain.com"
    
    # Filesystem Hosts for ZONE2
    ZONE2NN="vmhost08-vm0.host.domain.com"
    ZONEDN2[0]="vmhost08-vm1.host.domain.com"
    ZONEDN2[1]="vmhost08-vm2.host.domain.com"
    ZONEDN2[2]="vmhost08-vm3.host.domain.com"
    
    ############## ZONE3 ##############
    # Distribution for ZONE3
    ZONENAME3="Zone3"
    ZONE3=hdp-2.2.0
    
    # Manager Hosts/Configs for ZONE3
    ZONEMGR3="vmhost08-vm0.host.domain.com"
    ZONECLUSTERNAME3="hdp22"
    ZONEUSER3="admin"
    ZONEPW3="admin"
    ZONEAPI3="v1"
    ZONEYARNSERVICENAME3="YARN"
    
    # Fusion Server and IHC Host for ZONE3
    ZONEFS3="vmhost08-vm1.host.domain.com"
    ZONEIHC3[0]="vmhost08-vm1.host.domain.com"
    
    # Filesystem Hosts for ZONE3
    ZONE3NN="vmhost08-vm0.host.domain.com"
    ZONEDN3[0]="vmhost08-vm1.host.domain.com"
    ZONEDN3[1]="vmhost08-vm2.host.domain.com"
    ZONEDN3[2]="vmhost08-vm3.host.domain.com"
    
    The definitions for the required Hortonworks/Cloudera RPM packages are already set in the mydefines.sh script, so you don't need to edit them.
  4. Renaming your Zones

    The default zone names ("Zone1", "Zone2", "Zone3", etc.) can be changed in mydefines.sh by setting the ZONENAMEn property, for example:

    ############## ZONE3 ##############
    # Distribution for ZONE3
    ZONENAME3="anyNameThatYouLike"
    ZONE3=hdp-2.2.0
    

    Running with multiple IHC servers per zone

    You can declare multiple IHC servers per zone in your defines file. For example, to run two IHC servers for zone 1 and three for zone 2, your defines file would contain the following:

    ...
    ZONEIHC1[0]="node1.example.com"
    ZONEIHC1[1]="node2.example.com"
    ...
    ZONEIHC2[0]="node3.example.com"
    ZONEIHC2[1]="node4.example.com"
    ZONEIHC2[2]="node5.example.com"
    ...
    

    When pulling data, the WD Fusion server connects to a randomly chosen IHC server in the zone it is pulling from. If the connection or write fails, it tries another randomly chosen IHC server, until all IHC servers for that zone have been tried. Once they are all exhausted, an exception is thrown.

    Ensure system user "hdfs"
    Ensure that user "hdfs" exists on your Fusion servers. The WD Fusion server is started as user "hdfs" and will fail to start otherwise.
  5. Create the HDFS directories that you want to replicate between both data centers, e.g.
    /repl1
    
    To avoid permission issues, create a home directory in HDFS for the Linux user that will own the replicated directory, e.g.
    hadoop fs -mkdir /user/replicator1
    hadoop fs -chown -R replicator1:groupname /user/replicator1
    
  6. As system user "hdfs", run the following commands in each data center:
    hadoop fs -mkdir /repl1
    hadoop fs -chown replicator1:replicator1 /repl1		
    
  7. Run the command:
    sudo ./orchestrate-fusion.sh ./mydefines.sh installrpms
    The RPM packages are now installed.
    Run with a minimum of Bash 4
    The orchestrate-fusion.sh script stores the WD Fusion server and client RPM lists in associative arrays, which were introduced in Bash 4.
  8. Run the command:
    sudo ./orchestrate-fusion.sh ./mydefines.sh configure /repl1
    In this example /repl1 is set up as the replicated directory.
  9. Using your management tool (Ambari, Cloudera Manager, etc.), add the following entries to the cluster-wide core-site.xml. The values are specific to each data center, i.e. you need one set of values for DC1 and another for DC2.
    <property>
           <name>fs.fusion.impl</name>
           <value>com.wandisco.fs.client.FusionFs</value>
    </property>
    <property>
            <name>fs.fusion.server</name>
            <value>YOUR.WDFUSION.COM:8023</value>
    </property>
    Replace YOUR.WDFUSION.COM with the Fusion server that you configured in your mydefines.sh. Use port 8023.
    <property>
            <name>fusion.underlyingFs</name>
            <value>hdfs://vmhost08-vm0.host.domain.com:8020</value>
    </property>
    Replace the URL with your own cluster's underlying filesystem. Make sure that you don't leave a trailing slash.
    
    <property>
            <name>fs.AbstractFileSystem.fusion.impl</name>
            <value>com.wandisco.fs.client.FusionAbstractFs</value>
    </property>						
    

    Notes
    Replace the hdfs://vmhost08-vm0.host.domain.com:8020 URL with your own cluster's underlying filesystem.
    Important! Take care not to add a trailing slash "/" at the end, it won't work.

  10. Log in as the system user account that owns the replicated directory, e.g. "replicator1", and run the command:
    hadoop fs -copyFromLocal /etc/hosts fusion:///repl1
    You'll then find the file installed in /repl1 in both data centers.
  11. Perform a test by running teragen and then terasort on the same replicated folder.
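The four core-site.xml properties from step 9 differ between data centers only in the Fusion server host and the underlying filesystem URI. The Bash sketch below prints the snippet for one data center and refuses a URI with a trailing slash, as the notes above require. The function name is hypothetical, and you still add the output to core-site.xml through Ambari or Cloudera Manager.

```shell
# Hypothetical helper: print the WD Fusion core-site.xml properties for one
# data center. Port 8023 is the Fusion server port given in step 9.
fusion_core_site_props() {
  if (( $# != 2 )); then
    echo "usage: fusion_core_site_props FUSION_SERVER_HOST UNDERLYING_FS_URI" >&2
    return 2
  fi
  local fusion_host=$1 underlying_fs=$2
  # The underlying filesystem URI must not end with a slash.
  if [[ $underlying_fs == */ ]]; then
    echo "error: remove the trailing '/' from $underlying_fs" >&2
    return 1
  fi
  cat <<EOF
<property>
  <name>fs.fusion.impl</name>
  <value>com.wandisco.fs.client.FusionFs</value>
</property>
<property>
  <name>fs.fusion.server</name>
  <value>${fusion_host}:8023</value>
</property>
<property>
  <name>fusion.underlyingFs</name>
  <value>${underlying_fs}</value>
</property>
<property>
  <name>fs.AbstractFileSystem.fusion.impl</name>
  <value>com.wandisco.fs.client.FusionAbstractFs</value>
</property>
EOF
}
```

For example, for the first zone in the sample mydefines.sh:

    fusion_core_site_props vmhost08-vm1.host.domain.com hdfs://vmhost08-vm0.host.domain.com:8020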

Running WD Fusion on multi-homed servers

The following guide runs through what you need to do to correctly configure a WD Fusion deployment if the nodes are running with multiple network interfaces.

Overview

  1. A client creates a file in DC1 and writes data to it.
  2. Periodically, after data is written, the WD Fusion server in DC1 sends a proposal telling the WD Fusion server in DC2 to pull the new file. The proposal includes the map of IHC server public IP addresses, in this case an IHC server listening at <Public-IP>:7000. The DC1 Fusion server reads this map from /etc/wandisco/fusion/server/ihcList.
  3. The Fusion server in DC2 receives this agreement, connects to <Public-IP>:7000 and pulls the data.

Procedure

  1. Stop all WD Fusion services.
  2. Reconfigure each IHC node to use your preferred address in /etc/wandisco/ihc/*.ihc.
  3. On the WD Fusion servers, delete all files in /etc/wandisco/fusion/server/ihcList/*.
  4. Copy the zone1 IHC's /etc/wandisco/ihc/*.ihc files to the zone1 Fusion server's /etc/wandisco/fusion/server/ihcList.
  5. Copy the zone2 IHC's /etc/wandisco/ihc/*.ihc files to the zone2 Fusion server's /etc/wandisco/fusion/server/ihcList.
  6. Restart all services.
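Steps 3 to 5 can be scripted per zone. The Bash sketch below assumes the IHC's .ihc files are readable from the host where it runs (for example, because the IHC and Fusion server share a host, or you have already copied the files over with scp); refresh_ihc_list is a hypothetical name. Run it once per zone while WD Fusion services are stopped.

```shell
# Hypothetical helper: replace a Fusion server's ihcList contents with the
# .ihc files from one zone's IHC configuration directory.
refresh_ihc_list() {
  local ihc_dir=${1:-/etc/wandisco/ihc}
  local list_dir=${2:-/etc/wandisco/fusion/server/ihcList}
  local restore_glob
  local -a files
  restore_glob=$(shopt -p nullglob) || true   # remember the caller's setting
  shopt -s nullglob
  files=("$ihc_dir"/*.ihc)
  eval "$restore_glob"
  if (( ${#files[@]} == 0 )); then
    echo "error: no .ihc files in $ihc_dir" >&2
    return 1
  fi
  mkdir -p "$list_dir"
  rm -f -- "$list_dir"/*                 # step 3: clear the old list
  cp -- "${files[@]}" "$list_dir"/       # steps 4-5: copy this zone's .ihc files
}
```

The function refuses to clear the list when no .ihc files are found, so a wrong source path does not leave the Fusion server with an empty ihcList.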

Troubleshooting

First, ensure that DC2's WD Fusion server can connect to DC1's IHC server. You can quickly test this by running:

 nc -v <DC1's IHC Public-IP> 7000
If you get a 'connection refused' or 'no route to host' message, you have a networking problem that needs to be fixed.
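If nc is not installed, Bash can make the same check itself by redirecting to /dev/tcp/HOST/PORT, a Bash feature. The sketch below wraps that in the coreutils timeout command so that an unreachable host does not hang the check; check_tcp_port is a hypothetical name.

```shell
# Hypothetical helper: test whether HOST:PORT accepts TCP connections using
# Bash's built-in /dev/tcp redirection, giving up after TIMEOUT seconds.
check_tcp_port() {
  local host=$1 port=$2 timeout_s=${3:-5}
  # The inner bash opens the connection on fd 3; success means the port is open.
  if timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
    return 1
  fi
}
```

Run it from DC2's WD Fusion server, e.g. `check_tcp_port <DC1's IHC Public-IP> 7000`. "closed" covers both refused connections and unreachable hosts.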

WD Fusion UI Installation

Control of your WD Fusion server is done through a separate browser-based management console that we refer to as the WD Fusion UI. The WD Fusion UI can be installed on the same servers as the WD Fusion server, although it is possible to install it on dedicated servers. This procedure covers a manual installation that requires you to enter configuration values.

When you're ready to install and configure WD Fusion UI, go through this procedure. If you experience any difficulties don't hesitate to contact WANdisco's Support team.

Before you start, gather this information:
During the installation you will be asked for several configuration values. The installer has a short time-out, so it's a good idea to have the following ready:

    UI Hostname/Port:
    Host and TCP port for the WD Fusion UI server.
    Target Hostname/Delegate Port:
    Hostname and TCP port used by the WD Fusion server. We currently run the WD Fusion server and UI on the same host, but this isn't required, so you can specify a different hostname here.
    Manager Hostname/Port:
    Hostname and TCP port for the Hadoop manager (Ambari, etc.) server.
  1. Download the installer script to the WD Fusion server.
  2. Open a terminal session, navigate to the directory containing the installer script, make the script executable and then run it:
    chmod +x fusion-ui-server_rpm_installer.sh
    sudo ./fusion-ui-server_rpm_installer.sh
  3. The installer starts by checking its file integrity, then confirms where you want the installation to be placed.
    Verifying archive integrity... All good.
    Uncompressing WANdisco Fusion UI Server..........
    
    
        ::   ::  ::     #     #   ##    ####  ######   #   #####   #####   ##### 
       :::: :::: :::    #     #  #  #  ##  ## #     #  #  #     # #     # #     # 
      ::::::::::: :::   #  #  # #    # #    # #     #  #  #       #       #     # 
     ::::::::::::: :::  # # # # #    # #    # #     #  #   #####  #       #     # 
      ::::::::::: :::   # # # # #    # #    # #     #  #        # #       #     # 
       :::: :::: :::    ##   ##  #  ## #    # #     #  #  #     # #     # #     # 
        ::   ::  ::     #     #   ## # #    # ######   #   #####   #####   #####  
    
    
    Press return to install to the default location.
  4. The installer confirms which version of WD Fusion UI will be installed. You then need to confirm that you wish to continue with the installation.
    Welcome to the WANdisco Fusion UI installation
    
    You are about to install WANdisco Fusion UI version 1.0.0-76
    
    Do you want to continue with the installation? (Y/n) Y
    
  5. The installer checks for supporting software that you need to have in place before you can complete the installation:
    Checking prerequisites: 
    
    Checking for perl: OK
    Checking for java: OK	
    
  6. The installer now allocates the Java heap space for the JVM running the UI. The amount of memory to allocate should be estimated as part of the evaluation process. The default values are enough to run the software but don't take into account the other JVMs running on the server (WD Fusion server, IHC servers, etc.).
    INFO: Using the following Memory settings:
    
    INFO: -Xms128m -Xmx512m
    
    Do you want to use these settings for the installation? (Y/n)
    
    Press Enter to continue.
  7. The installer now asks you to select the manager type that you are running in your current data center:
    Please specify the type of Manager from the list below.
    		1) Ambari
    		2) Cloudera
    		Choose a Manager Type (answer 1 or 2): 1
  8. Enter the following hostnames and ports:
    Please specify the Manager hostname: 10.2.212.3 
    		Which port is the Manager listening on? 8080
    		Please specify the HDFS hostname: 10.2.212.3
    Manager hostname
    The hostname/IP of your Hadoop manager, e.g. Ambari.
    Manager listening port
    The TCP port used by your Hadoop manager, e.g. Ambari's standard port 8080 for http or 8440 for https.
    HDFS hostname
    The hostname/IP of the NameNode. WD Fusion uses the NameNode to read the HDFS file tree when you pick folders for replication.
  9. The installer will now capture the system user/group that will be used to run the application.
    We strongly advise against running Fusion UI as the root user.
    
    Which user should Fusion UI run as? hdfs
    Which group should Fusion UI run as? hdfs
    You should never run applications in a production environment using the root account. We recommend that you create a specific account with suitable write permissions for running Hadoop applications.
  10. You will now see a summary of your entries, allowing you to check over them before continuing:
    Installing with the following settings:
    
    UI Hostname:                 redhat6.3-64bit
    UI Port:                     8083
    Target Hostname:             redhat6.3-64bit
    Target Port:                 8082
    Target Delegate Port:        9999
    Manager Type:                AMBARI
    Manager Hostname:            10.2.212.3
    Manager Port:                8080
    HDFS Hostname:               10.2.212.3
    HDFS Port:                   50070
    Application Minimum memory:  128
    Application Maximum memory:  512
    
    Do you want to continue with the installation? (Y/n) 
    
    Press Enter to continue.
  11. The installer now gives you the option to set up WD Fusion UI to start on boot. Press Enter to enable this.
    Would you like WD Fusion UI to start automatically when the system boots? (Y/n) y
    
  12. The installation is now complete and WD Fusion UI starts up.
    Starting delegate:[  OK  ]
    Starting ui:[  OK  ]
    Checking if the GUI is listening on port 8083: ......Done
    
    Please visit http://<thisHost>:8083/ to access the WANdisco Fusion UI Server
    Installation Complete
    
    You can now use the browser UI. Open a browser and enter the server's address along with the UI port that you selected (8083 is the default). The WD Fusion UI Dashboard will appear in the browser.

    Installation complete.

  13. Hold off interacting with the WD Fusion UI until all WD Fusion servers are installed. Once WD Fusion is successfully installed on all nodes, proceed to set up data replication. See Replication overview in the Admin Section.
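To confirm from a script that the UI has come up on its port (8083 in the example above), you could poll it with curl. This is a sketch: wait_for_ui is a hypothetical name, and it treats any HTTP response as "up" rather than checking for a particular page.

```shell
# Hypothetical helper: poll the Fusion UI until it answers HTTP, trying
# ATTEMPTS times with a 2-second pause. Prints the last HTTP status code;
# "000" means curl got no HTTP response at all.
wait_for_ui() {
  local host=$1 port=${2:-8083} attempts=${3:-30}
  local code="" i
  for (( i = 1; i <= attempts; i++ )); do
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://${host}:${port}/")
    # Any real status code (even a redirect or 401) means the server is listening.
    if [[ $code =~ ^[1-5][0-9][0-9]$ ]]; then
      echo "$code"
      return 0
    fi
    (( i < attempts )) && sleep 2
  done
  echo "${code:-000}"
  return 1
}
```

For example, `wait_for_ui <thisHost> 8083` returns success as soon as the UI responds.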

Induction

Use the following procedure to complete an induction, where additional nodes are connected together to form a replication network.

  • curl -v -X PUT -d@induction.xml -H "Content-Type: application/xml" http://<new server hostname>:8082/fusion/node/fusion01

    induction.xml

    <inductionTicket>
        <inductorNodeId>wdfs1</inductorNodeId>
        <inductorLocationId>location1</inductorLocationId>
        <inductorHostName>vmhost07-vm2.host.domain.com</inductorHostName>
        <inductorPort>6789</inductorPort>
    </inductionTicket>

The details in the ticket should identify one of the nodes already in the existing two-zone membership.
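If you script inductions, the ticket can be generated rather than edited by hand. The Bash sketch below writes an induction.xml with the same four elements as the example above; write_induction_ticket is a hypothetical name, and the values you pass must describe a node that is already in the membership.

```shell
# Hypothetical helper: write an induction ticket with the element names
# used in the example induction.xml.
write_induction_ticket() {
  if (( $# != 5 )); then
    echo "usage: write_induction_ticket FILE NODE_ID LOCATION_ID HOSTNAME PORT" >&2
    return 2
  fi
  local out=$1 node_id=$2 location_id=$3 host=$4 port=$5
  cat > "$out" <<EOF
<inductionTicket>
    <inductorNodeId>${node_id}</inductorNodeId>
    <inductorLocationId>${location_id}</inductorLocationId>
    <inductorHostName>${host}</inductorHostName>
    <inductorPort>${port}</inductorPort>
</inductionTicket>
EOF
}
```

Using the values from the example, then submitting the ticket with the curl command from the procedure:

    write_induction_ticket induction.xml wdfs1 location1 vmhost07-vm2.host.domain.com 6789
    curl -v -X PUT -d@induction.xml -H "Content-Type: application/xml" http://<new server hostname>:8082/fusion/node/fusion01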

Configuration

First Steps

This procedure assumes no security or HA configuration.

Prerequisites

  • Hadoop cluster running, WD Fusion components installed.
  • Induction of 2 or more nodes completed (See Inducting nodes).

Procedure

  1. Manually place the configuration files: copy the .ihc files that configure-fusion-ihc-server creates to the Fusion server's ihcList directory.

  2. Edit the core-site.xml in Ambari/Cloudera Manager.

  3. Restart the cluster.
  4. Copy the edited core-site.xml file into place. Take the core-site.xml file that you edited in step 2, place it in /opt/fusion/ihc/, then restart the IHC server by navigating to /etc/init.d and running:
    sudo ./fusion-ihc-server-cdh-<version> restart

Create a membership

  1. Create the IHC list in /etc/wandisco/fusion/server/ihcList. It should contain a file with content like this:
    "ihcList"
    ihc.server=vmhost07-vm2.host.domain.com:7000
    http.server=vmhost07-vm2.host.domain.com:9001
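To see which IHC endpoints a Fusion server will hand out, you can read the ihc.server entries from its ihcList directory. A minimal Bash sketch, assuming each file uses the key=value format shown above; list_ihc_endpoints is a hypothetical name.

```shell
# Hypothetical helper: print the ihc.server HOST:PORT value from every file
# in a Fusion server's ihcList directory, one per line.
list_ihc_endpoints() {
  local dir=${1:-/etc/wandisco/fusion/server/ihcList}
  local f
  for f in "$dir"/*; do
    [[ -f $f ]] || continue            # skip an unmatched glob or subdirectories
    sed -n 's/^ihc\.server=//p' "$f"
  done
}
```

On a multi-homed server, the addresses this prints are the ones remote Fusion servers will try to connect to, so they should be the publicly reachable ones.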

Install clients

The WD Fusion UI does not currently provide a way to install clients using stacks or parcels, so you have to install them manually. You can get the list of client RPMs during the installation process, or find the RPMs in /opt/wandisco/fusion-ui-server/ui/client_packages.

Configure the Hadoop manager (Ambari/Cloudera Manager)

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://nn1.pldc2.wandisco.com:8020/</value>
		<final>true</final>
	</property>
	<property>
		<name>fs.fusion.impl</name>
		<value>com.wandisco.fs.client.FusionFs</value>
	</property>
	<property>
		<name>fs.fusion.server</name>
		<value>fus1.pldc2.wandisco.com:8023</value>
	</property>
</configuration>

Once WD Fusion has been installed in all data centers, you can proceed with setting up replication on your HDFS file system. You should plan your requirements ahead of the installation, matching your replication to your cluster to maximise performance and resilience. The next section takes a brief look at an example configuration and runs through the steps needed to set up data replication between two data centers.

  • Copy the core-site.xml
  • Replication Overview

    Example Deployment

    Example WD Fusion deployment across three data centers.

    In this example, each of the three data centers ingests data from its own dataset: "Weblogs", "phone support" and "Twitter feed". An administrator can choose to replicate any or all of these datasets so that the data is replicated across any of the data centers, where it will be available for compute activities by the whole cluster. The only change required to your Hadoop applications is the addition of a replication-specific URI. You can read more about adapting your Hadoop applications for replication.

    Setting up Replication

    The following steps are used to start replicating HDFS data. The detail of each step depends on your cluster setup and your specific replication requirements, although the basic steps remain the same.

    1. Create a membership including all the data centers that will share a particular directory. See Create Membership
    2. Create and configure a Replicated Folder. See Replicated Folders
    3. Perform a consistency check on your replicated folder. See Consistency Check
    4. Configure your Hadoop applications to use WANdisco's protocol. See Configure Hadoop for WANdisco replication
    5. Run Tests to validate that your replicated folder remains consistent while data is being written to each data center. See Testing replication

    Appendix

    The appendix section contains extra help and procedures that may be required when running through a WD Fusion deployment.

    Cleanup WD Fusion

    The following section is used when preparing to install WD Fusion on a system that already has an earlier version of WD Fusion installed. Before you install an updated version of WD Fusion, you need to ensure that the components and configuration from the earlier installation have been removed. Go through the following steps before installing a new version of WD Fusion:

    Cleanup WD Fusion / IHC Server processes

    Ensure that there are no WD Fusion / IHC server processes running. The WD Fusion orchestration script has a cleanup option for this. Run:

    sudo ./orchestrate-fusion.sh ./mydefines.sh cleanup
    Alternatively, you can use the Java Virtual Machine Process Status Tool (jps) to list running processes:
    jps -l
    Check that none of your WD Fusion machines are running 'com.wandisco.fs.ihc.server.Main' or 'com.wandisco.fs.server.Main'. If required, kill the processes with "kill -9". Then clean up the DConE database with the following command:
    rm -rf /opt/fusion-server/dcone/db/*/*
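The jps check can also be scripted. The Bash sketch below filters `jps -l` output (one "PID main-class" pair per line) for the two WD Fusion main classes named above; fusion_pids is a hypothetical name. Review the PIDs before killing anything.

```shell
# Hypothetical filter: reads `jps -l` output on stdin and prints the PIDs
# of WD Fusion server and IHC server JVMs, one per line.
fusion_pids() {
  awk '$2 == "com.wandisco.fs.server.Main" || $2 == "com.wandisco.fs.ihc.server.Main" { print $1 }'
}
```

For example, to list and then (after checking the list) kill the processes; `xargs -r` is the GNU option that skips running kill when there are no PIDs:

    jps -l | fusion_pids
    jps -l | fusion_pids | xargs -r kill -9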

    Cleanup WD Fusion packages

    If you have a previous install of WD Fusion on your clusters, run 'removerpms' first to remove the RPMs.

    In case RPM file names have been changed, you should use the orchestration script that corresponds with your old version, rather than the latest version.

    sudo ./orchestrate-fusion.sh ./mydefines.sh removerpms

    Continue with your new installation

    You can now continue with the installation of the latest version of WD Fusion. Remember to make any necessary changes to the mydefines.sh file to include the new packages.


    orchestrate-fusion.sh script commands

    Below are the available commands for running orchestrate-fusion.sh.

    Install all RPMs

    sudo ./orchestrate-fusion.sh ./mydefines.sh installrpms

    Configure replication directory

    sudo ./orchestrate-fusion.sh ./mydefines.sh configure /repl1

    Completely uninstall Fusion

    Note:
    Before installing a new Fusion build with orch.tar.gz.2.XX-YYY, you need to uninstall the old build using the script in the existing /orch directory:
    sudo ./orchestrate-fusion.sh ./mydefines.sh removerpms

    Stop all Fusion services

    sudo ./orchestrate-fusion.sh ./mydefines.sh stopservices

    Start all Fusion services

    sudo ./orchestrate-fusion.sh ./mydefines.sh startservices

    Remove configs and Dcone DBs

    sudo ./orchestrate-fusion.sh ./mydefines.sh cleanup
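If you run these commands often, a small wrapper can catch typos in the command name before anything runs with sudo. This is a hypothetical convenience function, not part of the orchestration script; it assumes you call it from the directory containing orchestrate-fusion.sh and mydefines.sh, and it only accepts the commands listed above.

```shell
# Hypothetical wrapper around the orchestration script: validates the
# subcommand before invoking anything with sudo.
orch() {
  local cmd=${1:-}
  case $cmd in
    installrpms|removerpms|stopservices|startservices|cleanup)
      ;;
    configure)
      # configure needs the replicated directory, e.g. /repl1
      if [[ -z ${2:-} ]]; then
        echo "usage: orch configure <replicated-dir>" >&2
        return 2
      fi
      ;;
    *)
      echo "error: unknown command '$cmd'" >&2
      echo "valid: installrpms configure removerpms stopservices startservices cleanup" >&2
      return 2
      ;;
  esac
  sudo ./orchestrate-fusion.sh ./mydefines.sh "$@"
}
```

For example, `orch configure /repl1` runs the same command as the "Configure replication directory" entry above.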

    Removing WD Fusion UI

    If you need to remove WD Fusion UI from a system, follow these steps:

    1. Open a terminal session and run the following package removal command.
      sudo yum erase fusion-ui-server
    2. Remove the install files with:
      sudo rm -rf /opt/wandisco/fusion-ui-server
      WD Fusion UI will now be completely removed from the server.