3. Administration Guide

This Admin Guide describes how to set up and use WANdisco's WD Fusion.

3.1 Housekeeping

This section covers basic operations for running a WD Fusion deployment, including commands and tools that help you set up and maintain replicated directories.

Starting up

To start WD Fusion UI:

  1. Open a terminal window on the server and log in with suitable file permissions.
  2. Locate the fusion-ui-server script in the /etc/init.d folder:
    rwxrwxrwx  1 root root    47 Apr 10 16:05 fusion-ui-server -> /opt/wandisco/fusion-ui-server/bin/fusion-ui-server 
  3. Run the script with the start command:
    [root@localhost init.d]#  ./fusion-ui-server start
    Starting fusion-ui-server:. [ OK ]
    WD Fusion starts. Read more about the fusion-ui-server init.d script.
  4. Alternatively, you can invoke the service directly, e.g.
    service fusion-ui-server stop/start

Shutting down

To shut down:

  1. Open a terminal window on the server and log in with suitable file permissions.
  2. Locate the fusion-ui-server script in the init.d folder:
    rwxrwxrwx  1 root root    47 Dec 10 16:05 fusion-ui-server -> /opt/wandisco/fusion-ui-server/bin/fusion-ui-server
  3. Run the stop script:
    [root@redhat6 init.d]#  ./fusion-ui-server stop
    stopping fusion-ui-server:                                   [  OK  ]
    [root@redhat6 init.d]#
    The process shuts down.
Shutdowns take some time

The shutdown script attempts to stop processes in order before completing; as a result you may find that (from WD Fusion 2.1.3) shutdowns take up to a minute to complete.

init.d management script

The start-up script for persistent running of WD Fusion is in the /etc/init.d folder. Run the script with the help command to list the available commands:

[root@redhat6 init.d]# service fusion-ui-server help
  usage: ./fusion-ui-server (start|stop|restart|force-reload|status|version)

start Start Fusion services
stop Stop Fusion services
restart Restart Fusion services
force-reload Restart Fusion services
status Show the status of Fusion services
version Show the version of Fusion

Check the running status (with current process ID):

[root@redhat6 init.d]# service fusion-ui-server status
Checking delegate:not running                              [  OK  ]
Checking ui:running with PID 17579                         [  OK  ]

Check the version:

[root@redhat6 init.d]# service fusion-ui-server  version
1.0.0-83

Managing Services through the WD Fusion UI

Providing that the UI service is running, you can stop and start WD Fusion through the Fusion Nodes tab.

WD Fusion UI Login

The UI for managing WD Fusion can be accessed through a browser, providing you have network access and the port that the UI is listening on is not blocked.

http://<url-for-the-server>:<UI port>

e.g.
http://wdfusion-static-0.dev.organisation.com:8083/ui/

You should not need to add the /ui/ at the end; you should be redirected there automatically.

dashboard

Login using your Hadoop platform's manager credentials.

Login credentials

Currently you need to use the same username and password that are required for your platform manager, e.g. Cloudera Manager or Ambari. In a future release we will separate WD Fusion UI from the manager and use a new set of credentials.

LDAP/Active Directory and WD Fusion login

If your Cloudera-based cluster uses LDAP/Active Directory to handle authentication then please note that a user that is added to an LDAP group will not automatically be assigned the corresponding Administrator role in the internal Cloudera Manager database. A new user in LDAP that is assigned an Admin role will, by default, not be able to log in to WD Fusion. To be allowed to log in, they must first be changed to an administrator role type from within Cloudera Manager.

No sync between CM and LDAP
There is no sync between Cloudera Manager and LDAP in either direction, so a user who loses their Admin privileges in LDAP will still be able to login to WD Fusion until their role is updated in Cloudera Manager. You must audit WD Fusion users in Cloudera Manager.

Administrators will need to change any user in the Cloudera Manager internal database (from the Cloudera Manager UI) to the required access level for WD Fusion. Please note the warning given above, that changing access levels in LDAP will not be enough to change the admin level in WD Fusion.

Authentication misalignment

There are four possible scenarios concerning how LDAP authentication can align and potentially misalign with the internal CM database:

User has full access in CM, denied access in WD Fusion UI
  • User is in the Full Administrator group in LDAP
  • User is left as the default read-only in the internal Cloudera Manager database
User has full access in CM, full access in WD Fusion UI
  • User is in the Full Administrator group in LDAP
  • User is changed to Full Administrator in the internal Cloudera Manager database
User has read-only access in CM, denied access to WD Fusion UI
  • User is removed from the Full Administrator group in LDAP and added to the read-only group
  • User is left as the default read-only in the internal Cloudera Manager database
User has read-only access to CM, Full access to WD Fusion UI
  • User is removed from the Full Administrator group in LDAP and added to the read-only group
  • User is set as Full Administrator in the internal Cloudera Manager database
Clearly this scenario represents a serious access control violation; administrators must audit WD Fusion users in Cloudera Manager.

Checking cluster status on the dashboard

The WD Fusion UI dashboard provides a view of WD Fusion's status. From the world map you can identify which data centers are experiencing problems, track replication between data centers or monitor the usage of system resources.

For more details on what each section of the Dashboard shows, see the Reference section for the Dashboard.

dashboard

The UI Dashboard will indicate if there are problems with WD Fusion on your cluster.

Server Logs Settings

The WD Fusion logs that we display in the WD Fusion UI are configured by properties in the ui.properties file.

membership

Logging

Default paths:

logs.directory.fusion /var/log/fusion/server/
logs.directory.ihc /var/log/fusion/ihc
logs.directory.uiserver /var/log/fusion/ui

Configure log directory

By default the log location properties are not exposed in the ui.properties file. If you need the UI server to look in different locations for the log files, you can add the following properties to ui.properties. To be clear, these entries do not set alternate locations for WD Fusion to write its logs; they only ensure that the UI server can still read the logs in the event that they are moved:

logs.directory.fusion
sets the path to the WD Fusion server logs.
logs.directory.uiserver
sets the path to the UI server logs.
logs.directory.ihc
sets the path to the ihc server logs.

The file is read by the UI server on start up, so you will need to restart the server for changes to take effect. The ui.properties file is not replicated between nodes so you must currently set it manually on each node.
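
A minimal sketch of the ui.properties entries, assuming the logs had been relocated under /data/logs (these paths are purely illustrative):

logs.directory.fusion=/data/logs/fusion/server
logs.directory.uiserver=/data/logs/fusion/ui
logs.directory.ihc=/data/logs/fusion/ihc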

Logging at startup

At startup the default log location is /dev/null. If there's a problem before log4j has initialised, this will result in important logs getting lost. You can set the log location to a filespace that preserves early logging.

Edit fusion_env.sh adding paths to the following properties:

SERVER_LOG_OUT_FILE
Path for WD Fusion server log output
IHC_LOG_OUT_FILE
Path for IHC server log output
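
For example, a sketch of the fusion_env.sh entries, using illustrative filenames under the default log directories:

SERVER_LOG_OUT_FILE=/var/log/fusion/server/server_startup.out
IHC_LOG_OUT_FILE=/var/log/fusion/ihc/ihc_startup.out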

Induction

Induction is the process used to incorporate new nodes into WANdisco's replication system. The process is run at the end of a node installation, although it is also possible to delay the process, then use the + Induct link on the Fusion Nodes tab.

Use this procedure if you have installed a new node but did not complete its induction into your replication system at the end of the installation process.

  1. Login to one of the active nodes, clicking on the Fusion Nodes tab. Click the + Induction button.
    WD Fusion Deployment
  2. Enter the fully qualified domain name of the new node that you wish to induct into your replication system.
    WD Fusion Deployment
    Fully Qualified Domain Name
    The full domain name for the new node that you will induct into your replication system.
    Fusion Server Port
    The TCP port used by the WD Fusion application for configuration and reporting, both internally and via REST API. The port needs to be open between all WD Fusion nodes and any systems or scripts that interface with WD Fusion through the REST API.
    Click Start Induction.
  3. When the induction process completes, the Fusion Node tab will refresh with the new node added to the list.

Induction Failure

The induction process performs some validation before running. If this validation fails you will quickly see warning messages appear.
WD Fusion Deployment

Automatic Induction Failure
If the induction process can't connect to the new node using the details provided, a failure will happen instantly. This could happen because of an error in the new node's installation, however it could also be caused by the node being kerberized.
We also could not reach any of our standard ports
If connections can't be made on specific Fusion ports, they will be listed here. If none of the standard ports are reachable then you will be warned that this is the case.
Induction problems
For help troubleshooting problems, see Handling Induction Failure.

Additional entry fields are shown, so that you can retry the induction using a wider selection of node properties:

Fully Qualified Domain Name
the full hostname for the server.
Node ID
A unique identifier that will be used by WD Fusion UI to identify the server.
Location ID
This is the unique string (e.g. "db92a062-10ea-11e6-9df2-4ad1c6ce8e05") that appears on the Node screen (see below).
DConE Port
The TCP port used by the replication system. It needs to be open between all WD Fusion nodes. Nodes that are situated in zones that are external to the data center's network will require unidirectional access through the firewall.

Node properties

WD Fusion Deployment
If you click on an individual node on the Fusion Nodes tab, you will drill down to a Node screen that displays all the node's settings.

3.2 Troubleshooting

This section details how to diagnose and fix problems that may occur in deployment. It's important that you check the Release Notes for any known issues in the release that you are using. See Release Notes.

Troubleshooting Overview

  1. Read the logs
  2. Run Talkback then send the results to WANdisco's support team
  3. Common Problems
  4. Kerberos Troubleshooting

Read the logs

There are a number of log files that provide information that will be necessary in finding the cause of many problems.

The log files for WD Fusion are spread over three locations. Some processes contain more than one log file for the service. All pertinent log files are captured by running the WANdisco talkback shell script that is covered in the next section.

WD Fusion Server Logs

The logs on the WD Fusion server record events that relate to the data replication system.

Log locations:
/var/log/fusion/server
Primary log(s)
fusion-dcone.log.0
- this is the live log file for the running WD Fusion server process.
Historical logs:
The following logs are listed for completeness but are not generally useful for monitoring purposes.
fusion-dcone.log.x
- the log file is rotated once its file size reaches 200MB. The "x" in the filename is an incrementing number, starting at 1. Rotation is presently defaulted at 200MB with a retention of 100 files, although this can be customised.

fusion-server.log
- a log of application-level events, such as Kerberos authentication and license validation.
fusion-server.log.yyyy-mm-dd
log_out.log
- this is the output redirected from the STDOUT and STDERR of the java invocation. This is used to capture exceptions that occur before logging could start.

WD Fusion UI Server Logs

The WD Fusion user interface layer, responsible for handling interactions between the administrator, WD Fusion and the Hadoop Management layer.

Log locations:
/var/log/fusion/ui/
Primary log(s):
fusion-ui.log
Historical logs:
fusion-ui.log.x

The UI logs will contain errors such as failed access to the user interface, connectivity errors between the user interface and the WD Fusion Server's REST API, and other errors encountered while performing administrative actions across the UI.

Inter-Hadoop Connect (IHC) Server Logs

Responsible for streaming files from the location of the client write to the WD Fusion server process in any remote cluster to which Hadoop data is replicated.

Log location
/var/log/fusion/ihc
/var/log/fusion/ihc/server
Primary log(s):
server/fusion-ihc-ZZZ-X.X.X.log
- The live IHC process log files. The components of the filename are as follows:
ZZZ - Hadoop distribution marker (hdp, cdh, phd, etc). This will be "hdp" for a Hortonworks integrated cluster.
X.X.X - A matching cluster version number. This will be "2.2.0" for a Hortonworks 2.2 cluster.
Historical logs
server/fusion-ihc-ZZZ-X.X.X.log.yyyy-mm-dd
log_out.log
This log file contains details of any errors encountered by the process when reading from HDFS in the local cluster, such as access control violations, or network write errors when streaming to the WD Fusion server in any remote cluster.

Log analysis

This is the standard format of the WANdisco log messages within Fusion. It includes an ISO8601-formatted timestamp of the entry and the log level / priority, followed by the log entry itself. The log levels, in order of severity (highest to lowest), that you may observe are:
  • PANIC
  • SEVERE
  • ERROR
  • WARNING
  • INFO

For log analysis and reporting, logs at the PANIC, SEVERE and ERROR levels should be investigated. Warning-level messages indicate an unexpected result has been observed but one that hasn't impacted the system's continued operation. Additional levels may exist, but are used in cases when the logging level has been increased for specific debug purposes. Otherwise, other levels should be treated as informational (INFO).

Quickly picking out problems

One simple thing that can be done is to grep the log file for any instance of "exception" and/or "PANIC" - this will tell the administrator a great deal without much effort. Using something like:

cat /var/log/fusion/server/fusion-dcone.log.0 | egrep -i "exception|panic"

Talkback

Talkback is a bash script that is provided in your WD Fusion installation for gathering all the logs and replication system configuration that may be needed for troubleshooting problems. Should you need assistance from WANdisco's support team, they will ask for an output from Talkback to begin their investigation.

Talkback location

You can find the talkback script in the WD Fusion server's installation directory:

$ cd /opt/wandisco/fusion/server/
You can run talkback as follows:
$ sudo talkback.sh

If a cluster has Kerberos security enabled (Talkback will detect this from WD Fusion's configuration), you may be asked for Kerberos details needed to authenticate with the cluster.

You will be asked to complete the following details:

  • Location to store the talkback to. Suggest /tmp if acceptable disk space is available.
    Reserve plenty of storage
    Note, WD Fusion talkbacks can exceed 300MB compressed, but well over 10GB uncompressed (due to logs). /tmp may or may not be suitable.
  • Kerberos keytab location.
  • User to perform kinit with when obtaining kerberos ticket.
  • Whether you wish to perform a HDFS fsck, or not. Option 1 for yes, option 2 for no.

Running talkback

To run the talkback script, follow this procedure:

  1. Log into the Fusion server. If you're not logged in as root, use sudo to run the talkback script, e.g.
    [vagrant@supp26-vm1 ~]$ sudo /opt/wandisco/fusion/server/talkback.sh 
        #######################################################################
        # WANdisco talkback - Script for picking up system & replicator       #
        # information for support                                             #
        #######################################################################
     
        To run this script non-interactively please set following environment vars:
     
        ENV-VAR:
        FUSION_SUPPORT_TICKET          Set ticket number to give to WANdisco support team
        FUSION_TALKBACK_DIRECTORY      Set the absolute path directory where the tarball will be saved
        FUSION_KERBEROS_ENABLED        Set to "true" or "false"
        FUSION_PERFORM_FSCK            Set to "true" or "false" to perform a file system
                                       consistency check
     
    Which directory would you like the talkback tarball saved to? /tmp
     
          ===================== INFO ========================
          The talkback agent will capture relevant configuration
          and log files to help WANdisco diagnose the problem
          you may be encountering.
      
    Retrieving current system state information
    Kerberos is enabled
    Kerberos is enabled. Please provide the absolute path to the keytab you wish to use to obtain a ticket:
    /etc/security/keytabs/hdfs.headless.keytab
    Please provide the corresponding username for the keytab located /etc/security/keytabs/hdfs.headless.keytab:
    hdfs
    Performing kinit as user:  hdfs
    Gathering information from Fusion endpoints
    Protocol is:  http
    Hostname is:  supp26-vm1dddd
    Port is:  8082
    retrieving details for node "supp26-vm0_2"
    retrieving details for node "supp25-vm1_59"
    retrieving details for node "supp25-vm0_61"
    retrieving details for node "supp26-vm1_20"
    Copying Fusion server log files, this can take several minutes.
    Copying Fusion IHC log files, this can take several minutes.
    Would you like to include hadoop fsck? This can take some time to complete and may drastically increase the size of the tarball.
    1) Yes
    2) No
    #? 2
    Running sysinfo script to capture maximum hardware and software information...
    Gathering Summary info....
    Gathering Kernel info....
    Gathering Hardware info....
    Gathering File-Systems info....
    Gathering Network info....
    Gathering Services info....
    Gathering Software info....
    Gathering Stats info....
    Gathering Misc-Files info....
    THE FILE sysinfo/sysinfo_supp26-vm1-20160428-132245.tar.gz HAS BEEN CREATED BY sysinfo
    tar: Removing leading `/' from member names
     
    TALKBACK COMPLETE
     
    ---------------------------------------------------------------
     Please upload the file:
     
         /tmp/talkback-201604281321-supp26-vm1.lcx.tar.gz
     
     to WANdisco support with a description of the issue.
     
     Note: do not email the talkback files, only upload them
     via ftp or attach them via the web ticket user interface.
    --------------------------------------------------------------
      
  2. Follow the instructions for uploading the output on WANdisco's support website.
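
Based on the environment variables listed in the script's banner, a non-interactive run might look like the following sketch (the ticket number is hypothetical and the values will depend on your cluster):

sudo env FUSION_SUPPORT_TICKET=12345 \
         FUSION_TALKBACK_DIRECTORY=/tmp \
         FUSION_KERBEROS_ENABLED=false \
         FUSION_PERFORM_FSCK=false \
         /opt/wandisco/fusion/server/talkback.sh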

Common Problems

We list current known issues here, along with advice on fixes or workarounds:

Moving objects between mismatched filesystems

If you move objects onto the distributed file system you must make sure that you use the same URI on both the originating and destination paths. Otherwise you'd see an error like this:

[admin@vmhost01-vm1 ~]$ hadoop fs -mv /repl2/rankoutput1 fusion:///repl2/rankoutput2/
15/05/13 21:22:40 INFO client.FusionFs: Initialized FusionFs with URI: fusion:///, and Fs: hdfs://vmhost01-vm1.cluster.domain.com:8020. FileSystem: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-721726966_1, ugi=admin@DOMAIN.EXAMPLE (auth:KERBEROS)]]
mv: `/repl2/rankoutput1': Does not match target filesystem
If you use the fusion:/// URI on both paths it will work, e.g.
[admin@vmhost01-vm1 ~]$ hadoop fs -mv fusion:///repl2/rankoutput1 fusion:///repl2/rankoutput1
15/05/13 21:23:27 INFO client.FusionFs: Initialized FusionFs with URI: fusion:///, and Fs: hdfs://vmhost01-vm1.cluster.domain.com:8020. FileSystem: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1848371313_1, ugi=admin@DOMAIN.EXAMPLE (auth:KERBEROS)]]
Note that since the non-replicated directory doesn't yet exist in ZONE2, it will be created without the files it contained on the originating zone. When running WD Fusion, moving a non-replicated directory into a replicated directory will not work unless you use the fusion:/// URI on both paths.

You can't move files between replicated directories
Currently you can't perform a straight move operation between two separate replicated directories.

Handling file inconsistencies

WD Fusion's replication technology ensures that changes to data are efficiently propagated to each zone. There are, however, a few cases where objects in the distributed file system can lose consistency. WD Fusion can be set to schedule periodic consistency checks, or an administrator can trigger a check from the Admin UI or via the REST API.

If an inconsistency is found then the administrator needs to use the repair functions available through the WD Fusion UI, or manually repair the issue using whatever system tools correspond with the Hadoop application. This may require that up-to-date files are manually copied over from one zone to overwrite the corrupted version of the files. In some cases files will need to be deleted/removed in order to restore consistency. You will need to follow the guidelines and documentation that correspond with your underlying applications, e.g. MapR, Hive etc.

Consistency Checks look at file size, not content
The current implementation of the Consistency Check tool compares the size of files between zones. We're looking carefully at how we can implement a qualitative check that can specifically identify file corruption while not greatly impacting performance.
Repairs on large files
Please note that when very large files are repaired, it may appear that the process has stalled with different numbers of appends getting reported, post-completion. We recommend that you allow repair operations plenty of time to complete.

Username Translation
If any nodes that take part in a consistency check have the Username Translation feature enabled, then inconsistencies in the "user" field will be ignored.

Transfer reporting

When looking at transfer reporting, note that there are situations involving HFlush/early file transfer where transfer logs will appear incorrect. For example, the push threshold may appear to be ignored. This could happen if an originating file is closed and renamed before pulls are triggered by the HFlush lookup. Note that although this results in confusing logs, those logs are in fact correct; you would see only two appends, rather than the number determined by your push threshold - one at the very beginning, and one from the rename, which pulls the remainder of the file. What is happening is optimal; all the data is available to be pulled at that instant, so we might as well pull all of it at once instead of in chunks.

Fine-tuning Replication

WANdisco's patented replication engine, DConE, can be configured for different use cases, balancing between performance and resource costs. The following section looks at a number of tunable properties that can be used to optimize WD Fusion for your individual deployment.

Increasing thread limit

WD Fusion processes agreements using a set number of threads, 20 by default, which offers a good balance between performance and system demands.

It is possible, in cases where there are many Copy agreements arriving at the same time, that all available threads become occupied by the Copy commands. This will block the processing of any further agreements.

You can set WD Fusion to reserve more threads, to protect against this type of bottleneck situation:

Increase executor.threads property

  1. Make a backup copy of WD Fusion's applications config file /opt/wandisco/fusion-server/applications.properties, then open the original in your preferred text editor.
  2. Modify the property executor.threads.
    Property: executor.threads
    Description: The number of threads executing agreements in parallel.
    Permitted values: 1 to Integer.MAX_VALUE
    Default: 20
    Checked at: Startup
    An example of this edit is sketched after this procedure.

    WD Fusion Server snippet

    Don't go alone
    Any upward adjustment will clearly increase the resourcing costs. Before you make any changes to DConE properties, you should open up discussions with WANdisco's support team. Applying incorrect or inappropriate settings to the replication system may result in hard to diagnose problems.
  3. Save your edited applications.properties file, then restart WD Fusion.
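
As referenced in step 2, here is a minimal sketch of the edit, assuming applications.properties uses standard key=value syntax (confirm against your own file, and agree any change with WANdisco support first):

# /opt/wandisco/fusion-server/applications.properties
# Raise the number of threads executing agreements in parallel (default: 20)
executor.threads=40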

Tuning Writer Re-election

Only one WD Fusion node per zone is allowed to write into a particular replicated directory. The node that is assigned to do the writing is called the writer. See more about the role of the writer.

Should the current writer suddenly become unavailable, a re-election process begins to assign the role to one of the remaining nodes. Although the re-election process is designed to balance speed against system resource usage, there may be deployments where the processing speed is critical. For this reason, the re-election timing can be tuned with the following system properties:

Tunable properties

writerCheckPeriod
The period of time (in seconds) between writer check events. Default: 60.
writerCheckMultiple
The number of check events that will fail before initiating an election. Default: 3.

Setting the writer re-election period

The period of time between a writer going off-line and another writer being elected and starting to pick up is writerCheckPeriod * writerCheckMultiple, i.e. the default is 3 minutes (writerCheckPeriod 60s x writerCheckMultiple 3).

If you feel these default settings cause the system to wait too long before kicking off a re-election then you can update them using an API call:

curl -X POST "http://.../fusion/fs/properties/global?path=<mapped path>&writerCheckPeriod=<new period>&writerCheckMultiple=<new multiple>"

You can adjust these properties to be optimal for your deployment. However, consider the following pointers:

  • Setting the properties so that the period is very short will ensure that if a writer is lost, a new writer will be brought into action so quickly that there should be no impact on replication. However, very short periods are likely to result in a larger number of false alarms, where writer re-elections are triggered unnecessarily.
  • Setting the properties so that the period is very long will ensure that a re-election only takes place if the current writer is really "out for the count", however, a long delay between the loss of the writer and a new writer picking up could be very detrimental in some situations, such as where very large numbers of small files are being replicated between zones.
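
As a concrete sketch of the API call described above (the hostname, path and values are hypothetical), the following would shorten the re-election window to 1 minute:

curl -X POST "http://fusion01.example.com:8082/fusion/fs/properties/global?path=/repl1&writerCheckPeriod=30&writerCheckMultiple=2"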

Handling Induction Failure

In the event that the induction of a new node fails, here is a possible approach for manually fixing the problem using the API.

Requirements: A minimum of two nodes with a fusion server installed and running, without having any prior knowledge about the other. This can be verified by querying <hostname>:8082/fusion/nodes

Steps:

Generate an xml file (we'll call it induction.xml) containing an induction ticket with the inductor's details. (Generally the inductor port should not change, but this is the port that all DConE traffic uses. You can find this in your application.properties file as application_port.)

<inductionTicket>
  <inductorNodeId>${NODE1_NODEID}</inductorNodeId>
  <inductorLocationId>${NODE1_LOCATIONID}</inductorLocationId>
  <inductorHostName>${NODE1_HOSTNAME}</inductorHostName>
  <inductorPort>6789</inductorPort>
</inductionTicket>
Send the xml file to your inductee:
curl -v -s -X PUT -d@${INDUCTION.XML} -H "Content-Type: application/xml" http://${NODE2_HOSTNAME}:8082/fusion/node/${NODE2_IDENTITY}
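
For example, with hypothetical hostnames and identifiers substituted for the variables, you might first confirm the inductor's node and location IDs, then send the ticket to the inductee:

curl http://node1.example.com:8082/fusion/nodes

curl -v -s -X PUT -d@induction.xml -H "Content-Type: application/xml" http://node2.example.com:8082/fusion/node/node2-identity-example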

MEMBERSHIP

Requirements: A minimum of two nodes that have been inducted.

Steps:

Generate an xml file (we'll call it membership.xml) containing a membership object. DConE supports various configurations of node roles, but for the time being the Fusion UI only supports <Acceptor, Proposer, Learner> and <Proposer, Learner>. If you choose to have an even number of <Acceptor, Proposer, Learner> nodes you must specify a tiebreaker.

<membership>
  <membershipIdentity>${MEANINGFUL_MEMBERSHIP_NAME}</membershipIdentity>
  <distinguishedNodeIdentity>${NODE1_NODEID}</distinguishedNodeIdentity>
  <acceptors>
    <node>
      <nodeIdentity>${NODE1_NODEID}</nodeIdentity>
      <nodeLocation>${NODE1_LOCATIONID}</nodeLocation>
    </node>
    <node>
      <nodeIdentity>${NODE2_NODEID}</nodeIdentity>
      <nodeLocation>${NODE2_LOCATIONID}</nodeLocation>
    </node>
  </acceptors>
  <proposers>
    <node>
      <nodeIdentity>${NODE1_NODEID}</nodeIdentity>
      <nodeLocation>${NODE1_LOCATIONID}</nodeLocation>
    </node>
    <node>
      <nodeIdentity>${NODE2_NODEID}</nodeIdentity>
      <nodeLocation>${NODE2_LOCATIONID}</nodeLocation>
    </node>
  </proposers>
  <learners>
    <node>
      <nodeIdentity>${NODE1_NODEID}</nodeIdentity>
      <nodeLocation>${NODE1_LOCATIONID}</nodeLocation>
    </node>
    <node>
      <nodeIdentity>${NODE2_NODEID}</nodeIdentity>
      <nodeLocation>${NODE2_LOCATIONID}</nodeLocation>
    </node>
  </learners>
</membership>
Send the xml file to one of your nodes:
curl -v -s -X POST -d@${MEMBERSHIP.XML} -H "Content-Type: application/xml" http://${NODE_HOSTNAME}:8082/fusion/node/${NODE_IDENTITY}/membership

STATEMACHINE

Requirements: A minimum of two nodes inducted together and a membership created that contains them (you'll want to make a note of the membership id of your chosen membership).

Steps:
Generate an xml file (we'll call it statemachine.xml) containing a fsMapping object.
<replicatedDirectory>
  <uri>${URI_TO_BE_REPLICATED}</uri>
  <membershipId>${MEMBERSHIP_ID}</membershipId>
  <familyRepresentativeId>
    <nodeId>$NODE1_ID</nodeId>
  </familyRepresentativeId>
</replicatedDirectory>

Send the xml file to one of your nodes:

curl -v -s -X POST -d@${STATEMACHINE.XML} -H "Content-Type: application/xml" http://${NODE1_HOSTNAME}:8082/fusion/fs

Emergency bypass to allow writes to proceed

If WD Fusion is down and clients use the HDFS URI, then further writes will be blocked. The emergency bypass feature gives the administrator an option to bypass WD Fusion and write to the underlying file system, which will introduce inconsistencies between zones. This is suitable for when short-term inconsistency is seen as a lesser evil compared to blocked progress.

The inconsistencies can then be fixed later using the Consistency and Repair process(es). A client that is allowed to bypass to the underlying filesystem will continue to bypass for the duration of the retry interval. Long-running clients will automatically reload configurations at a hardcoded 60 second interval. Thus it is possible to disable and enable the bypass on-the-fly.

Enable/disable emergency bypass via the UI

  1. Log in to the Fusion UI and go to the Settings tab. Click Client Bypass Settings.
    WD Fusion Deployment

    Emergency bypass via the UI.

  2. Tick the Enable fusion bypass checkbox. This will enable two entry fields for configuration: WD Fusion Deployment

    Emergency bypass via the UI.

    Bypass response time
    The time (in seconds) that will pass before the client will bypass WD Fusion. Default: 14.
    Bypass retry interval
    The time (in seconds) before the client attempts to use WD Fusion, again. Default: 60.
  3. Click Update to save your changes.

Enable/disable emergency bypass via manual configuration change

In core-site.xml add the following properties:

<property>
<name>fusion.client.can.bypass</name>
<value>true or false; default is false</value>
</property>
<property>
<name>fusion.client.bypass.response.secs</name>
<value>integer number representing seconds; default is 14</value>
</property>
<property>
<name>fusion.client.bypass.retry.interval.secs</name>
<value>integer number representing seconds; default is 60</value>
</property>
The properties are also listed in the Reference Section.
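
For example, a sketch that enables the bypass while keeping the default timings (the values shown are illustrative):

<property>
  <name>fusion.client.can.bypass</name>
  <value>true</value>
</property>
<property>
  <name>fusion.client.bypass.response.secs</name>
  <value>14</value>
</property>
<property>
  <name>fusion.client.bypass.retry.interval.secs</name>
  <value>60</value>
</property>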

Kerberos Troubleshooting

Kerberos Error with MIT Kerberos 1.8.1 and JDK6 prior to update 27

Prior to JDK6 Update 27, Java fails to load the Kerberos ticket cache correctly when using MIT Kerberos 1.8.1 or later, even after a kinit.

The following exception will occur when attempting to access the Hadoop cluster.

WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

The workaround is to upgrade to JDK6 Update 27 or later or, after running kinit, to renew the ticket with kinit -R so that the credentials cache is rewritten in a format that Java can read.

Uninstall WD Fusion

In cases where you need to remove WD Fusion from a system, use the following script:

/opt/wandisco/fusion-ui-server/scripts/uninstall.sh

  • The script is placed on the node during the installation process.
  • You must run the script as root or invoke sudo.
  • Running the script without using an additional option performs the following actions:
    Default uninstall
    1. Stops all WD Fusion related services
    2. Uninstalls the WD Fusion, IHC and UI servers
    3. Uninstalls any Fusion-related plugins (See Plugins)
    4. Uninstalls itself. You'll need to handle backups manually from this point

Usage

Example

sudo CONFIG_BACKUP_DIR=/data/my_config_backup LOG_BACKUP_DIR=/data/my_log_backup /opt/wandisco/fusion-ui-server/scripts/uninstall.sh -c -l -p

See below for a full explanation of each option:

Uninstall with config purge

Running the script with -p will also include the removal of any configuration changes that were made during the WD Fusion installation.

Reinstallation
Use the purge (-p) option in the event that you need to complete a fresh installation.

As the purge option will completely wipe your installation, there's a backup option that can be run to back up your config files, which gives you an easier method for recovering your installation:

Backup config/log files

Run the script with the -c option to back up your config and -l to back up WD Fusion logs. The files will be backed up to the following location:

/tmp/fusion_config_backup/fusion_configs-YYYYMMDD-HHmmss.tar.gz
Change the default save directory
You can change the locations that the script uses for these backups by adding the following environmental variables:
CONFIG_BACKUP_DIR=/path/to/config/backup/dir
LOG_BACKUP_DIR=/path/to/log/backup/dir

Dry run

Use the -d option to test an uninstallation. This option lets you test the effects of an uninstallation without any actual file changes being made. Use this option to be sure that your uninstallation will do what you expect.

Help

Running the script with -h outputs a list of options for the script.

[sysadmin@localhost ~]$ sudo /opt/wandisco/fusion-ui-server/scripts/uninstall.sh -h
Usage: /opt/wandisco/fusion-ui-server/scripts/uninstall.sh [-c] [-l] [-p] [-d]
 -c: Backup config to '$CONFIG_BACKUP_DIR' (default: /tmp/fusion_config_backup).
 -d: Dry run mode. Demonstrates the effect of the uninstall without performing the requested actions.
 -h: This help message.
 -l: Backup logs to '$LOG_BACKUP_DIR' (default: /tmp/fusion_log_backup).
 -p: Purge config, log, data files, etc to leave a cleaned up system.

4. Managing Replication

WD Fusion is built on WANdisco's patented DConE active-active replication technology. DConE sets a requirement that all replicating nodes that synchronize data with each other are joined in a "membership". Memberships are co-ordinated groups of nodes where each node takes on a particular role in the replication system.

For more information about DConE and its different roles see the reference section's chapter called A Paxos Primer.

4.1 Create a membership

  1. Log in to the WD Fusion UI. Click on the Membership tab. Click on the Create New tab. The "New Membership" window will open that will display the WD Fusion nodes organized by zone.
    membership

    Create Membership1

  2. Configure the membership by selecting which nodes should be acceptors. Acceptors vote on the ordering of changes.
    membership

    Note how a two-node membership requires that one of the nodes be upgraded to a Distinguished Node.

    For some guidance on the best way to configure a membership read Create Resilient Memberships in the reference section.

    membership
  3. Click Create to complete the operation. Click Cancel to discard the changes.
  4. Identical memberships are not allowed
    You will be prevented from creating more than 1 membership with a particular configuration.
    membership

    Guide to node types

    APL
    Acceptor - the node will vote on the order in which replicated changes will play out.
    Proposer - the node will create proposals for changes that can be applied to the other nodes.
    Learner - the node will receive replication traffic that will synchronize its data with other nodes.
    PL
    Proposer - the node will create proposals for changes that can be applied to the other nodes.
    Learner - the node will receive replication traffic that will synchronize its data with other nodes.
    Distinguished Node
    Acceptor + - the distinguished node is used in situations where there is an even number of nodes, a configuration that introduces the risk of a tied vote. The Distinguished Node's bigger vote ensures that it is not possible for a vote to become tied.

    4.2 Replicated Folders

    WD Fusion allows selected folders within your hdfs file system to be replicated to other data centers in your cluster. This section covers the set up and management of replicated folders.

    Create a replicated folder

    The first step in setting up a replicated folder is the creation of a target folder:

    1. In each zone, create a directory in the hdfs file space. To avoid permission problems, ensure that the owning user/group are identical across the zones. Use Hadoop's filesystem command to complete the tasks:
      hadoop fs -mkdir /user/hiver
      hadoop fs -chown -R hiver:groupname /user/hiver
      
    2. As user hdfs, run the following commands on each data center:
      hadoop fs -mkdir /user/hiver/warehouse-replicated
      hadoop fs -chown hiver:hiver /user/hiver/warehouse-replicated
      
      This ensures that a universal system user has read/write access to the hdfs directory warehouse-replicated that will be replicated through WD Fusion.

    Create Rule

    1. Once the folder is in place on all nodes, login to WD Fusion's UI on one of the WD Fusion nodes and click on the Replicated Folders tab.
    2. Click on the + Create button. membership

      Create Rule

    3. The replicated folder entry form screen will appear. membership

      Create Rule

      Navigate the HDFS File Tree (1), on the right-hand side of the New Rule panel, to select your target folder, created in the previous section. The selected folder will appear in the Path entry field. You can, instead, enter the full path to the folder in the Path field.

      Next, select two or more zones from the Zones list (2). You then select a Membership from the dropdown selector. If there's no existing membership with the combination of Zones that you selected, then you will see the message:
      There are no memberships available matching your criteria.
      In this case you can create a new membership, see 4.1 Create a membership and restart the Create Replicated Folder process.

    4. You can now complete the creation of the Replicated folder by clicking on the Create button. However, there are some additional options available on the Advanced Options panel. Consider if you need to apply any Advanced Options for the folder.

      Note that the allocated writer for this zone is listed under the Advanced Options panel. This can be useful information in case you need to troubleshoot replication problems.
      membership
      These include Preserve Origin Block Size, which is used for columnar storage formats such as Parquet, and Preserve Replication Factor, which is used when you want replica data to continue to use the replication factor that is set on its originating cluster, rather than the factor that applies on the new cluster. Exclude from replication ? lets you set an "exclude pattern" to indicate files and folders in your replicated folder that you don't want to be replicated. If you apply any Advanced Options you need to click the Update button to make sure that they are applied.
      Known Issue: Add exclusions after you have created a folder
      If you need to set up exclusions, set them up after you have created the replicated folder. If you enable them during the folder's creation they will be ignored. This issue will be fixed in a coming release.
      FUI-2414
      The option Override Consistency Check Interval allows administrators to set a consistency check interval that is specific to the replicated folder space and different from the default value that is set in the Consistency Check section of the Settings tab.
      Known Issue: Now fixed
      The minor fault with the Replicated folder Advanced option for Overriding the Consistency Check interval where enabling the option fixes the interval to 6 hours, regardless of what value you enter, has been fixed in version 2.6.6
      FUI-1984


      Path interpretation

      If the path contains a leading slash "/", we assume it is an absolute path; if it contains no leading slash then we assume it is a relative path and the root directory will be added to the beginning of the exclusion.

    5. If you didn't complete a consistency check on the selected folder, you may do so now. membership

      Replicate to Zones

    6. After the completion of a consistency check, the Consistency column will report the consistency status.
      membership

      Replicated folder status

    Edit/ View Replicated Folder

    If you click on the View link for a Replicated Folder, then you enter a tabbed UI:

    View/Edit

    membership

    The View/Edit tab lets you make changes to selected properties of the Replicated Folder:

    Writer for this zone
    Indicates which node is set to handle writes for this zone.
    Path
    The file path for the replicated folder in question.
    Zones
    The zones that are replicated between, for the corresponding folder.
    Membership
    The membership used to define the replication.
    Advanced Options
    Various advanced options that can be set for a replicated folder. See Advanced Options.

    Consistency Check

    The Consistency Check tab offers access to the consistency repair tool. membership

    Source of truth
    From the available zones, you must choose the one that represents the most up-to-date state.
    Resolve
    Once you have selected from the available zones, click the Resolve button.
    membership

    You will see a confirmation message concerning your choice of repair. There is a checkbox that lets you choose to Preserve extraneous files. Click Confirm to complete the repair.

    membership

    After clicking Confirm, you will get a rundown of the state of each zone, after the repair has been completed.

    Custom Consistency Check

    Use the Custom Consistency Check to select a sub-directory of the Replicated Directory and check that it is in a consistent state across the membership.

    Path
    Shows the path to be checked
    HDFS File Tree
    Use the HDFS File Tree to select the directory to be checked.
    Outcome
    Note: When running a custom consistency check, there may be a delay before results are shown. Stay on this page to see the results.

    Please select a path and click "Check Now".
    membership
    Outcome
    The Outcome panel will now report on the number of inconsistencies. You will be invited to "Click for a full report".

    File Transfers

    The File Transfer panel shows the movement of data coming into the zone.
    membership

    Repair

    The repair tab provides a tool for repairing an inconsistency between the available zones. Run through the following procedure to perform a repair: membership

    1. Select the Source of truth from the drop-down. This will flag one of the available zones as most up-to-date / most correct in terms of stored data.
    2. Select from one of two Resolution types, Recursive or Preserve
      Recursive
      If the checkbox is ticked, this option will cause the path and all files under it to be made consistent. The default is true, but it is ignored if the path represents a file.
      Preserve
      If the checkbox is ticked, when the repair is executed in a zone that is not the source zone, any data that exists in that zone but not the source zone will be retained and not removed. The default is false, i.e., to make all replicas of the path consistent by removing all data in the non-source zone(s) that does not exist in the source.

    Checking repair status

    It's possible to generate a report on the current state of a repair. Follow the procedure outlined below:

    You can access repairs by invoking the following API mount point:

    <node-hostname>:8082/fusion/fs/repairs

    Parameters

    path
    The path for which the list of repairs should be returned. The default value is the root path, "/".
    recursive
    If true, also get repairs done on descendants of path. This option is false by default.
    showAll
    Whether or not to include past repairs for the same file. The options are "true" to show all repairs on the given path, and "false" to show only the last repair.
    sortField
    The field by which the entries in the RepairListDTO should be sorted. The options are to sort by the "startTime" or "path" property. The default value is "path".
    sortOrder
    The order in which the entries should be sorted according to the sort field. The options are to sort in ASC (ascending) or DESC (descending) order.
    return
    A RepairListDTO representing a list of repairs under path.
    Command-line only
    The Repair status tool is currently only available through the command-line. In the next release the functionality will be added to the Fusion UI.
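
    Putting the parameters together, a request might look like the following sketch (the hostname and path are hypothetical):

    curl "http://fusion01.example.com:8082/fusion/fs/repairs?path=/repl1&recursive=true&showAll=false&sortField=startTime&sortOrder=DESC"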

    Configure Hadoop

    Once WD Fusion has been installed and set up, you will need to modify your Hadoop applications so that when appropriate, they write to your replicated folder.

    configure hadoop

    Configure Hadoop applications to write to the replicated file space.

    Configure for High Availability Hadoop

    If you are running Hadoop in a High Availability (HA) configuration then you should run through the following steps for WD Fusion:

    1. Enable High Availability on your Hadoop clusters. See the documentation provided by your Hadoop vendor, i.e. - Cloudera (via QJM) or Hortonworks.
      The HA wizard does not set the HDFS dependency on ZooKeeper
      Workaround:
      • Create and start a ZooKeeper service if one doesn't exist.
      • Go to the HDFS service.
      • Click the Configuration tab.
      • In the Service-Wide category, set the ZooKeeper Service property to the ZooKeeper service.
    2. Edit WD Fusion configuration element 'fusion.underlyingFs' to match the new nameservice ID in the cluster-wide core-site.xml in your Hadoop manager.
      E.g., change:
      <property>
              <name>fusion.underlyingFs</name>
              <value>hdfs://vmhost08-vm0.cfe.domain.com:8020</value>
      </property>
      To:
      <property>
              <name>fusion.underlyingFs</name>
              <value>hdfs://myCluster</value>
      </property>
      
    3. Click Save Changes to commit the changes.
    4. If Kerberos security is installed make sure the configurations are there as well: Setting up Kerberos with WD Fusion.
    5. You'll need to restart all Fusion and IHC servers once the client configurations have been deployed.

    Known issue on failover

    Where High Availability is enabled for the NameNode and WD Fusion, when the client attempts to fail over to the Standby NameNode it generates a stack trace that outputs to the console. As the WD Fusion client can only delegate the method calls to the underlying FileSystem object, it isn't possible to properly report that the connection has been reestablished. Take care not to assume that a client has hung; it may, in fact, be in the middle of a transfer.

4.3 Reporting

The following section details the reporting tools that WD Fusion currently provides.

4.3.1 Consistency Check

The consistency check mechanism lets you verify that replicated HDFS data is consistent between sites. Read about Handling file inconsistencies.

Consistency Checks through WD Fusion UI

Username Translation
If any nodes that take part in a consistency check have the Username Translation feature enabled, then inconsistencies in the "user" field will be ignored.

NameNode Settings

Replication Rules table - indicates if inconsistencies are detected.

Consistency

Consistency Status
A status which links to the consistency check report. It can report Check Pending, Inconsistent, Consistent or Unknown.
Last Check:
Shows the time and date of the check that produced the current status. By default, Consistency checks are automatically started every 24 hours.
Next Check:
Shows the time and date of the next automatically scheduled Consistency Check. Remember, you don't need to wait for this automatic check, you can trigger a consistency check at any time through the Consistency Check tool.

Click on the report link to get more information about the current consistency check results.

Fix inconsistencies with the Consistency Check tool

WD Fusion's Consistency Check tool includes a feature for resolving any inconsistencies that are detected across the distributed file system. Use the following procedure to resolve any such inconsistencies:

  1. Start by completing a fresh Consistency Check. Select the inconsistent object using the corresponding check box, then click on the Consistency Check button. After a few moments you'll get an up-to-date report on inconsistency. NameNode Settings

    Consistency Check

  2. To fix an inconsistency, click on the Inconsistent link in the Consistency column.
    NameNode Settings

    Inconsistent

  3. The inconsistency is shown in terms of object properties. NameNode Settings

    Consistency Check

    Path:
    The absolute path for the object.
    Length:
    The size of the object.
    Is a directory:
    Identifies if the object is a directory (true) or a file (false).
    Owner:
    System account that owns the object.
    Group:
    System group associated with the object(s)
    Permission:
    File permissions for the object.
  4. Compare the various states of the inconsistent element across your cluster. You need to decide which zone(s) have a correct/up-to-date copy of the element, then select the zone under the Source of truth column. Click Resolve.
    NameNode Settings

    Confirm Consistency Check

  5. You'll get a confirmation prompt that will confirm which copies will be overwritten and which zone will source the file. Click Confirm to complete the fix or click Cancel to stop the process.
    NameNode Settings

    Consistency Check

  6. If you clicked Confirm then the fix operation will begin. The UI will indicate Fix requested. NameNode Settings

    Consistency Check

  7. Rechecking the Consistency will now confirm that the object is now consistent across all zones.
    NameNode Settings

    Consistency Check

  8. NameNode Settings

    Consistency Check

    NameNode Settings

    Consistency Check

4.3.2 File Transfer Report

As a file is being pulled into the local zone, the transfer is recorded in the WD Fusion server and can be monitored for progress.

Use the REST API filter by the replicated path and sort by ascending or descending "complete time" or "start time":

GET /fusion/fs/transfers?path=[path]&sortField=[startTime|completeTime]&order=[ascending|descending]
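
For example (the hostname and path are hypothetical):

curl "http://fusion01.example.com:8082/fusion/fs/transfers?path=/repl1&sortField=completeTime&order=descending"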

File transfer Report Output

Example output showing an in-progress and completed transfer:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<fileTransfers>
    <fileTransfer>
        <startTime>1426020372314</startTime>
        <elapsedTime>4235</elapsedTime>
        <completeTime>1426020372434</completeTime>
        <username>wandisco</username>
        <familyRepresentativeId>
            <nodeId>dconefs5-1</nodeId>
            <dsmId>93452fe3-c755-11e4-911e-5254001ba4b1</dsmId>
        </familyRepresentativeId>
        <file>/tmp/repl/isoDEF._COPYING_</file>
        <remoteFs>hdfs://vmhost5-vm4.frem.wandisco.com:8020</remoteFs>
        <origin>dc1</origin>
        <size>4148166656</size>
        <remaining>4014477312</remaining>
        <bytesSec>3.3422336E7</bytesSec>
        <percentRemaining>96.77714626516683</percentRemaining>
        <state>in progress</state>
    </fileTransfer>
    
    <fileTransfer>
        <startTime>1426019512082</startTime>
        <elapsedTime>291678</elapsedTime>
        <completeTime>1426019803760</completeTime>
        <username>wandisco</username>
        <familyRepresentativeId>
            <nodeId>dconefs5-1</nodeId>
            <dsmId>93452fe3-c755-11e4-911e-5254001ba4b1</dsmId>
        </familyRepresentativeId>
        <file>/tmp/repl/isoABC</file>
        <remoteFs>hdfs://vmhost5-vm4.frem.wandisco.com:8020</remoteFs>
        <origin>dc1</origin>
        <size>4148166656</size>
        <remaining>0</remaining>
        <bytesSec>1.4221733E7</bytesSec>
        <percentRemaining>0.0</percentRemaining>
        <state>complete</state>
    </fileTransfer>
</fileTransfers>		

Output key with data type

Username
System user performing the transfer. (String)
File name
Name of the file being transferred. (String)
Remote FS
The filesystem of the originating node. (URI)
Origin
The file's originating Zone. (String)
Size
The cumulative size of data transferred. (Long)
Appends
The number of appends that have been made to the file being transferred. (Long)
AppendSize
The size of the latest append.
Remaining
Remaining bytes still to be transferred for the latest append. (Long)
Percent remaining
Percentage of the file still to be transferred. (Double)
Bytes/Sec
The current rate of data transfer, i.e. Amount of file downloaded so far / elapsed download time. (Long)
State
One of "in progress", "incomplete", "completed", "appending", "append complete", "deleted" or "failed". (TransferState)
In progress: means we are performing an initial pull of the file.
Appending: means data is currently being pulled and appended to the local file.
Append completed: means all available data has been pulled and appended to the local file, although more data could be requested later.

Note: files can be renamed, moved or deleted while we pull the data, in which case the state will become "incomplete".
When the remote file is closed and all of its data has been pulled, the state will then change to "Complete".
If a file is deleted while we are trying to pull the end state will be "deleted".
If the transfer fails the state will be "failed".
Start Time
The time when the transfer started. (Long)
Elapsed Time
Time that has so far elapsed during the transfer. Once the transfer completes it is then a measure of the time between starting the transfer and completing. (Long)
Complete Time
During the transfer this is an estimate for the complete time based on rate of through-put so far. Once the transfer completes this will be the actual time at completion. (Long)
Delete Time
If the file is deleted then this is the time the file was deleted from the underlying filesystem. (Long)

Record retention

Records are not persisted and are cleared up on a restart. The log records are truncated to stop an unbounded use of memory, and the current implementation is as follows:
For each state machine, if there are more than 1,000 entries in its list of transfers we remove the oldest transfers, sorted by complete time, which are in a terminal state ("completed", "failed" or "deleted") until the size of the list is equal to 1,000. The check on the number of records in the list is performed every hour.

4.4 Deleting memberships

It is currently not possible to delete memberships that are no longer required. Currently, removing memberships would potentially break the replication system.

4.5 Bandwidth management

For deployments that are run under an enterprise license, additional tools are available for monitoring and managing the amount of data transferred between zones.

Enterprise License only The Bandwidth Management tools are only enabled on clusters that are running on an Enterprise license. See the Deployment Checklist for details about License Types.

Overview

The bandwidth management tools provide two additional areas of functionality to support Enterprise deployments.

  • Limit the rate of outgoing traffic to each other zone.
  • Limit the rate of incoming traffic from each other zone.

Any applicable bandwidth limits are replicated across your nodes and applied on a per-zone basis.

Fusion11

Fusion Nodes - when Enterprise license is in use.

The Fusion Nodes screen will display current incoming traffic for the local zone. You will need to log in to the WD Fusion UI on a node within each Zone to see all incoming traffic levels.

Setting up bandwidth limits

Use this procedure to set up bandwidth limits between your zones.

  1. Click on the Set bandwidth limit button for each corresponding zone.
    Fusion11
  2. The Maximum bandwidth dialog will open. For each remote zone you can set maximum Outgoing to and Incoming from values. Entered values are in Megabits per second; these are converted into Gigabytes per hour and displayed in brackets after each entry field.
    Fusion11

    Maximum bandwidth entry dialog.

    Outgoing to
    The provided value will be used as the bandwidth limit for data being sent to the target zone.
    Incoming from
    As it is only possible to actually limit traffic at source, the Incoming from value is applied at the target zone as the Outgoing to limit for data being sent to the present zone.
  3. When you have set your bandwidth values, click Update to apply these settings to your deployment.
    Fusion11

    Maximum bandwidth entry dialog.

5. Settings

    Set up a Custom Disk Monitor

    Use this procedure to set up a custom monitor in WD Fusion UI's Disk Monitor tool.

    The Monitoring Data tool monitors the disk usage of the WD Fusion software, providing a basic level of protection against it consuming all disk space. The tool also lets you set up your own monitors for user-selected resources.

    Disk Monitor - not intended as a final word in system protection
    The disk monitor is no substitute for dedicated, system-wide monitoring tools. Instead, it is intended to be a 'last stand' against possible disk space exhaustion that could lead to data loss or corruption.

    Read our Recommendations for system-wide monitoring tools.
    1. Log in to the WD Fusion UI. Click on the Settings tab.
    2. Click on Disk Monitoring at the top of the side menu.
      NameNode Settings

      Settings - Disk monitor

    3. Click Create.
      NameNode Settings

      Settings - Disk monitor

    4. Enter the required details for setting up a disk monitor.
      NameNode Settings

      Settings - Disk monitor

      File system path
      Enter the full path of the system directory that will be monitored for disk usage.
      Severity level
      Select a system log severity level (Severe, Warning, Info or Debug) that will correspond with the Disk Capacity Threshold.

      Caution: Assigning a monitor with the Severe level will impact operation should its Disk Capacity Threshold trigger be met. The affected WD Fusion node will immediately shut down to protect its file system from corruption. Ensure that Severe level monitors are set up with a threshold that corresponds to serious risk; set the threshold too low and you may find that WD Fusion nodes are shut down needlessly.

      Disk Capacity Threshold (bytes)
      The maximum amount of data that can be consumed by the selected system path before the monitor sends an alert message to the log file.
      Message
      A human-readable message that will be sent to the log at the point that the Disk Capacity Threshold is reached.
    5. You can set a monitor to have multiple trigger points. Click + Add another severity monitor and add an additional Severity level, Disk Capacity Threshold and Message. You can have a separate monitor for each Log level.
      Monitor Settings

      Settings - Additional Disk monitors

    Edit a Disk Monitor

    You can make changes to an existing custom monitor by clicking on the Edit link for the monitor.

    Monitor Settings

    Settings - Change it

    Caution: You can't delete or modify the default monitor, which protects the system from disk space exhaustion caused by the temporary files created in the WANdisco replication directory /DConE/consensusNode.

    Delete a Disk Monitor

    You can delete a custom monitor by clicking on the Edit or Remove link on the existing custom monitor.

    Remove Settings

    Settings - Remove it


    On the edit screen, click Remove Monitor to remove the entire custom monitor. It is possible to remove individual rules from the monitor, although you need to remove them in reverse order of severity using the Remove bottom monitor button.
    Remove Settings

    Settings - "Remove Monitor"

    Change the UI Settings

    You can change how you interact with WD Fusion UI through the browser. Use the following procedure to change either the HTTP or HTTP SSL port that is used to view the UI through a browser.

    1. Log in to the WD Fusion UI. Click on the Settings tab.
    2. Click on UI Settings link on the side menu.
    3. Enter a new HTTP Port or HTTP SSL Port.
      Change UI Settings 1

      Settings - Change it

    4. Click Update. You may need to update the URL in your browser to account for the change you just made.

    Changing the WD Fusion server settings

    The server settings give you control over traffic encryption between WD Fusion and IHC servers.

    Server Settings

    Enable SSL for WD Fusion

    The following procedure is used for setting up SSL encryption for WD Fusion. The encryption will be applied between all components: Fusion servers, IHC servers and clients.

    The procedure must be followed for each WD Fusion server in your replication system, in turn.

    1. Login to WD Fusion UI, click on the Settings tab.
    2. Click the Enable SSL for WD Fusion checkbox.
      Server Settings
    3. Enter the details for the following properties: Server Settings
      KeyStore Path
      Path to the keystore.
      e.g. /opt/wandisco/ssl/keystore.ks
      KeyStore Password
      Encrypted password for the KeyStore.
      e.g. ***********
      Key Alias
      The Alias of the private key.
      e.g. WANdisco
      Key Password
      Private key encrypted password.
      e.g. ***********
      TrustStore Path
      Path to the TrustStore.
      e.g. /opt/wandisco/ssl/keystore.ks
      TrustStore Password
      Encrypted password for the TrustStore.
      e.g. ***********
    4. Click Update to save the settings. Repeat the steps for all WD Fusion servers.
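    If you don't already have a keystore at the KeyStore Path and TrustStore Path entered above, you can create one with Java's keytool. A minimal sketch (the path, alias and password are placeholders that should match the values you enter in this screen):
      keytool -genkeypair -keyalg RSA -alias wandisco -keystore /opt/wandisco/ssl/keystore.ks -storepass <YOUR PASSWORD> -validity 3650
    Because the keystore and TrustStore can be the same file, a single keystore created this way can serve both roles.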

    Changing SSL Settings

    Any changes that you make to the SSL settings must be applied manually, through the UI, on every other WD Fusion node. An update to the SSL settings will apply changes to the core-site file via the management endpoint (Cloudera Manager, Ambari, etc.). You may be required to make manual changes to configuration files and restart some services.

    Setting up SSL

    What follows is a manual procedure for setting up SSL. In most cases it has been superseded by the above Fusion UI-driven method. If you make changes using the following method, you will need to restart the WD Fusion server in order for the changes to appear on the Settings tab.

    1. Create the keystores / truststores. Every Fusion Server and IHC server should have a keystore with a private key entry / certificate chain for encrypting and signing. Every Fusion Server and Fusion Client must also have a truststore for validating certificates in the path specified in "fusion.ssl.truststore". The keystores and truststores can be the same file and may be shared amongst the processes.

    2. Fusion Server configuration

      To configure Server-Server or Server-Client SSL, add the following configuration to the application.properties file, e.g.

      ssl.enabled=true
      ssl.key.alias=socketbox
      ssl.key.password=***********
      ssl.keystore=/etc/ssl/key.store
      ssl.keystore.password=**************
      Server-Server or Server-Client
      Configure the keystore for each server:
      Key | Value | Default | File
      ssl.key.alias | alias of private key/certificate chain in key store | NA | application.properties
      ssl.key.password | encrypted password to key | NA | application.properties
      ssl.keystore | path to keystore | NA | application.properties
      ssl.keystore.password | encrypted password to key store | NA | application.properties
      Server-to-Server or Server-to-IHC

      Configure the truststore for each server:

      Key | Value | Default | File
      ssl.truststore | path to truststore | Default | application.properties
      ssl.truststore.password | encrypted password to trust store | Default | application.properties
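      For example, the truststore entries in application.properties might look like the following (the path and password are placeholders):
      ssl.truststore=/etc/ssl/key.store
      ssl.truststore.password=**************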
    3. Fusion client configuration Server-Client only

      Configure the truststore for each client:

      Key | Value | Default | File
      fusion.ssl.truststore | path to truststore | NA | core-site.xml
      fusion.ssl.truststore.password | encrypted password for truststore | NA | core-site.xml
      fusion.ssl.truststore.type | JKS, PKCS12 | JKS | core-site.xml
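      As an illustration, the corresponding entries in core-site.xml might look like this (path, password and type are placeholders):
      <property>
          <name>fusion.ssl.truststore</name>
          <value>/etc/ssl/key.store</value>
      </property>
      <property>
          <name>fusion.ssl.truststore.password</name>
          <value>***********</value>
      </property>
      <property>
          <name>fusion.ssl.truststore.type</name>
          <value>JKS</value>
      </property>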
    4. IHC Server configuration (Server-IHC SSL only)

      Configure the keystore for each IHC server:

      Key | Value | Default | File
      ihc.ssl.key.alias | alias of private key/certificate chain in keystore | NA | .ihc
      ihc.ssl.key.password | encrypted password to key | NA | .ihc
      ihc.ssl.keystore | path to keystore | NA | .ihc
      ihc.ssl.keystore.password | encrypted password to keystore | NA | .ihc
      ihc.ssl.keystore.type | JKS, PKCS12 | JKS | .ihc
    5. Enable SSL:

      The following configuration is used to turn on each type of SSL encryption:

      Type | Key | Value | Default | File
      Fusion Server - Fusion Server | ssl.enabled | true | false | application.properties
      Fusion Server - Fusion Client | fusion.ssl.enabled | true | false | core-site.xml
      Fusion Server - Fusion IHC Server | fusion.ihc.ssl.enabled | true | false | .ihc

      Enable SSL (HTTPS) for the WD Fusion Server

      The manual steps for getting WD Fusion Server to support HTTPS connections:

      1. You need to add the following property to application.properties.
        Type | Key | Value | Default | File
        Enable HTTPS support for Fusion core | fusion.http.policy | HTTP_ONLY, HTTPS_ONLY or BOTH_HTTP_HTTPS. If you enable HTTPS_ONLY, you need to make some matching changes to the WD Fusion UI server so that it is able to communicate with the core Fusion server. | HTTP_ONLY | application.properties
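        For example, to restrict the Fusion core to HTTPS only, you would add the following line to application.properties (HTTPS_ONLY is just the illustrative choice here):
        fusion.http.policy=HTTPS_ONLY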

        Enable HTTPS for Fusion UI

        Note that if you enable the Fusion Server to communicate over HTTPS-only, then you must also make the following changes so that the Fusion UI matches up:

        target.ssl true
        target.port 443 (This is the port that Fusion Server uses for accepting REST requests, over HTTPS).
      2. Advanced Options

        Only apply these options if you fully understand what they do.
        The following advanced options provide a number of low level configuration settings that may be required for installation into certain environments. The incorrect application of some of these settings could cause serious problems, so for this reason we strongly recommend that you discuss their use with WANdisco's support team before enabling them.

        URI Selection

        The default behavior for WD Fusion is to fix all replication to the Hadoop Distributed File System / hdfs:/// URI. Setting the hdfs scheme provides the widest support for Hadoop client applications, since some applications can't support the available "fusion:///" URI, or they can only run on HDFS instead of the more lenient HCFS. Each option is explained below:

        Use HDFS URI with HDFS file system
        URI Option A
        This option is available for deployments where the Hadoop applications support neither the WD Fusion URI nor the HCFS standards. WD Fusion operates entirely within HDFS.

        This configuration will not allow paths with the fusion:// URI to be used; only paths starting with hdfs://, or paths with no scheme, that correspond to a mapped path will be replicated. The underlying file system will be an instance of the HDFS DistributedFileSystem, which will support applications that aren't written to the HCFS specification.
        Use WD Fusion URI with HCFS file system
        URI Option B
        This is the default option that applies if you don't enable Advanced Options, and was the only option in WD Fusion prior to version 2.6. When selected, you need to use fusion:// for all data that must be replicated over an instance of the Hadoop Compatible File System. If your deployment includes Hadoop applications that are either unable to support the Fusion URI or are not written to the HCFS specification, this option will not work.
        Use Fusion URI with HDFS file system
        URI option C
        This differs from the default in that while the WD Fusion URI is used to identify data to be replicated, the replication is performed using HDFS itself. This option should be used if you are deploying applications that can support the WD Fusion URI but not the Hadoop Compatible File System.


        Use Fusion URI and HDFS URI with HDFS file system
        URI Option D
        This "mixed mode" supports all the replication schemes (fusion://, hdfs:// and no scheme) and uses HDFS for the underlying file system, to support applications that aren't written to the HCFS specification.

        Setting up Node Location

        WD Fusion is designed to fit into deployments that have far-flung data centers. The Node Location setting is used to identify where in the world the data center is situated, using standard global positioning system coordinates. These coordinates will be used by any connected WD Fusion nodes to correctly place the node's location on the world map.

        location

        WD Fusion setting server location.

        Set up email notifications

        This section describes how to set up notification emails that will be triggered if one of the tracked system resources reaches a defined threshold.

        Important: Email notification is disabled by default. You must complete the following steps before any messages will be sent.

        Email Settings

        Email Notification Settings are located in the Zone section of the settings

        Complete the following steps to enable email notification:

        1. Enter your SMTP properties in the Server configuration tab.
        2. Enter recipient addresses in the Recipients tab.
        3. Tick the Enable check-box for each trigger-event for which you want an email notification sent out.
        4. [Optionally] You can customize the messaging that will be included in the notification email message by adding your own text in the Templates tab.

Notification emails

The following triggers support email notification. See the Templates section for more information.

Consistency Check Failing
Email sent if a consistency check fails.
CPU Load Threshold Hit
The threshold set on the Dashboard graph for CPU Load has been reached. See Dashboard Graphs Settings.
HDFS Usage Threshold Hit
The threshold set on the Dashboard graph for database partition disk usage has been reached. See Dashboard Graphs Settings.
Java Heap Usage Threshold Hit
The system's available Java Heap Threshold has been reached. See Dashboard Graphs Settings.
License Expiring
The deployment's WANdisco license is going to expire.
Node Down
One of the nodes in your deployment is down.
Quorum Lost
One of the active replication groups is unable to continue replication due to the loss of one or more nodes.

Server config

The server config tab contains the settings for the SMTP email server that you will use for relaying your notification emails. You need to complete these details and check that they are correct before your notification emails can be enabled.

SMTP Settings

Email Notification Settings are located in the Zone section of the settings

SMTP Host
The hostname or IP address for your email relay server.
SMTP Port
The port used by your email relay service. SMTP default port is 25.
Connection Encryption:
Drop-down for choosing the type of encryption that the mail server uses; None, SSL or TLS are supported. If SSL or TLS is selected, you should make sure that you adjust the SMTP port value, if required.
Authentication
Checkbox for indicating that a username and password are required for connecting to the mail server. If you tick the checkbox additional entry fields will appear.
SMTP Username
A username for connecting to the email server.
SMTP Password
A password for connecting to the email server.
From
Optional field for adding the sender email address that will be seen by the recipient.
To
Optional field for entering an email address that can be used for testing that the email setup will work.
Update Settings
Button, click to store your email notification entries.
Reset Changes
Reloads the saved settings, undoing any changes that you have made in the template that have not been saved.
Send Test Email
Button, click to send a test email to the address entered in the To field, so that you can check your settings work.

Recipients

The recipients tab is used to store one or more email addresses that can be used when sending out notification emails. You can enter any number of addresses, although you will still need to associate an entered address with a specific notification before it will be used. See Adding recipients below.
NameNode Settings

Email Notification Settings - Adding recipients

Adding recipients

  1. Enter a valid email address for a recipient who should receive a notification email from WD Fusion.
  2. Click the Add button.

You can repeat the procedure as many times as you like. You can send each notification to a different recipient (by associating that recipient's address with the particular trigger), or you can send a single notification email to multiple recipients (by associating multiple addresses with the notification email).

Enable Notification Emails

Once you have working server settings and valid recipient email addresses, you can start to enable notification emails from the Alerts tab.

  1. Go to the Alerts tab and select a notification trigger for which you would like to send emails. For example Consistency Check Failing. Tick the Enabled checkbox.

    Important: If a trigger is not enabled, no email notification will ever be sent. Likewise, an enabled trigger will not send out notification emails unless recipients are added.

    NameNode Settings
  2. From the Add More Recipients window, click on one or more of the recipients that you entered into the Recipients tab. Once you have finished selecting recipients, click Add.
    NameNode Settings
  3. The email notification is now set up. You can choose to change/add additional recipients, review or customize the messaging by clicking on the Edit Template link.
    NameNode Settings

Templates

The Templates tab gives you access to the email default text, allowing you to review and customize with additional messaging.

Email Settings

Email templates

Consistency Check Failing
This is the trigger system event for which the notification email will be sent.
Subject
The email's subject line. A default value is set for each of the triggers, however, you can reword these by changing the text in the template.
Custom Message
This entry box lets you add your own messaging to the notification. This could be anything that might be useful to an on-duty administrator such as links to related documentation or contact details for the next level of support, etc.
Message Body
The message body contains the fixed payload of the notification email; you can't edit this element and it may contain specific error messaging taken from logs.

Example Notification Email

This is what an email notification looks like:

From: cluster-admin@organization.com
Date: Mon, Jan 4, 2016 at 3:49 PM
Subject: WANdisco Fusion UI - Consistency Check Failing
To: admin@organization.com

Here is a custom message.
 - Custom messaging entered in the Template

Consistency Check Failing triggered a watch event, any relevant error message will appear below.
 - Default Message

The following directory failed consistency check:  

  /repl1
- Specific error message

==================== NODE DETAILS =====================  
Host Name     : xwstest-01.your.organization.com
IP address    : 10.0.0.146
IP port       : 6444
-------------------------------------------------------
Node Id       : wdfs1
Node Name     : wdfs1
Node status   : LOCAL
Node's zone   : zone1
Node location : location1
Node latitude : 11.0
Node longitude: 119.0
-------------------------------------------------------
Memory usage  : 0.0%
Disk usage    : 0.0%
Last update   : 2016.Jan.04 at 15:49:28 GMT
Time Now      : 2016.Jan.04 at 15:49:48 GMT
=======================================================
 - Standard footer
		

Setting up Kerberos

If the Hadoop deployment is secured using Kerberos you need to enable Kerberos in the WD Fusion UI. Use the following procedure:

  1. Look to the security procedures of your particular Hadoop platform.
  2. Running with a unified or per-service principal:

    Unified
    Some Hadoop platforms are Kerberized under a single hdfs user; this is common in Cloudera deployments. For simplicity, this is what we recommend.
    • Generate a keytab for each of your WD Fusion nodes using the hdfs service; for clarity, the steps below present a manual setup:
      ktadd -k fusion.keytab -norandkey hdfs/${hostname}@${krb_realm}

    Per-service
    • If your deployment uses separate principals for each HDFS service then you will need to set up a principal for WD Fusion.
    • On the KDC, using kadmin.local, create new principals for WD Fusion user and generate keytab file, e.g.:
      > addprinc -randkey hdfs/${hostname}@${krb_realm} 
      > ktadd -k fusion.keytab -norandkey hdfs/${hostname}@${krb_realm}
  3. Copy the generated keytab to a suitable filesystem location on the WD Fusion server, e.g. /etc/wandisco/security/, that will be accessible to your controlling system user ("hdfs" by default). See the example commands below.

    Note: We don't recommend storing the keytab in Hadoop's own Kerberos configuration directory /etc/hadoop/conf, given that this is overwritten by the cluster manager.
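    For example (a sketch only; adjust the paths, ownership and permissions to suit your environment and controlling system user):
      mkdir -p /etc/wandisco/security
      cp fusion.keytab /etc/wandisco/security/
      chown hdfs:hdfs /etc/wandisco/security/fusion.keytab
      chmod 600 /etc/wandisco/security/fusion.keytab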

  4. Setting up handshake tokens

    By default, handshake tokens are created in the user's working directories, e.g. /user/jdoe. It is recommended that you create them elsewhere, using the following procedure:

    1. Open the core-site.xml file and add the following property:
      <property>
      	  <name>fusion.handshakeToken.dir</name>
      	  <value>/some/token/dir</value>
        </property>
      fusion.handshakeToken.dir
      This is the location where you want handshake tokens to be created for the cluster. E.g. if for DC1 you configure "fusion.handshakeToken.dir" to be "/repl1/tokens/", then handshake tokens will be written to "/repl1/tokens/.fusion/.token_$USERNAME_$UUID", where $USERNAME is the username of the connecting user and $UUID is a random UUID.

      Important requirement: All WD Fusion system users must have read and write permissions for the location.
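      For example, using the illustrative path above, you could create the token directory and open up its permissions as follows (a sketch only; apply permissions that match your own security policy):
      sudo -u hdfs hadoop fs -mkdir -p /repl1/tokens
      sudo -u hdfs hadoop fs -chmod 777 /repl1/tokens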

    Important: Known issue running Teragen and Terasort
    There are known problems running Teragen and Terasort with FusionHdfs or FusionHcfs configurations. Some required directories are currently missing and will cause Terasort to hang. You can work around the problem by creating the following directories, then making sure that the yarn and mapred users are added and that they have access to the directories. E.g.:

    sudo -u hdfs hadoop fs -mkdir /user/yarn
    sudo -u hdfs hadoop fs -chown yarn /user/yarn
    sudo -u hdfs hadoop fs -mkdir /user/mapred
    sudo -u hdfs hadoop fs -chown mapred /user/mapred

    Set up Kerberos single KDC with Ambari

    The following procedure illustrates how to install Kerberos, running with a single Key Distribution Center, under Ambari.

    When to use kadmin.local and kadmin?
    When performing the Kerberos commands in this procedure you can use kadmin.local or kadmin depending on your access and account:

    • If you can log on to the KDC host directly and have root access or a Kerberos admin account, use the kadmin.local command.
    • When accessing the KDC from a remote host, use the kadmin command. From any host, run one of the following:
      $ sudo kadmin.local
      or
      $ kadmin

    Setup Procedure

    1. Before you start, download and install the Java Cryptographic Extension (JCE) Unlimited Strength Jurisdiction Policy Files 7.
      unzip UnlimitedJCEPolicyJDK7.zip -d  /usr/jdk64/jdk1.7.0_67/jre/lib/security/
    2. Install the Kerberos server:
      yum install -y krb5-server krb5-libs krb5-auth-dialog krb5-workstation
    3. Edit /etc/krb5.conf and replace "EXAMPLE.COM" with your realm. E.g.
      sed -i "s/EXAMPLE.COM/DOMAIN.COM/g" /etc/krb5.conf /var/kerberos/krb5kdc/kdc.conf /var/kerberos/krb5kdc/kadm5.acl
      [logging]
       default = FILE:/var/log/krb5libs.log
       kdc = FILE:/var/log/krb5kdc.log
       admin_server = FILE:/var/log/kadmind.log
       
      [libdefaults]
       default_realm = DOMAIN.COM
       dns_lookup_realm = false
       dns_lookup_kdc = false
       ticket_lifetime = 24h
       renew_lifetime = 7d
       forwardable = true
       
      [realms]
       DOMAIN.COM = {
        kdc = host15-vm0.cfe.domain.com
        admin_server = host15-vm0.cfe.domain.com
       }
       
      [domain_realm]
       .wandisco.com = DOMAIN.COM
       wandisco.com = DOMAIN.COM				
      
    4. Edit /var/kerberos/krb5kdc/kdc.conf:
      
      [kdcdefaults]
       kdc_ports = 88
       kdc_tcp_ports = 88
        
      [realms]
       DOMAIN.COM = {
        #master_key_type = aes256-cts
        acl_file = /var/kerberos/krb5kdc/kadm5.acl
        dict_file = /usr/share/dict/words
        admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
        max_life = 24h 0m 0s
        max_renewable_life = 7d
       supported_enctypes = aes256-cts:normal aes128-cts:normal
      des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal
      des-cbc-md5:normal des-cbc-crc:normal
       }				
      
    5. Edit /var/kerberos/krb5kdc/kadm5.acl and replace EXAMPLE.COM with your realm.
    6. To create a database, run
      /usr/sbin/kdb5_util create -s
    7. Start Kerberos service:
      /sbin/service krb5kdc start			
      /sbin/service kadmin start		
      
    8. Prepare your kerberos clients. Run
      
      yum install -y  krb5-libs krb5-workstation
      Repeat this on all other machines in the cluster to make them kerberos workstations connecting to the KDC. E.g.
      for i in {1..4}; do ssh root@vmhost17-nfs$i.cfe.domain.com 'yum install -y  krb5-libs krb5-workstation';done	
      
    9. Copy the /etc/krb5.conf file from the kerberos server node to all kerberos client nodes
      for i in {1..5}; do scp /etc/krb5.conf root@vmhost17-vm$i.cfe.domain.com:/etc/;done
      
    10. Create a user on all nodes: useradd -u 1050 testuser
      for i in {0..4}; do ssh root@vmhost17-nfs$i.cfe.domain.com 'useradd -u 1050 testuser';done
    11. Create principal and password for user (testuser):
      [root@vmhost17-vm0 ~]# kadmin.local
      Authenticating as principal root/admin@DOMAIN.COM with password.
      kadmin.local:  addprinc testuser/admin
      WARNING: no policy specified for testuser/admin@DOMAIN.COM; defaulting to no policy
      Enter password for principal "testuser/admin@DOMAIN.COM":
      Re-enter password for principal "testuser/admin@DOMAIN.COM":
      Principal "testuser/admin@DOMAIN.COM" created.
      kadmin.local:  exit
      [root@vmhost01-vm1 ~]# su - testuser
      [testuser@vmhost01-vm1 ~]$ kinit
      Password for testuser/admin@DOMAIN.COM:
      [testuser@vmhost01-vm1 ~]$ klist
      Ticket cache: FILE:/tmp/krb5cc_519
      Default principal: testuser/admin@DOMAIN.COM
      Valid starting     Expires            Service principal
      04/29/15 18:17:15  04/30/15 18:17:15  krbtgt/DOMAIN.COM@DOMAIN.COM renew until 04/29/15 18:17:15
      
    12. WD Fusion installation step

      During the WD Fusion Installation's Kerberos step, set the configuration for an existing Kerberos setup.

      Set up Kerberos single KDC on CDH cluster

      The following procedure illustrates how to install Kerberos, running with a single Key Distribution Center, under CDH.

      Set up a KDC and Default Domain

      When to use kadmin.local and kadmin?
      When performing the Kerberos commands in this procedure you can use kadmin.local or kadmin depending on your access and account:

      • If you can log on to the KDC host directly and have root access or a Kerberos admin account, use the kadmin.local command.
      • When accessing the KDC from a remote host, use the kadmin command. From any host, run one of the following:
        $ sudo kadmin.local
        or
        $ kadmin

      Setup Procedure

      1. Before you start, download and install the Java Cryptographic Extension (JCE) Unlimited Strength Jurisdiction Policy Files 7.
        unzip UnlimitedJCEPolicyJDK7.zip -d  /usr/jdk64/jdk1.7.0_67/jre/lib/security/
      2. Install the Kerberos server:
        yum install -y krb5-server krb5-libs krb5-auth-dialog krb5-workstation
      3. Edit /etc/krb5.conf and replace "EXAMPLE.COM" with your realm. E.g.
        sed -i "s/EXAMPLE.COM/DOMAIN.COM/g" /etc/krb5.conf /var/kerberos/krb5kdc/kdc.conf /var/kerberos/krb5kdc/kadm5.acl
        [logging]
         default = FILE:/var/log/krb5libs.log
         kdc = FILE:/var/log/krb5kdc.log
         admin_server = FILE:/var/log/kadmind.log
         
        [libdefaults]
         default_realm = DOMAIN.COM
         dns_lookup_realm = false
         dns_lookup_kdc = false
         ticket_lifetime = 24h
         renew_lifetime = 7d
         forwardable = true
         
        [realms]
         DOMAIN.COM = {
          kdc = host15-vm0.cfe.domain.com
          admin_server = host15-vm0.cfe.domain.com
         }
         
        [domain_realm]
         .wandisco.com = DOMAIN.COM
         wandisco.com = DOMAIN.COM				
        
      4. Edit /var/kerberos/krb5kdc/kdc.conf:
        
        [kdcdefaults]
         kdc_ports = 88
         kdc_tcp_ports = 88
          
        [realms]
         DOMAIN.COM = {
          #master_key_type = aes256-cts
          acl_file = /var/kerberos/krb5kdc/kadm5.acl
          dict_file = /usr/share/dict/words
          admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
          max_life = 24h 0m 0s
          max_renewable_life = 7d
         supported_enctypes = aes256-cts:normal aes128-cts:normal
        des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal
        des-cbc-md5:normal des-cbc-crc:normal
         }				
        
      5. Edit /var/kerberos/krb5kdc/kadm5.acl and replace EXAMPLE.COM with your realm.
      6. To create a database, run
        /usr/sbin/kdb5_util create -s
      7. Start Kerberos service:
        /sbin/service krb5kdc start			
        /sbin/service kadmin start		
        
      8. Prepare your kerberos clients. Run
        
        yum install -y  krb5-libs krb5-workstation
        Repeat this on all other machines in the cluster to make them kerberos workstations connecting to the KDC. E.g.
        for i in {1..4}; do ssh root@vmhost17-nfs$i.cfe.domain.com 'yum install -y  krb5-libs krb5-workstation';done	
        
      9. Copy the /etc/krb5.conf file from the kerberos server node to all kerberos client nodes
        for i in {1..5}; do scp /etc/krb5.conf root@vmhost17-vm$i.cfe.domain.com:/etc/;done
        
      10. Create a user on all nodes: useradd -u 1050 testuser
        for i in {0..4}; do ssh root@vmhost17-nfs$i.cfe.domain.com 'useradd -u 1050 testuser';done
      11. Create principal and password for user (testuser):
        [root@vmhost17-vm0 ~]# kadmin.local
        Authenticating as principal root/admin@DOMAIN.COM with password.
        kadmin.local:  addprinc testuser/admin
        WARNING: no policy specified for testuser/admin@DOMAIN.COM; defaulting to no policy
        Enter password for principal "testuser/admin@DOMAIN.COM":
        Re-enter password for principal "testuser/admin@DOMAIN.COM":
        Principal "testuser/admin@DOMAIN.COM" created.
        kadmin.local:  exit
        [root@vmhost01-vm1 ~]# su - testuser
        [testuser@vmhost01-vm1 ~]$ kinit
        Password for testuser/admin@DOMAIN.COM:
        [testuser@vmhost01-vm1 ~]$ klist
        Ticket cache: FILE:/tmp/krb5cc_519
        Default principal: testuser/admin@DOMAIN.COM
        Valid starting     Expires            Service principal
        04/29/15 18:17:15  04/30/15 18:17:15  krbtgt/DOMAIN.COM@DOMAIN.COM renew until 04/29/15 18:17:15
        
      12. Create the HDFS principal:
        kadmin.local:  addprinc hdfs@DOMAIN.COM
      13. Create hdfs.keytab and move the hdfs.keytab file into the /etc/cloudera-scm-server/ directory on the host where you are running the Cloudera Manager Server. Make sure that the hdfs.keytab file has readable permissions for all users:
        kadmin: xst -k hdfs.keytab hdfs@DOMAIN.COM
        mv hdfs.keytab /etc/cloudera-scm-server/
        chmod +r /etc/cloudera-scm-server/hdfs.keytab

      Create a Kerberos Principal and Keytab File for the Cloudera Manager Server

      The following sequence is an example procedure for creating the Cloudera Manager Server principal and keytab file for MIT Kerberos.

      1. In the kadmin.local or kadmin shell, type in the following command to create the Cloudera Manager Service principal:
        kadmin: addprinc -randkey cloudera-scm/admin@DOMAIN.COM
      2. Create the Cloudera Manager Server cmf.keytab file:
        kadmin: xst -k cmf.keytab cloudera-scm/admin@DOMAIN.COM

        Important: The Cloudera Manager Server keytab file must be named cmf.keytab because that name is hard-coded in Cloudera Manager.

      Deploying the Cloudera Manager Server Keytab

      After obtaining or creating the Cloudera Manager Server principal and keytab, follow these instructions to deploy them:

      1. Move the cmf.keytab file to the /etc/cloudera-scm-server/ directory. This is the directory on the host where you are running the Cloudera Manager Server.
        $ mv cmf.keytab /etc/cloudera-scm-server/
      2. Ensure that the cmf.keytab file is only readable by the Cloudera Manager Server user account cloudera-scm.
        sudo chown cloudera-scm:cloudera-scm /etc/cloudera-scm-server/cmf.keytab
        
        sudo chmod 600 /etc/cloudera-scm-server/cmf.keytab
        
      3. Add the Cloudera Manager Server principal (cloudera-scm/admin@DOMAIN.COM) to a text file named cmf.principal and store the cmf.principal file in the /etc/cloudera-scm-server/ directory on the host where you are running the Cloudera Manager Server.
      4. Make sure that the cmf.principal file is only readable by the Cloudera Manager Server user account cloudera-scm.
        sudo chown cloudera-scm:cloudera-scm /etc/cloudera-scm-server/cmf.principal
        
        sudo chmod 600 /etc/cloudera-scm-server/cmf.principal
        

        Note: For Single KDC copy cmf.keytab and cmf.principal to another CM node:

        scp /etc/cloudera-scm-server/cmf* vmhost17-vm0.bdfrem.wandisco.com:/etc/cloudera-scm-server/
            

        Configure the Kerberos Default Realm in the Cloudera Manager Admin Console

        1. In the Cloudera Manager Admin Console, select Administration > Settings.
        2. Click the Security category, and enter the Kerberos realm for the cluster in the Kerberos Security Realm field that you configured in the krb5.conf file.
        3. Click Save Changes.

        Adding Gateway roles to all YARN hosts.

        1. From the Services tab, select your YARN service.
        2. Click the Instances tab.
        3. Click Add Roles and choose Gateway role.
        4. Select all hosts and click Install.

        Enable Hadoop Security

        You can do this by hand: see CM Enable Security.

        Cloudera Manager Kerberos Wizard

        After configuring kerberos, you now have a working Kerberos server and can secure the Hadoop cluster. The wizard will do most of the heavy lifting; you just have to fill in a few values.

        1. To start, log into Cloudera Manager by going to http://your_hostname:7180 in your browser. The user ID and Password are the same as those used for accessing your Management Endpoint (Ambari or Cloudera Manager, etc.) or, if you're running without a manager, such as with a cloud deployment, they will be set in a properties file.
        2. There are lots of productivity tools here for managing the cluster but ignore them for now and head straight for the Administration > Kerberos wizard.

        3. Click on the "Enable Kerberos" button.
        4. Check each KRB5 Configuration item and select Continue.
          kerberos CM configuration screen
        5. The Kerberos Wizard needs to know the details of what the script configured. Fill in the entries as follows:
          • KDC Server Host KDC_hostname
          • Kerberos Security Realm: DOMAIN.COM
          • Kerberos Encryption Types: aes256-cts-hmac-sha1-96
          Click Continue.
        6. You want Cloudera Manager to manage the krb5.conf files in your cluster, so check "Yes" and then select "Continue".
        7. Enter the credentials for the account that has permissions to create other principals.
          User: testuser/admin@DOMAIN.COM
          Password: password for testuser/admin@DOMAIN.COM
          
        8. The next screen provides good news. It lets you know that the wizard was able to successfully authenticate.
        9. In this step, the setup wizard will create Kerberos principals for each service in the cluster.
        10. You're ready to let the Kerberos Wizard do its work. You should select I'm ready to restart the cluster now and then click Continue.
        11. Successfully enabled Kerberos.
        12. You are now running a Hadoop cluster secured with Kerberos.
        13. WD Fusion installation step

          You should enter the paths to the /etc/krb5.conf file and to the hdfs.keytab file, and then select the hdfs principal.

          Kerberos and HDP's Transparent Data Encryption

          There are some extra steps required to overcome a class loading error that occurs when WD Fusion is used with at-rest encrypted folders. Specifically, the cluster config changes described below are required:

          <property>
          <name>hadoop.kms.proxyuser.fusion.users</name>
          <value>*</value>
          </property>
                
          <property>
          <name>hadoop.kms.proxyuser.fusion.groups</name>
          <value>*</value>
          </property>
           
          <property>
          <name>hadoop.kms.proxyuser.fusion.hosts</name>
          <value>*</value>
          </property>

          Setting up SSL encryption for DConE traffic

          WD Fusion supports the use of Secure Socket Layer encryption (SSL) for securing its replication traffic. To enable this encryption you need to generate a keypair that must be put into place on each of your WD Fusion nodes. You then need to add some variables to the application.properties file.

          1. Open a terminal and navigate to <INSTALL_DIR>/etc/wandisco/config.

          2. Within /config make a new directory called ssl.
            mkdir ssl

          3. Navigate into the new directory.
            cd ssl

          4. Copy your private key into the directory. If you don't already have keys set up you can use JAVA's keygen utility, using the command:
            keytool -genkey -keyalg RSA -keystore wandisco.ks -alias server -validity 3650 -storepass <YOUR PASSWORD>

            Read more about the Java keystore generation tool in the KB article - Using Java Keytool to manage keystores

            Ensure that the system account that runs the WD Fusion server process has sufficient privileges to read the keystore files.

            Java keytool options

            Option | Description
            -genkey | Switch for generating a key pair (a public key and associated private key). Wraps the public key into an X.509 v1 self-signed certificate, which is stored as a single-element certificate chain. This certificate chain and the private key are stored in a new keystore entry identified by the alias.
            -keyalg RSA | The key algorithm; in this case RSA is specified.
            -keystore wandisco.ks | The file name for your keystore, which will be stored in the current directory.
            -alias server | Assigns the alias "server" to the key pair. Aliases are case-insensitive.
            -validity 3650 | Validates the key pair for 3650 days (10 years). The default would be 3 months.
            -storepass <YOUR PASSWORD> | Provides the keystore with a password.

            If no password is specified on the command, you'll be prompted for it. Your entry will not be masked so you (and anyone else looking at your screen) will be able to see what you type.

            Most commands that interrogate or change the keystore will need to use the store password. Some commands may need to use the private key password. Passwords can be specified on the command line (using the -storepass and -keypass options).
            However, a password should not be specified on a command line or in a script unless it is for testing purposes, or you are on a secure system.

            The utility will prompt you for the following information

            What is your first and last name?  [Unknown]:  
            What is the name of your organizational unit?  [Unknown]:  
            What is the name of your organization?  [Unknown]:  
            What is the name of your City or Locality?  [Unknown]:  
            What is the name of your State or Province?  [Unknown]:  
            What is the two-letter country code for this unit?  [Unknown]:  
            Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?  [no]:  yes
            
            Enter key password for <mykey> (RETURN if same as keystore password):
          5. With the keystore now in place, you'll need to add variables to the application.properties file.

            SSL DConE Encryption Variables for application.properties

            Variable Name | Example | Description
            ssl.enabled | true | Requires a "true" or "false" value. When the value is set to false, none of the other variables will be used.
            ssl.debug | true | Requires a "true" or "false" value. When set to true, debugging mode is enabled.
            ssl.keystore | ./properties/wandisco.ks | The path to the SSL private keystore file that is stored on the node. By default this is called "wandisco.ks".
            ssl.key.alias | wandisco | The assigned alias for the key pair. Aliases are case-insensitive.
            ssl.keystore.password | <a password> | The SSL key password. This is described in more detail in Setting a password for SSL encryption.
            ssl.truststore | ./properties/wandisco.ks | The path to the SSL private truststore file that is stored on the node. By default this is called "wandisco.ks" because, by default, the keystore and truststore are one and the same file, although they don't have to be.
            ssl.truststore.password | "bP0L7SY7f/4GWSdLLZ3e+" | The truststore password. The password should be encrypted.

            Changes to any of these values require a restart of the DConE service. Any invalid value will cause the replicator to restart and no DConE traffic will flow.
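            Putting these together, a DConE SSL fragment of application.properties might look like the following (paths, alias and passwords are placeholders):
            ssl.enabled=true
            ssl.debug=false
            ssl.keystore=./properties/wandisco.ks
            ssl.key.alias=wandisco
            ssl.keystore.password=***********
            ssl.truststore=./properties/wandisco.ks
            ssl.truststore.password=***********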

          Setting the server key

          In the keystore, the server certificate is associated with a key. By default, we look for a key named "server" to validate the certificate. If you use a key for the server with a different name, enter this name in the SSL settings.

          SSL Troubleshooting

          A complete debug of the SSL logging will be required to diagnose the problems. To capture the debugging, ensure that the variable debugSsl is set to "true".

          To enable logging of the SSL implementation layer, set logging to FINEST for the 'com.wandisco.platform.net' package.

          Enable SSL for Hadoop Services

          This section shows you how to enable SSL encryption for Hadoop's native services such as HDFS, Yarn or MapReduce.

          1. On ALL nodes create key directories:
            /etc/security/serverKeys and /etc/security/clientKeys
          2. On all nodes, create keystore files:
            cd /etc/security/serverKeys
            keytool -genkeypair -alias $HOSTNAME -keyalg RSA -keysize 2048 -dname CN=$HOSTNAME,OU=Dev,O=BigData,L=SanRamon,ST=ca,C=us -keypass $PASSWORD -keystore $HOSTNAME.ks -storepass $PASSWORD
            
            For further explanation of what these options do, see the Java keytool options table above.
          3. On all nodes export the certificate public key to a certificate file:
            cd /etc/security/serverKeys
            keytool -exportcert -alias $HOSTNAME -keystore $HOSTNAME.ks -rfc -file $HOSTNAME.crt -storepass $PASSWORD
          4. On all nodes, import the certificate into truststore file:
            cd /etc/security/serverKeys
            keytool -importcert -noprompt -alias $HOSTNAME -file $HOSTNAME.crt -keystore $HOSTNAME.trust -storepass $PASSWORD
            
          5. Create a single truststore file containing the public keys from all of the certificates (this will be used by clients). Start on node1:
            cd /etc/security/serverKeys
            Copy the truststore file from the current node to the next one and redo the import step above, so that each node's certificate is added in turn.
          6. From the last node, copy the truststore, which now contains all of the certificates, to all servers as /etc/security/clientKeys/all.jks
          7. On all nodes, copy the keystore to "service".ks (e.g. hdfs.ks). See the sketch below.
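          The following shell sketch illustrates steps 5 to 7; the host names and the rolling truststore file name (all.jks) are placeholders, and you would repeat the copy/import cycle for every node in the cluster:
            # on node1, seed the shared truststore with this host's certificate
            keytool -importcert -noprompt -alias $HOSTNAME -file $HOSTNAME.crt -keystore all.jks -storepass $PASSWORD
            # copy the growing truststore to the next node, then repeat the import there
            scp all.jks root@node2.example.com:/etc/security/serverKeys/
            # from the last node, distribute the completed truststore to every server
            scp all.jks root@node1.example.com:/etc/security/clientKeys/all.jks
            # finally, on each node, copy the host keystore to the per-service name
            cp $HOSTNAME.ks hdfs.ks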

          Keystores are used in two ways:

          • The keystore contains private keys and certificates used by SSL servers to authenticate themselves to SSL clients. By convention, such files are referred to as keystores.
          • When used as a truststore, the file contains certificates of trusted SSL servers, or of Certificate Authorities trusted to identify servers. There are no private keys in the truststore.
          Most commonly, cert-based authentication is only done in one direction server->client. When a client also authenticates with a certificate this is called mutual authentication.

          While all SSL clients must have access to a truststore, it is not always necessary to create and deploy truststores across a cluster. The standard JDK distribution includes a default truststore which is pre-provisioned with the root certificates of a number of well-known Certificate Authorities. If you do not provide a custom truststore, the Hadoop daemons load this default truststore. Therefore, if you are using certificates issued by a CA in the default truststore, you do not need to provide custom truststores. However, you must consider the following before you decide to use the default truststore:

          If you choose to use the default truststore, it is your responsibility to maintain it. You may need to remove the certificates of CAs you do not deem trustworthy, or add or update the certificates of CAs you trust. Use the keytool utility to perform these actions.

          Security Considerations

          keystores contain private keys. truststores do not. Therefore, security requirements for keystores are more stringent:

          • Hadoop SSL requires that truststores and the truststore password be stored, in plaintext, in a configuration file that is readable by all.
          • Keystore and key passwords are stored, in plaintext, in a file that is readable only by members of the appropriate group.

          These considerations should guide your decisions about which keys and certificates you will store in the keystores and truststores that you will deploy across your cluster.

          Keystores should contain a minimal set of keys and certificates. Ideally you should create a unique keystore for each host, which would contain only the keys and certificates needed by the Hadoop SSL services running on the host. Usually the keystore would contain a single key/certificate entry. However, because truststores do not contain sensitive information you can safely create a single truststore for an entire cluster. On a production cluster, such a truststore would often contain a single CA certificate (or certificate chain), since you would typically choose to have all certificates issued by a single CA.

          Important: Do not use the same password for truststores and keystores/keys. Since truststore passwords are stored in the clear in files readable by all, doing so would compromise the security of the private keys in the keystore.

          SSL roles for Hadoop Services

          Service | SSL Role
          HDFS | server and client
          MapReduce | server and client
          YARN | server and client
          HBase | server
          Oozie | server
          Hue | client

          SSL servers load the keystores when starting up. Clients then take a copy of the truststore and use it to validate the server's certificate.

          Configure SSL for HDFS, YARN and MapReduce

          Before you begin

          Ensure keystores/certificates are accessible on all hosts running HDFS, MapReduce or YARN. As these services also run as clients they also need access to the truststore. (As mentioned, it's okay to put the truststores on all nodes as you can't always determine which hosts will be running the relevant services.)

          keystores must be owned by the hadoop group and have permissions 0440 (readable by owner and group). truststores must have permission 0444 (readable by all).
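          For example, assuming the file names used earlier in this section (the owning user here is illustrative):
            chown hdfs:hadoop /etc/security/serverKeys/hdfs.ks
            chmod 0440 /etc/security/serverKeys/hdfs.ks
            chmod 0444 /etc/security/clientKeys/all.jks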

          You'll need to specify the absolute paths to keystore and truststore files - these paths need to be valid for all hosts - this translates into a requirement for all keystore file names for a given service to be the same on all hosts.

          Multiple daemons running on a host can share a certificate. For example, in case there is a DataNode and an Oozie server running on the same host, they can use the same certificate.

          Configuring SSL for HDFS

          1. In Ambari, navigate to the HDFS service and edit the configuration.
          2. Type SSL into the search field to show the SSL properties.
          3. Make edits to the following properties:
            Property | Description
            SSL Server Keystore File Location | Path to the keystore file containing the server certificate and private key.
            SSL Server Keystore File Password | Password for the server keystore file.
            SSL Server Keystore Key Password | Password that protects the private key contained in the server keystore.
          4. If you don't plan to use the default truststore, configure SSL client truststore properties:
            Property | Description
            Cluster-Wide Default SSL Client Truststore Location | Path to the client truststore file. This truststore contains certificates of trusted servers, or of Certificate Authorities trusted to identify servers.
            Cluster-Wide Default SSL Client Truststore Password | Password for the client truststore file.
          5. We recommend that you also enable web UI authentication for the HDFS service, providing that you have already secured the HDFS service. Enter web consoles in the search field to bring up Enable Authentication for HTTP Web-Consoles property. Tick the check box to enable web UI authentication.
            Property | Description
            Enable Authentication for HTTP Web-Consoles | Enables authentication for hadoop HTTP web-consoles for all roles of this service.
          6. Now the necessary edits are complete, click Save Changes.
          7. Follow the next section for setting up SSL for YARN/MapReduce.

          Configuring SSL for YARN / MapReduce

          Follow these steps to configure SSL for YARN or MapReduce services.
          1. Navigate to the YARN or MapReduce service and click Configuration.
          2. In the search field, type SSL to show the SSL properties.
          3. Edit the following properties according to your cluster configuration:
            Property | Description
            SSL Server Keystore File Location | Path to the keystore file containing the server certificate and private key.
            SSL Server Keystore File Password | Password for the server keystore file.
            SSL Server Keystore Key Password | Password that protects the private key contained in the server keystore.
          4. We recommend that you also enable web UI authentication for the HDFS service, providing that you have already secured the HDFS service. Enter web consoles in the search field to bring up Enable Authentication for HTTP Web-Consoles property. Tick the check box to enable web UI authentication.
            Property | Description
            Enable Authentication for HTTP Web-Consoles | Enables authentication for hadoop HTTP web-consoles for all roles of this service.
          5. Click Save Changes.
          6. Navigate to the HDFS service and in the search field, type Hadoop SSL Enabled. Click the value for the Hadoop SSL Enabled property and select the checkbox to enable SSL communication for HDFS, MapReduce, and YARN.
            Property | Description
            Hadoop SSL Enabled | Enable SSL encryption for HDFS, MapReduce, and YARN web UIs, as well as encrypted shuffle for MapReduce and YARN.
          7. Restart all affected services (HDFS, MapReduce and/or YARN), as well as their dependent services.