WANdisco Fusion ®

1. Release Notes

Releases:

December 2017 - Release 2.10.5.2 Hotfix
November 2017 - Release 2.10.5.1
November 2017 - Release 2.10.5
November 2017 - Release 2.10.3.4 Hotfix
October 2017 - Release 2.10.4
September 2017 - Release 2.10.3.2
August 2017 - Release 2.10.3.1
May 2017 - Release 2.10.2
April 2017 - Release 2.10

1.1. Release 2.10.5.2 Hotfix Build 807

22 December 2017

WANdisco Fusion 2.10.5.2 is a hotfix release for customers using 2.10.5.x versions of the product. It addresses a small number of minor issues.

WANdisco advises that all customers using the product should apply this hotfix to their environment.

1.1.1. Installation

Application of the hotfix is performed by updating the IHC server RPM, the Fusion server RPM, and the client stack or package. For example, the following packages should be updated for HDP 2.6.0:

  fusion-hcfs-hdp-2.6.0-ihc-server-2.10.5.2.el6-xxxx.noarch.rpm
  fusion-hcfs-hdp-2.6.0-server-2.10.5.2.el6-xxxx.noarch.rpm
  fusion-hcfs-hdp-2.6.0-2.10.5.2.stack.tar.gz

Please contact WANdisco support for assistance with this process.
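
As a rough illustration only, on a yum-based HDP 2.6.0 node the RPM updates might look like the following (package names are those listed above, with build numbers left as placeholders); stopping and restarting services, and distribution of the client stack, should follow WANdisco support guidance for your deployment.

  # Illustrative sketch: update the IHC server and Fusion server RPMs in place.
  yum -y localinstall fusion-hcfs-hdp-2.6.0-ihc-server-2.10.5.2.el6-xxxx.noarch.rpm
  yum -y localinstall fusion-hcfs-hdp-2.6.0-server-2.10.5.2.el6-xxxx.noarch.rpm
  # The client stack archive (fusion-hcfs-hdp-2.6.0-2.10.5.2.stack.tar.gz) is
  # applied through the cluster manager; contact WANdisco support for that step.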

1.1.2. Issues Resolved

This hotfix addresses the following issues.

FUS-4785 - Unsuccessful EMR-HDFS replication following target restart

The restart of a writer Fusion node on an AWS EMR zone during file transfer could result in that transfer failing.

FUS-4803 - Document need for exclusion for HDI to HDI replication

Replication between HDInsight clusters requires an exclusion pattern in replication rules to account for the lack of append support in the default HDI configuration.

FUS-4807 - S3 content replication correction

Under some circumstances, file content replication to an S3 zone could result in target object sizes that were larger than those in the source zone.

FUS-4808 - Document link correction

Product documentation links to sample application.properties files are corrected.

FUS-4832/FUS-4314 - Support AWS SSE-S3

The fs.fusion.s3.sse.enabled property has been introduced to allow support of AWS SSE-S3. Once SSE is enabled on the bucket, this configuration property can be set to true to enable SSE-S3.
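
As a minimal sketch, assuming the property is placed in core-site.xml on the Fusion nodes (the exact file and any restart requirements depend on your deployment):

  # Sketch only: after enabling SSE on the S3 bucket, set
  # fs.fusion.s3.sse.enabled to true inside the <configuration> element of
  # core-site.xml on the Fusion nodes, then restart the Fusion services.
  # Verify the effective value on a node:
  hdfs getconf -confKey fs.fusion.s3.sse.enabled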

FUS-4841 - Document SSL for Fusion on AWS

Documentation for the use of SSL in an AWS deployment has been updated.
http://docs.wandisco.com/bigdata/wdfusion/2.10/#SSL-with-AWS

FUS-4852 - Improved repair_task_cleaner.sh script

Correction to an auxiliary script that cleans stale repair tasks.

FUS-4869 - License update instructions corrected

Documentation for replacing the product license has been updated.

FUS-4890 - ClassNotFound for ConfigurationRuntimeException with Presto

Fusion indicates at runtime when a configuration needed for correct Presto operation is not present.

1.1.3. Available Packages

Packages are available for all 2.10.5.x supported platforms. Please contact WANdisco support for access to the hotfix packages.

1.2. Release 2.10.5.1 Build 805

23 November 2017

WANdisco Fusion 2.10.5.1 is a minor release that fixes a handful of issues that could impact 2.10.5 deployments.

1.2.1. General Improvements

FUS-4620 - OutOfMemoryError: Java heap space fix

Fixed an issue that could result in an OutOfMemoryError as a result of FUS-4643-2.

FUS-4714 - On renameDirToReplicated, non-writer will never know the source zone is complete

Fixed an issue where the non-writer zone was unable to detect that a repair had completed on the source zone, allowing task records to accumulate on non-writers.

FUS-4700 - EMR sending 0-length request due to eventual consistency

We’ve clarified in the documentation that when replicating EMR to LocalFileSystem, you must enable "consistent view" to avoid replication errors that result in 0-length files.

FUS-4718 - Incorrect assertion on writer heartbeat

Removed incorrect assertion in debug builds.

1.3. Release 2.10.5 Build 801

14 November 2017

WANdisco Fusion 2.10.5 is a minor release that offers bug fixes and other improvements as detailed below.

1.3.1. Installation

Find detailed installation instructions in the user guide at http://docs.wandisco.com/bigdata/wdfusion/install.html#procedure.

1.3.2. Upgrades from Earlier Versions

As a minor release, Fusion 2.10.5 supports a simplified upgrade process for existing users of Fusion 2.10.3 and later. Please consult WANdisco support for details of the upgrade process.

1.3.3. General Improvements

FUS-4473 - Repair from S3 to HDFS modifies permissions for files

WD Fusion no longer applies default permissions on repaired files when the source zone does not have permissions for content. This could previously affect the outcome of a repair from an S3 zone to an HDFS zone.

FUS-4628 - Repair task accumulation

Repair tasks that were initiated before, and completed after, the failure of an HDFS NameNode were previously not marked as Done, resulting in the accumulation of tasks in the Fusion server.

FUS-4643, FUS-4683 - Restart of non-writer can cause re-execution of agreements

Under specific conditions, a non-writer Fusion node that becomes a writer can execute agreements that have already been processed by the previous writer node. For this to occur, a non-writer node that has been re-started must have switched to the writer role after observing continued agreement execution by a writer which has not failed. This can happen as a result of intermittent network failure or very large garbage collection pauses at that writer node.

Because the outcome of agreement re-execution is dependent on intervening activity in the underlying storage, the impact of this issue is indeterminate, however it can result in data loss.

FUS-4602 - Repair Resource returns 500 for valid Task ID

Under some scenarios, querying the /fs/repair/task endpoint results in an HTTP 500 response code instead of the expected task information.

FUS-4550 - Fusion client failure when name service URI contains whitespace

Client applications that reference the file system with a URI that contains both an authority and a path with a whitespace character could fail to initialize their file system reference correctly. This could result in job failure or failure to access specific file system content.

FUS-4489 - null Fusion authority incorrectly applied

When configured with the fusion:// URI scheme, clients that do not provide an authority in the URI may fail to obtain a file system reference correctly.

FUS-4493 - Compatibility with 2.10.3.1

The WD Fusion 2.10.4 release introduced a serialization change that prevented wire-level compatibility with environments operating WD Fusion 2.10.3.x versions.

FUS-4002 - Fusion to IHC connections should use SO_REUSEADDR

WD Fusion IHC servers did not default their socket configuration to accommodate high rates of connection re-establishment. Previously this could be worked around by modifying kernel properties; the 2.10.5 release no longer requires that operating system change.

FUS-4471 - Removing last replicated directory causes NPE

A WD Fusion environment that has only a single replication rule may not succeed in removing that rule on request.

FUS-4394 - MapR Fusion client RPM does not correctly modify hadoop-env.sh

Installation of the MapR client library can result in an incorrect attempt to modify the cluster hadoop-env.sh script.

FUS-4490 - Cloudera Navigator Metadata Server logs classpath error

Actions taken on Cloudera parcel installation may not correctly set the libraries referenced by Cloudera Navigator.

FUS-4556 - Update Azure to Azure file size limit

Azure to Azure replication in WD Fusion 2.10.3.1 would not accommodate files larger than 4MB due to a lack of support for appends. See Known Issue.

FUI-5267 - Additional .hive-staging default exclusion rule

Default exclusion rules did not include .hive-staging correctly.

FUI-4579 - Support object store endpoints that require SSL certificates

Installation of WD Fusion for S3 can fail if the endpoint requires SSL certificates.

FUI-4909 - Add explicit HTTPS support to Swift installer

Swift installer requires explicit configuration for HTTPS endpoints and improved validation.

FUI-5219 - EMR UI installer validation fails

A specific set of actions in the installer could result in failure of validation for EMR environments only.

FUI-5253 - Incomplete repair listing

Fixed a failure of the repair API to provide all repair tasks from the non-initiating node when a mix of historical and ongoing repairs is present.

FUI-5256 - Resolution for dependency with CVE-2014-0114

Resolution for a benign, but reported exposure to CVE-2014-0114 by updating the version of commons-beanutils to 1.9.3.

FUI-5269 - Graph % shown only for one node in dashboard

If the ui.hostname property is left at the default 0.0.0.0 setting, the dashboard may be unable to display graph information for the node.

FUI-4649 - Improved link generation for UI hosts in HA zones

If the WD Fusion UI is configured to run on a host that is not that of the Fusion server, links generated to that node from other UI instances are incorrect, using the hostname from the Fusion server.

FUI-5229 - Silent installer for Kerberos without Ambari management

The silent installer did not cater for installing Fusion with Hive in a non-managed environment.

FUI-5260 - UI RPM upgrade Java detection on unmanaged node

In-place upgrade from 2.10.2 in a mixed HDP/S3 configuration can fail to detect Java on an unmanaged node.

FUI-5261 - UI RPM upgrade user detection on unmanaged node

In-place upgrade from 2.10.2 in a mixed HDP/S3 configuration can fail to detect the user correctly on an unmanaged node.

FUI-4483 - LocalFS installer client step improved

Change in text displayed during client installation for LocalFS variant.

FUI-4548 - Update WD Hive installer documentation links

Correct links to documentation for WD Hive installation.

FUI-5221 - Kerberos settings tab shown for EMR and S3

Kerberos settings tab is not required for EMR or S3 variants.

FUI-5247 - Unable to update custom UI settings

Custom UI settings when a custom port is used may not update correctly.

FUI-5248 - Installer redirects to IP

Using custom setting for UI port and host may result in redirection to the UI server’s IP address rather than hostname.

HIVE-639 - Hive tab will now show records if DB has whitespace in location

The Fusion UI does not show table or database information if a database exists with whitespace in its location path.

HIVE-572 - Unable to truncate table in HDP

If Kerberos and SSL for the cluster manager are enabled after WD Fusion installation, it can affect operation of truncate table.

HIVE-608 - Correct permissions for Hive Stack configuration file

The Hive Stack applied in HDP deployments did not have correct permissions on the configuration file.

1.3.4. Available Packages

This release of WANdisco Fusion supports the following versions of Hadoop:

  • ASF Apache Hadoop 2.5.0 - 2.7.0

  • CDH 5.2.0 - 5.11.0

  • HDP 2.1.0 - 2.6.2

  • MapR 4.0.1 - 5.2.0

  • IOP (BigInsights) 4.0 - 4.2.5

The trial download includes the installation packages for CDH and HDP distributions only.

1.3.5. System Requirements

Before installing, ensure that your systems, software and hardware meet the requirements found in our online user guide at http://docs.wandisco.com/bigdata/wdfusion.

1.3.6. Third-Party Component Interoperability

WANdisco Fusion is interoperable with a wide variety of systems, including Hadoop distributions, object storage platforms, and cloud environments.

  • Amazon S3

  • Amazon EMR 5.0, 5.3, 5.4

  • Ambari 1.6, 1.7, 2.0, 3.1

  • Apache Hadoop 2.5.0 - 2.7.0

  • CDH 5.2 - 5.11

  • EMC Isilon 7.2, 8.0

  • Google Cloud Storage

  • Google Cloud Dataproc

  • HDP 2.1.0 - 2.6.2

  • IBM BI 4.0 - 4.2.5

  • MapR M4.0 - M5.2

  • Microsoft Azure Blob Storage

  • Microsoft Azure HDInsights 3.2 - 3.6

  • MySQL, Oracle (Hive Metastore)

  • Oracle BDA, Oracle BDCS

1.3.7. Client Applications Supported

WANdisco Fusion is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with WANdisco Fusion, and will be treated as supported applications. Additionally, Fusion supports the replication of content with Amazon S3 and S3-compatible objects stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.

1.3.8. Known Issues

Fusion 2.10.5 includes a small set of known issues with workarounds. In each case, resolution for the known issues is underway.

FUS-387

Renaming the parent directory of a location with current file transfers may result in incomplete transfer

In some circumstances, modification of the metadata for a parent directory within a replicated location can prevent the completion of content transfer that is underway for files underneath that directory. Fusion’s metadata consistency is unaffected, but file content may not be available in full. Consistency check and repair can be used to both detect and resolve any resulting missing content.

FUS-3022

Fusion does not support truncate command

The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

FUS-3714

Fusion does not support concat() operation

The public void concat(Path trg, Path[] psrcs) operation in org.apache.hadoop.fs.FileSystem is not yet supported, and will result in filesystem inconsistency. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

FUS-4556

Replication between Azure HDI and object-based storage solutions, e.g. Azure HDI ←→ Azure HDI or Azure HDI ←→ Amazon EMR, will fail because these platforms do not support file append operations. You can work around this problem by setting up an exclusion rule with the pattern /**/*._COPYING_ for your replicated folders.

1.4. Release 2.10.3.4 Hotfix

10 November 2017

WD Fusion 2.10.3.4 is a hotfix release for customers using 2.10.3.x versions of the product. It addresses a small number of important issues including one that under adverse network conditions can result in data inconsistencies.

WANdisco advises that all customers using the product should apply this hotfix to their environment.

1.4.1. Hotfix installation

Application of the hotfix is performed by updating the IHC server RPM, the Fusion server RPM, and the client stack or package. For example, the following packages should be updated for HDP 2.6.0:

  fusion-hcfs-hdp-2.6.0-ihc-server-2.10.3.4.el6-2491.noarch.rpm
  fusion-hcfs-hdp-2.6.0-server-2.10.3.4.el6-2491.noarch.rpm
  fusion-hcfs-hdp-2.6.0-2.10.3.4.stack.tar.gz

Please contact WANdisco support for assistance with this process.

1.4.2. Issues Resolved

This hotfix addresses the following issues.

FUS-4643 - Restart of non-writer can cause re-execution of agreements

Under specific conditions, a non-writer Fusion node that becomes a writer can execute agreements that have already been processed by the previous writer node. For this to occur, a non-writer node that has been re-started must have switched to the writer role after observing continued agreement execution by a writer which has not failed. This can happen as a result of intermittent network failure or very large garbage collection pauses at that writer node.

Because the outcome of agreement re-execution is dependent on intervening activity in the underlying storage, the impact of this issue is indeterminate, however it can result in data loss.

FUS-4602 - Repair Resource returns 500 for valid Task ID

Under some scenarios, querying the /fs/repair/task endpoint results in an HTTP 500 response code instead of the expected task information.

FUS-4550 - Fusion client failure when name service URI contains whitespace

Client applications that reference the file system with a URI that contains both an authority and a path with a whitespace character would fail to initialize their file system reference correctly. This could result in job failure or failure to access specific file system content.

1.4.3. Available Packages

Packages are available for all 2.10.3.x supported platforms. Please contact WANdisco support for access to the hotfix packages.

1.5. Release 2.10.4 Build 630

2 October 2017

WANdisco is pleased to present WD Fusion 2.10.4. This release adds support for a number of new Hadoop distribution versions, and includes new features that are detailed below. This is a minor release, and offers bug fixes and other improvements as detailed below.

1.5.1. Installation

Find detailed installation instructions in the user guide at http://docs.wandisco.com/bigdata/wdfusion/install.html#procedure.

1.5.2. Upgrades from Earlier Versions

As a minor release, Fusion 2.10.4 supports a simplified upgrade process for existing users of Fusion 2.10.3. Please consult WANdisco support for details of the upgrade process.

1.5.3. General Improvements

URISyntaxException when Illegal character in path

Fixes an issue when a Hive table name contains a whitespace character

Boxcars confused by fast bypass

Fixes issue with combination of boxcars and fast bypass mechanism

Fusion client NPE in WD Hive MS

Fixes issue where Fusion client raises a Null pointer exception

Dependency on fusion-server for non-repl operations, even with repl_exchange_dir

Fixes issue triggered by particular combination of system restart order

Incorrect Hive safety valve property

Corrects safety valve setting used to configure cluster for use of replicated metastore

Fusion Client handshake handler NPE

Fixes NPE issue during handshake between replicated metastore and Fusion server

CDH / Hive Install Issues

Fixes minor display issue with installer at Cloudera Impala configuration step

Fusion client NPE in WD Hive MS

Fixes issue with combination of Sentry grants and service restart order

Hive CLI hangs randomly with SocketTimeoutException in WD-Hive Metastore HA

Fixes issue exhibited by read timeout exception from hiveserver2

Can’t see Hive databases on Hive CC tab

Resolves problem with visibility of databases during consistency check

Some installations don’t give the right permission to /var/log/wd-hive-metastore/ and /var/run/wd-hive-metastore/. wd-hive-metastore fails to start

Corrected permissions on runtime directories

CDH Plugin deployment should generate own keytabs

Hive plugin installation in a Kerberos-enabled CDH deployment automates keytab generation if required

Change in doc wording

Refines wording on upgrade process

Uninstallation documentation incorrect

Removes information on backup option from config files

Solr symlinks must be careful to only reference activated fusion parcel

fusion_env.sh updated to improve referencing of parcels

Sidelined path and Java heap exhaustion

Resolution for issues with Java heap exhaustion

Assertion Error after Cancelling Repair Task = Full System Down and won’t restart

Resolves Fusion server panic on repair task cancellation

Talkback not able to customize setting for TALKBACKNAME, FUSION_MARKER variables

Fixes to non-interactive mode for talkback.sh script

Missing Files from Replication

Improved logic for pre-rename pull activity

Client RPM upgrade or uninstall leaves dead symlinks

Post-installation script improvements

REST API: GET response about tasks finished is stable "HTTP Error 500"

NPE in Fusion server resolved for task retrieval API

Fusion client cannot authenticate to hadoop if RPC privacy mode is enabled

Fix for authentication when SASL-based auth and impersonation of Kerberos users is enabled

Talkback improvement - Store Temporary Files within $TMPDIR

Simple extension to allow talkbacks to be processed without assuming use of /tmp

Replication for HDP+Azure

Correction to replication issue specific to HDP+Azure

Executed gsn folder not removed on Swift

Cleanup of Fusion metadata folders on Swift

fs.fusion.swift.region is not being used

No longer defaults to the default region

Swift consistency check doesn’t notice sub-folders

Swift object store listing no longer ignores pseudo-subdirectories

Talkback does not follow log location change

Change in the configured location for log data is accommodated by talkback

CDH Fusion Parcels Updates

Umbrella ticket for a set of individual issues related to management of CDH parcels

RPM upgrade is looking for htrace-core4.jar which should be htrace-core.jar

Simple correction to JAR naming

Replicated directory removal task did not complete

Fix for failure to delete replication rules

Support for HDP 2.6.2

HDP 2.6.2 support

Support RedHat Enterprise Linux 7.4

RHEL 7.4

NetApp: setOwner throws NPE, which can cause Fusion to lock up

Resolves issue specific to NetApp deployments

Disable and remove DES, 3DES, and RC4 ciphers

Documentation fix for approach to disabling ciphers deemed insecure in some environments

Client RPM upgrade or uninstall leaves dead symlinks

Fixes to post-install script that is run on upgrade or uninstallation

Allow Environment Variables to be sourced from File

Allows the user to run talkback non-interactively by providing it with a configuration file

Add output of hdfs dfs -count to talkback

Optional extra check during talkback

Talkback to respect custom fusion.dsmToken.dir

Talkbacks load and respect the value of this parameter

Talkback does not generate replicated_directory_info on Swift

Talkback accommodates a command mechanism dependent on the zone type it is run under

Transferring small files to S3 records Long.MAX_VALUE as transfer speed

Corrected transfer progress listener

Default to exclude tmp files from replication

New defaults for exclusion patterns applied to replication rules

Better diagnostics for IHC SSL configuration problems

Improved IHC log output on SSL failure

Talkback not able to customize setting for TALKBACKNAME, FUSION_MARKER variables

Also corrects descriptions in operation

Relocatable RPM does not work for stack in hardened envs

Stack packaging changes to account for non-root hardened installations

Giving the wrong password for snapdiff repair throws ugly error

Improves log feedback

Provide interoperability with S3 compatible object storages

Confirmed interoperability with a variety of S3 compatible systems

Add anchors

Documentation improvement

Documentation for the need to create Hive principals and keytabs if Ambari does not manage Kerberos but Kerberos itself is enabled

Documentation improvement

Consistency check result cutting off part of the table due to long filenames

Retain content within table boundaries

Missing Kerberos configuration on security step for LocalFS

The option to configure Kerberos is restored on the security step during UI installation

Decommission SPARK_CLASSPATH from the installer

SPARK_CLASSPATH is deprecated and is no longer used by the installer

Document known issue: FUI-4352

Document need for JAVA_HOME to be set for user that runs the Fusion server process

MapR install fails to write core-site.xml with "Not supported: indent-number"

Correction for MapR 5.0

Unmanaged installation has problems with removing core-site properties

Correct installation issue related to core-site modification for unmanaged installs

UI installer is setting defaultFS in safety valve

Remove addition of fs.defaultFS in the core-site safety valve

Missing Step when installing hive-plugin

Correct instruction to restart Cloudera Manager service at specific step in Hive installation

Hive tab will not show records if any DB has a space in location

The Fusion UI will now show tables and databases when a database has a space in its location path.

Fusion UI installer impala client parcel has incorrect download link

Correct Impala client parcel download links

WD-Hive plugin installer fails to detect the completion of hive service restart

Corrected an issue where, when installing Fusion with the Hive plugin on an HDP 2.6 single-node cluster, the installer failed to detect that the Hive service had completed its restart.

Hive install steps should only ask for Kadmin creds if Ambari is managing Kerberos

The installer should query Ambari as to whether it is managing Kerberos, and only request kadmin credentials if this is the case

Shared KMS toggle should be disabled if KMS is not available

Clusters that do not have KMS enabled should not provide the KMS toggle in the Replicated path addition or edit page.

S3 filetree throws NPE if we try to list a / object

S3 listings no longer fail under some virtual directory configurations

Swift install on a hadoop client node pulls in cluster core-site

Improved referencing of existing configuration from core-site.xml

Swift consistency check doesn’t notice sub-folders

UI portion of FUS-4182

Clicking on Re-check button on CC tab should not navigate away from the page

Fixed minor navigation issue

EMR silent installer often fails with "Failed to start WD Fusion Server…" message

Improved startup during silent installation

Add explicit https support to Swift installer

Swift installer allows specification of HTTP or HTTPS

Correct wording in WD Hive install step

Minor revision

UI Settings - having HTTP External blank caused APInotFound error due to Dcone bug

Corrected error relating to update of HTTP port setting with blanks

Remove zone type icon from secondary header

Zone icon in the secondary header should not be present

Kadmin Credentials / Validation workflow enhancement

Disable validation of hive settings if no kadmin credentials are provided

Cluster graph bandwidth limit arrows not showing

Cluster graph bandwidth limit arrows now showing on enterprise license when required

Replicated Folder page does not update (shows pending spinner animation) after stopping local node during automation test scenario

Closed with spinner removal

Next button disabled after going back and navigation block not shading- Hive installer

Corrects an issue where, after going back from step 2 to step 1 (especially when kadmin validation has not been performed), the Next button was inactive.

Upgrade jackson to 2.8.9 or above

Upgraded Jackson version

[REPL-53] - "Can’t get Kerberos realm" when installing BigReplicate on a ppc Kerberos-enabled BI 4.2.5 environment

Properly configure the default_realm in krb5.conf

Start a node button sometimes asks for confirmation

Fusion no longer asks for confirmation to start a node

Remove MySQL GRANTs for Ambari deployments

The page instructing users to grant permissions has been removed, as it is no longer required.

Remove the NO_CONTENT consistency state as it was removed from core

Related to FUS-3654 from 2.10.3

Path in breadcrumb should be more prominent

Path is more prominent in the breadcrumb

Unable to Update License when License is Expired

Corrected permissions issue when applied in non-default location

"Select All" checkbox disabled after a RR is added on the remote node.

Corrected: The "Select All" checkbox is disabled the first time you navigate to the Replicated Rules tab after adding a rule on the remote zone. If you navigate away from the page and back, the checkbox is enabled.

Replication > Rules Section > Select all checkbox true by default when there are no rules

False by default when no rules

Trying to read comments in fusion_env.sh?

No longer log warnings from comment lines

get and symlink ubuntu impala parcels

Hard link to el_x_ parcels

[AWS Quickstart] UI client does not understand UNMANAGED_ASF type

Support new UNMANAGED_ASF type

Improve where plugins are injected at startup in the index.html.

Inject the plugins in the position marked by fusion-ui-client.

S3 Plugin Fusion - Fetching IPs and hostnames of the machine takes a long time if Fusion is installed on a non-cloud machine

The S3 plugin now checks first whether it is running on EC2

CDH and HDP return different statuses for HDFS service health while the HDFS service is stopped during Fusion installation

Consistent representation

Follow up → Introduce "stalled" Transfer state

Better solution for displaying transfers identified as "stalled".

UI needs to support endpoints that require SSL certs

UI no longer fails to validate the object storage if the endpoint requires SSL certs

[REPL-31] - Fusion Server fails to start after Kerberos is disabled, if Kerberos was enabled with "Enable HTTP Authentication" during BigReplicate installation

Warning message if Kerberos is configured, but the Cluster is not kerberized

LocalFS installer client install is confusing

Improved flow for localfs installation

Settings Page: Improve description for AWS credentials

Improved description of setting in UI

Cloudera installation > Step 9 > Sub Step 2: Parcel link ordering doesn’t make sense

Improved order

The Fusion UI menu bar with tabs moved to the middle of the screen from time to time

Fixed intermittent display issue

Repair button is not enabled

The Repair button on bulk repair or repair tag was not enabled after all the options were filled in

1.5.4. New Platform Support

WD Fusion has added support for the following new platforms since Fusion 2.10:

  • ASF Apache Hadoop 2.5.0 - 2.7.0

  • CDH 5.12

  • HDP 2.6.2

Additionally, the Pivotal Hadoop Distribution is no longer a supported platform.

1.5.5. Available Packages

This release of WANdisco Fusion supports the following versions of Hadoop:

  • ASF Apache Hadoop 2.5.0 - 2.7.0

  • CDH 5.2.0 - 5.12.0

  • HDP 2.1.0 - 2.6.2

  • MapR 4.0.1 - 5.2.0

  • IOP (BigInsights) 4.0 - 4.2.5

The trial download includes the installation packages for CDH and HDP distributions only.

1.5.6. System Requirements

Before installing, ensure that your systems, software, and hardware meet the requirements found in our online user guide at http://docs.wandisco.com/bigdata/wdfusion.

Third-Party Component Interoperability

WANdisco Fusion is interoperable with a wide variety of systems, including Hadoop distributions, object storage platforms, and cloud environments.

  • Amazon S3

  • Amazon EMR 5.0, 5.3, 5.4

  • Ambari 1.6, 1.7, 2.0, 3.1

  • Apache Hadoop 2.5.0 - 2.7.0

  • CDH 5.2 - 5.12

  • EMC Isilon 7.2, 8.0

  • Google Cloud Storage

  • Google Cloud Dataproc

  • HDP 2.1.0 - 2.6.0

  • IBM BI 4.0 - 4.2.5

  • MapR M4.0 - M5.2

  • Microsoft Azure Blob Storage

  • Microsoft Azure HDInsights 3.2 - 3.6

  • MySQL, Oracle (Hive Metastore)

  • Oracle BDA, Oracle BDCS

Client Applications Supported

WANdisco Fusion is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with WANdisco Fusion, and will be treated as supported applications. Additionally, Fusion supports the replication of content with Amazon S3 and S3-compatible objects stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.

1.5.7. Known Issues

Fusion 2.10.4 includes a small set of known issues with workarounds. In each case, resolution of the known issues is underway.

  • Renaming the parent directory of a location with current file transfers may result in incomplete transfer - FUS-387.

In some circumstances, modification of the metadata for a parent directory within a replicated location can prevent the completion of content transfer that is underway for files underneath that directory. Fusion’s metadata consistency is unaffected, but file content may not be available in full. Consistency check and repair can be used to both detect and resolve any resulting missing content.

  • Fusion does not support truncate command - FUS-3022

The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

  • Fusion does not support concat() operation - FUS-3714

The public void concat(Path trg, Path[] psrcs) operation in org.apache.hadoop.fs.FileSystem is not yet supported, and will result in filesystem inconsistency. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

  • Minor issues relating to the changing of UI settings FUI-5247, FUI-5248

There’s an issue with changing the UI settings in the Fusion UI settings screen if the custom port you plan to use is already specified in UI settings - the entry of your change will be blocked with an error. Workaround: temporarily change the in-use port. Also, setting a custom value for the UI port/host/external host during installation will cause the Fusion UI to redirect to the HTTP host IP instead of the provided hostname.

1.6. Release 2.10.3.2 Hotfix

27 September 2017

WD Fusion 2.10.3.2 includes a fix relating to the WANdisco Hive Metastore function, in HDP 2.6.2:

  • Added a new method for get_all_functions along with a corresponding change to the API for AlterTableEvent.

1.7. Release 2.10.3.1 Build 504

9 August 2017

WANdisco is pleased to present WD Fusion 2.10.3. This release adds support for some new Hadoop distribution versions and includes new features that are detailed below.

1.7.1. Installation

Find detailed installation instructions in the user guide at Installation Procedure.

1.7.2. Upgrades from Earlier Versions

Fusion 2.10.3 supports a different upgrade process for existing users of Fusion 2.10. Please consult WANdisco support for details of the upgrade process.

1.7.3. New Feature Highlights

This release includes the following new features.

Non-interactive Talkback

The talkback script used to capture environment and log information can be run in a non-interactive mode where user input is not required during execution. (FUS-3454, FUS-3649)

Recovery from failed DataNodes

WD Fusion 2.10.3 overrides the following Hadoop settings (a verification sketch follows the list):

  • for automatic replacement of failed datanodes dfs.client.block.write.replace-datanode-on-failure.enable to "true", and

  • for replacement policy dfs.client.block.write.replace-datanode-on-failure.policy to "DEFAULT".
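
The sketch below is illustrative only: it reads back the effective client configuration on a Fusion-enabled node with hdfs getconf, assuming the overridden values are visible in that node's Hadoop configuration.

  # Check the two settings described above on a Fusion-enabled client node.
  hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
  hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy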

Support for non-local filesystem references

WD Fusion will honor an application’s intention when it specifies a file system location other than that of the Fusion-enabled local file system.

1.7.4. Hive-Specific Improvements

Hive Installation

Ambari-based installations of the Hive Replicated Metastore accommodate nodes with multiple network interfaces or multiple hostnames. (HIVE-471)

CDH Hive Support

Changes introduced in CDH 5.11 are incorporated into WD Fusion. (HIVE-482)

Hive Metastore HA

Multiple WD Fusion Replicated Hive Metastore instances can be deployed for redundancy and high availability. (HIVE-346, HIVE-227)

Improved Hive CLI Behavior

Responsiveness of the Hive CLI is improved. (FUS-3851, REPL-32, HIVE-363)

1.7.5. Metastore Event Listener Events

MetaStoreEventListener events are replicated across Hive metastore instances. (HIVE-410, HIVE-234, HIVE-243, HIVE-222, REPL-2, REPL-7, REPL-22)

Add If Not Exists Partition

Hive commands that include "ADD IF NOT EXISTS PARTITION" will function correctly. (HIVE-439)

Revert WD Hiverserver2 Template

The WD Hiveserver2 Template can be removed successfully. (HIVE-469)

Large Database List Performance

Environments with very large numbers of Hive databases can be listed with good performance in the WD Fusion UI. (HIVE-497)

/tmp/hive Scratch Directory Permissions

If Hive is configured to use a scratch directory through hive.start.cleanup.scratchdir, it will now be recreated with the correct permissions. (HIVE-283)

Installer Deployment Detection

Hive installation succeeds when Ambari hostname labels differ from the actual hostnames. (HIVE-429)

Hive Restart on BigInsights

Restarting the hosts on which the Hive Replicated Metastore and Hiveserver2 Template reside now results in the associated services starting. (HIVE-437, HIVE-446, HIVE-442)

RHEL7 Init Scripts

WD Hive stacks and init scripts account for the removal of paths created under /var/run on RHEL7. (HIVE-450, HIVE-464)

Support Large Max Rows

Repair functionality is no longer limited by the maximum integer value. (HIVE-452)

Fix to Add Constraint

Fix provided for a failure to progress during execution of an alter table add constraint operation. (REPL-39, HIVE-455)

Update Hive Configuration for Kerberos

Enabling Kerberos after installation functions correctly. (HIVE-473, REPL-48)

Databases in Correct Path

The WD Fusion UI now shows all databases in their correct location. (HIVE-503)

Hiveserver2 Template Name

The display name of the Hiveserver2 Template is shortened. (HIVE-348, HIVE-311)

1.7.6. Other Improvements

Ubuntu Support

Ubuntu 16.04 is a supported operating system. (FUI-4611)

Improved Operation Under Load

Conditions of significant load no longer risk application level timeouts due to contention over coordinated activities. (FUS-3576, FUS-3799, FUS-3810)

Recovery From Errors During Transfer

Better handling of failed TCP connections during final stages of file content transfer. (FUS-3837)

Configurable Notification Buffer

Internal buffers sizes used for notification event and request events can be configured. (FUS-3849)

SSL Configuration Recommendations

WANdisco advises customers to avoid the use of OpenSSL as it can exhibit a memory leak. (FUS-3867)

Failed Transfer Reporting

Failed transfers no longer remain in incomplete status indefinitely. Stalled transfers are shown. (FUS-3833, FUS-3869)

New Node Installation with Ambari

Installation of new cluster nodes via Ambari now honors dependencies for the Fusion client libraries correctly. (FUS-3887)

Improved S3 Performance

The performance of replication to Amazon S3 endpoints is enhanced, enabled in part by the introduction of a new configuration parameter (fs.fusion.s3.transferThreads). (FUS-3978, FUS-3984)

Robust Replication Rule Removal

Replication rule removal does not trigger a failure of the Fusion server. (FUS-3987)

Memory Usage Improvements

Memory use during consistency check and regular operation is improved. (FUS-4013, DCO-709, FUS-4027, FUS-4050, FUS-3489, FUS-3733, FUS-4032)

Consistency Check Performance

The performance of the consistency check is improved. (FUS-2469)

Fix to Metadata Modification Failure

The FUS-3433 known issue in prior releases is resolved, allowing metadata modifications to files that are recently moved from non-replicated to replicated locations to behave as expected. (FUS-3433)

REST API Timeout Behavior

Timeouts are applied on execution of REST APIs to prevent clients waiting indefinitely. (FUS-3477)

Lack of S3 copyObject accommodated

Fusion can accommodate S3 implementations that do not provide copyObject functionality. (FUS-3588)

Fusion REST API Availability

The Fusion server REST API is available regardless of the state of quorum in the system. (FUS-3619)

Improved behavior with large ingest

Ingesting very large numbers of files to a replicated directory at once is handled with better performance than previous releases. (FUS-3620, FUS-2469, FUS-3663, FUS-3673, FUS-3631)

Fixed Repair to Amazon EMR

When performing repair to an Amazon EMR zone, Fusion will perform correctly regardless of the state of existing files in EMR. (FUS-3701)

Empty Bandwidth Policy Handling

Ill-defined bandwidth limit policies do not result in Fusion server failure. (FUS-3716, FUS-3756)

Hadoop Rename Variant Support

The rename with options method in the Hadoop FileSystem API is supported. (FUS-3739, REPL-24)

NetApp Repair Improvement

iNode modifications made on a NetApp instance during repair are ignored to allow completion. (FUS-3850)

License Check Deadlock Corrected

A potential startup deadlock condition related to license checks has been corrected. (FUS-3863)

Configuration Changes for S3

Additional configuration options are provided for replication with S3 endpoints:

  • fs.fusion.s3.connectionTimeout, default is 10s

  • fs.fusion.s3.socketTimeout, default is 50s

  • fs.fusion.s3.maxConnections, default is 50

  • fs.fusion.s3.maxErrorRetry, default is 3

  • fs.fusion.s3.tcpKeepAlive, default is false

(FUS-3902)
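
As an illustrative check only (these properties normally live in core-site.xml on the Fusion nodes; whether they are visible to hdfs getconf depends on where they are configured), the effective values can be read back as follows:

  # List the effective values of the new S3 tuning properties on this node.
  for key in fs.fusion.s3.connectionTimeout fs.fusion.s3.socketTimeout \
             fs.fusion.s3.maxConnections fs.fusion.s3.maxErrorRetry \
             fs.fusion.s3.tcpKeepAlive; do
    echo "$key:"
    hdfs getconf -confKey "$key"
  done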

Configuration Defaults For Performance

Internal configuration defaults have been modified to improve performance. (FUS-4099, FUS-4070)

Improved Startup Time

WD Fusion 2.10.3 improves startup time for processing agreed proposals. (FUS-4071, FUS-4086)

Prevent Read-after-write Inconsistency on S3

Changes made to interaction with the S3 API to avoid triggering conditions that can result in read-after-write inconsistency, affecting correct outcomes for replication. (FUS-3908, FUS-3936)

Improved Consensus Backoff Logic

Backoffs applied to conflicting proposals are limited. (DCO-698)

Consistency Check Reporting

WD Fusion 2.10.3 resolves the known issue in 2.10 where a consistency check that is triggered from a non-writer node may never complete. (FUS-2675, FUS-3684, FUS-3775)

Separate Execution Pools for Request Types

Introduced new executor thread pools to avoid processing starvation from long-running activities. (FUS-3799)

Scale Dependency Calculations

Fusion’s ability to scale effectively with extremely large numbers of outstanding agreements is improved. (FUS-3974, REPL-38)

No Scheduled Consistency Checks

There is no longer a requirement to run scheduled consistency checks for regular operation of WD Fusion. 2.10.3 turns off these checks by default. (FUS-4041, FUS-4003, FUS-3615)

Allow Reference to Remote File System

The Fusion client library will determine whether the scheme of the URI used to reference the underlying file system indicates that coordination via Fusion is not required, and act accordingly. This improves interoperability with distcp in particular. (FUS-1970)

API Information Corrected

Information returned from the /fusion/fs/transfers REST API no longer reports negative values. The repair API is similarly improved. (FUS-3219, FUS-3291)
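
For reference, a query of this API might look like the sketch below; the hostname and port are assumptions (substitute the REST address configured for your Fusion server).

  # Hostname and port below are placeholders for your Fusion server's REST API.
  curl -s "http://fusion-server.example.com:8082/fusion/fs/transfers"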

Eliminate Generation of Extraneous Files During Rename

Multiple renames occurring in quick succession on a single file no longer result in extraneous files in non-originating zones. (FUS-3439)

Oozie Classpath Setup

Installation ensures that Oozie classpaths are configured to reference WD Fusion client libraries as required. (FUS-3659, FUS-3690)

Minimize Unnecessary Logging

Hadoop clusters that do not have ACL support no longer result in excessive logging of aclStatus messages. (FUS-3704, FUS-3657)

Snapshot Repair Mechanism

Improved resilience of the snapshot repair mechanism. (FUS-3881)

Setting global exclusions via zone properties API

The behavior of the zone properties API when updating global exclusions has been corrected. (FUS-3741)

Improved S3 Interoperability

Configurations where a Hadoop cluster uses S3 as the underlying file system are improved. (FUS-3039)

Handle Data Pipeline Errors

WD Fusion interoperates with HDFS in a best-effort mode by default to avoid stalling from failed data pipeline errors. (FUS-3597)

Failed Transfers Marked as Failed

File content transfers that fail for external reasons are no longer left marked as incomplete. (FUS-3605)

Improved Content Transfer

Non-recoverable failures during content transfer are no longer retried. (FUS-3697)

Deterministic Default Exclusions

The default exclusions that apply to replication rules remain set regardless of changes to global zone properties. (FUS-3741)

Group names with spaces

Post-installation scripts allow for group names that include whitespace characters. (FUS-3774)

Ambari Installation Improvements

Upgrades in Ambari-based environments behave correctly. (FUS-3794)

Missing files do not trigger retry loop

Non-recoverable failures on file pulls (due to a source file being removed or modified) no longer result in retry behavior. (FUS-3805, FUS-3833)

Repairs with FACLs

Repairs operate correctly with FACL information. (FUS-3825)

Fixed Relocatable Installation

Ambari-based installations performed to a non-standard directory work as expected. (FUS-4025)

Apache NiFi Interoperability

Classpath issues that occurred when using NiFi are resolved. (FUS-3852)

File transfer status

Improvements to status representation of file transfers. (FUS-3869, FUS-3833)

Reinstallation of decommissioned node

WD Fusion can be reinstalled on a node that has had a previous installation in place that has not been uninstalled correctly.

Improved resilience to failed content transfer

Origin zone file renames do not affect successful content transfer.

Improved speed of consistency check

Consistency check performance is improved.

Updated Apache Commons Collections Version

Apache Commons Collections 3.2.2 is used in preference to 3.2.1. (FUS-1803)

Apache Spark2 follows a different model for third-party JAR integration than Spark. Fusion integrates correctly with Spark2. Resolution also applies to Accumulo. (FUS-4031)

Plugin Configuration on Classpath

Plugins can read configuration available on the CLASSPATH. (FUS-3815)

IHC Restart Messages

The service fusion-ihc-server-xxx_x_x restart command will indicate success or otherwise in response. (FUS-4039)

No Overwrite of Custom Logger Properties

The java.util.logging.FileHandler.pattern setting can now be customized without being overwritten. (FUS-4045)

Improved unidirectional networking resilience to server restarts

Replication operation while a zone has been set to inbound is no longer affected by server restarts. (FUS-3721)

Impala Parcels for Ubuntu

Impala 3.0 parcels are now provided for Ubuntu. (FUS-3992)

Links to HTTPS-enabled nodes are represented correctly. (FUI-4468, FUI-4517)

1.7.7. Fusion UI Improvements

Many minor improvements to user interface behavior are provided with this release. (FUI-4459 - FUI-4845)

New Platform Support

WD Fusion has added support for the following new platforms since Fusion 2.10:

  • CDH 5.11

  • HDP 2.6

  • HDInsights 3.6

  • IBM BigInsights 4.2.5

Additionally, platform support for IBM BigInsights 4.3 has been removed because that version of BigInsights was not released.

1.7.8. Available Packages

This release of WANdisco Fusion supports the following versions of Hadoop:

  • CDH 5.2.0 - CDH 5.11.0

  • HDP 2.1.0 - HDP 2.6.0

  • MapR 4.0.1 - MapR 5.2.0

  • Pivotal HD 3.0.0 - 3.4.0

  • IOP (BigInsights) 4.0 - 4.2.5

The trial download includes the installation packages for CDH and HDP distributions only.

1.7.9. System Requirements

Before installing, ensure that your systems, software and hardware meet the requirements found in our online user guide at http://docs.wandisco.com/bigdata/wdfusion.

Third-Party Component Interoperability

WANdisco Fusion is interoperable with a wide variety of systems, including Hadoop distributions, object storage platforms, and cloud environments.

  • Amazon S3

  • Amazon EMR 5.0, 5.3, 5.4

  • Ambari 1.6, 1.7, 2.0, 3.1

  • CDH 5.2 - 5.11

  • EMC Isilon 7.2, 8.0

  • Google Cloud Storage

  • Google Cloud Dataproc

  • HDP 2.1.0 - 2.6.0

  • IBM BI 4.0 - 4.2.5

  • MapR M4.0 - M5.2

  • Microsoft Azure Blob Storage

  • Microsoft Azure HDInsights 3.2 - 3.6

  • MySQL, Oracle (Hive Metastore)

  • Oracle BDA, Oracle BDCS

  • Pivotal HD 3.0 - 3.4

Client Applications Supported

WANdisco Fusion is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with WANdisco Fusion, and will be treated as supported applications. Additionally, Fusion supports the replication of content with Amazon S3 and S3-compatible objects stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.

1.7.10. Known Issues

Fusion 2.10.3 includes a small set of known issues with workarounds. In each case, resolution for the known issues is underway.

  • Renaming the parent directory of a location with current file transfers may result in incomplete transfer - FUS-387.

In some circumstances, modification of the metadata for a parent directory within a replicated location can prevent the completion of content transfer that is underway for files underneath that directory. Fusion’s metadata consistency is unaffected, but file content may not be available in full. Consistency check and repair can be used to both detect and resolve any resulting missing content.

  • Fusion does not support truncate command - FUS-3022

The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

  • Fusion does not support concat() operation - FUS-3714

The public void concat(Path trg, Path[] psrcs) operation in org.apache.hadoop.fs.FileSystem is not yet supported, and will result in filesystem inconsistency. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

  • Consistency repair tool fails for files in Swift storage - FUS-3642

An issue impacting Swift storage which will be fixed in a future release.

1.8. Release 2.10.2 Build 413

13 May 2017

WD Fusion 2.10.2 introduces fixes and refinements that are specific to deployment into IBM’s BigInsights platform.

  • Support added for IBM BigInsights 4.3

  • Improved feedback is given when making DSM changes in the web UI.

  • Refined the IHC ihcFailureTracker code to improve the handling of transient network issues that resulted from longer-than-expected timeouts.

1.8.1. Known Issues

There’s a problem that impacts BigInsights with BigReplicate without the Hive plugin, caused by the client in HDFS. Specifically, if Hive uses hive.start.cleanup.scratchdir, the wd-hive-metastore recreates that directory with the wrong permissions.

There’s a workaround:

  1. In Ambari go to Hive Configs and search for 'scratch'.

  2. Modify setting 'hive.start.cleanup.scratchdir' to false.

  3. Deploy configuration to all nodes.

  4. Modify permissions in HDFS for /tmp/hive, e.g. hdfs dfs -chmod 733 /tmp/hive.

  5. Restart all Hive components through Ambari.

This issue will be fixed in the next release, 2.10.3.

1.9. Release 2.10 Build 46

13 April 2017

WANdisco is pleased to present WD Fusion 2.10 as the next major release of the Fusion platform, available now from the WANdisco file distribution site. This release includes significant new product functionality that leverages the Fusion architecture to support a broader range of use cases, expand performance and scale, and ease the administration of Fusion environments.

1.9.1. Installation

Upgrades from Earlier Versions

As a major release, Fusion 2.10 introduces incompatibilities with the network protocols and storage formats used by prior versions. Please contact WANdisco support for information on the upgrade mechanism appropriate for your environment.


1.9.2. New Feature Highlights

This release includes the following major new features.

WANdisco Fusion for Network File Systems

WD Fusion 2.10 adds support for replicating data efficiently from Network File Systems (NFS) for NetApp devices to any mix of on-premises and cloud environments. This feature allows data replication at any scale from NFS to other Fusion zones.

User Interface

The WD Fusion user interface now presents a logical view of the Fusion operational components, Fusion zones and bandwidth limit policies in place of the physical map of locations. This makes it easier to observe the deployment of complex solutions and navigate directly to detailed views of individual item configuration.

Client Bypass

An improvement has been made to the mechanism used by the HDFS and HCFS client library to detect when no working Fusion server is available. The improvement allows clients to bypass the Fusion server when needed without waiting for a TCP connection loss or timeout.

Replication of Read-Only Locations

Fusion 2.10 can be configured to replicate from storage system locations that do not provide write access for the identity used by the Fusion server.

S3 Enhancements

Fusion configuration options now include custom S3 endpoints so that replication can occur to non-AWS S3 providers. Additionally, when Fusion is hosted in AWS EC2, replication can occur to an S3 endpoint that is in a region other than where the Fusion services reside.

Repair Features and Improvements

The Fusion repair feature allows the transfer of initial content between Fusion zones that have not previously replicated, and can be used as a mechanism to perform once-off replication that remains consistent with other replication activity. Repair has been enhanced significantly in Fusion 2.10, including the following:

Auto-Parallelization of Repair

Fusion repair functionality has been extended with major improvements in performance by automatically scaling a single repair task across multiple threads of execution. This removes the need to issue multiple repair requests for subsets of a replicated location. It also provides the ability to tune the threads used for repair independently of those used for consensus-driven activity on replicated content.

Checkpoint Repair

When initiating a repair task for initial data transfer or similar, you now have the option of selecting a checkpoint repair. This avoids the need for Fusion to scan the file system of the originating zone under the repair path to determine content. Checkpoint repair refers to content from an HDFS fsimage file, avoiding the need to lock other operations during a repair scan.

Repair Cancellation

You can cancel a repair task that is underway.

Resource Consumption for Repair

Heap requirements for repair execution are now independent of the volume of data under repair.

Global View of Repair Status

Repair task status is available from any node, regardless of origin.

Consistency Check Features and Improvements

Consistency Check ACL Information

File system ACL information is now reported by consistency check and corrected by the repair feature.

Restart WD Fusion if you enable ACLs on your cluster
After enabling ACLs on your cluster, the Fusion servers must be restarted; otherwise Fusion behaves as if ACLs are disabled and they will not appear in consistency check or repair operations.
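
As a minimal sketch of that sequence, assuming HDFS ACLs are enabled through the standard dfs.namenode.acls.enabled property and the Fusion server runs as a system service:

  # Confirm that HDFS ACL support is enabled on the cluster
  hdfs getconf -confKey dfs.namenode.acls.enabled

  # Restart the Fusion server so ACLs are recognised by consistency check and repair
  service fusion-server restart
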
Consistency Check Cancellation

You can cancel consistency checks that are underway.

Resource Requirements for Consistency Check

Resource requirements for consistency check are now independent of the volume of metadata against which the check is performed.

User Interface Security

The WANdisco Fusion user interface can now be accessed over HTTPS, and that configuration can be performed independently of other SSL configuration.

Relocatable Installation

You can choose to install WD Fusion 2.10 in a location other than the default /opt/wandisco. See Custom Location Installations.

Network Support for Firewalled Fusion Zones

Fusion 2.10 can operate in an environment where one Fusion zone does not allow inbound network connectivity. This is typical for a secured on-premises deployment, where it may be difficult to modify or establish corporate firewall rules to allow inbound TCP connections to the Fusion services.

ACL Replication

ACL replication can be enabled to allow changes from local- and remote-originated zones to be replicated. ACL information will be represented in consistency check results as appropriate.

LocalFS to LocalFS ACL Replication
We support replication of Hadoop ACLs as exposed via the FileSystem object. Deployments which don’t expose ACLs in this way (e.g. local filesystem) or don’t support ACLs at all (S3) will not replicate the ACLs between zones.
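
For illustration, the ACLs in question are those manageable through the standard HDFS commands; the path below is an example only:

  # Grant an additional user read/execute access on a path within a replicated location
  hdfs dfs -setfacl -m user:analyst:r-x /repl1/data

  # Inspect the ACL entries that consistency check compares between zones
  hdfs dfs -getfacl /repl1/data
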
Enhanced Logging

Among a range of minor improvements to logged information, Fusion 2.10 adds the ability to log the identity of the proxy user for which requests are made.

Manual Fast Bypass

This feature introduces a mechanism to quickly prevent applications from using Fusion when interacting with the underlying file system, without the need to make configuration changes. The fusion.replicated.dir.exchange configuration property in core-site.xml specifies the location under which a directory named bypass can be created to trigger this. Subsequent client activity in that cluster will bypass coordination through Fusion.
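
For example, assuming fusion.replicated.dir.exchange is set to /wandisco/exchange (an illustrative value only), the bypass could be toggled as follows; restoring coordination by removing the directory is an assumption based on the mechanism described above:

  # Trigger the bypass: subsequent client activity in this cluster stops coordinating through Fusion
  hdfs dfs -mkdir -p /wandisco/exchange/bypass

  # Remove the directory again to restore coordination (assumption)
  hdfs dfs -rm -r /wandisco/exchange/bypass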

API to Track Completion of Transfers for a Specified Location

The API to track the status of transfers under a replicated directory now allows that tracking to be limited to a subdirectory of a replicated location.

Installation without Root Identity

Fusion 2.10 can be installed as a non-root user with sufficient permissions (sudo tar, sudo ambari-server, sudo cp).
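
A hedged sketch of the corresponding sudo policy follows; the user name and binary paths are assumptions that will differ per system:

  # Grant only the commands the installer needs to an unprivileged install user, then validate the entry
  echo 'fusionadmin ALL=(root) NOPASSWD: /bin/tar, /usr/sbin/ambari-server, /bin/cp' | sudo tee /etc/sudoers.d/fusion-install
  sudo visudo -cf /etc/sudoers.d/fusion-install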

Shadow Client JAR

The Fusion 2.10 client library for HDFS and HCFS compatibility ensures that classpath conflicts do not occur with any client application, allowing Fusion to be used by applications that use alternative versions of the Guava and Netty libraries.

Unsidelining

Periods of extended network outage between Fusion zones can be accommodated by limits that allow Fusion servers to identify a sidelined node, ensuring that operation of other nodes can continue in its absence. Prior to this release, bringing a sidelined node back into operation was a completely manual process. Fusion 2.10 adds a mechanism by which sidelined nodes can be recovered and participate in ongoing activity.

Operation as an HDFS Non-Superuser

To support operation in environments where minimal security privileges must be allocated, the Fusion server can now operate as a principal without HDFS superuser privileges.

Selective Replication of open() Requests

A configuration option (fusion.client.coordinate.read, false by default) is provided to allow coordination of open() requests.
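
For illustration, the effective client setting can be inspected with standard Hadoop tooling; passing it per command with the generic -D option is an assumption, since the documented mechanism is the core-site.xml property itself:

  # Print the value if the property is present in the client configuration
  hdfs getconf -confKey fusion.client.coordinate.read

  # Per-command override via the generic -D option (assumption; path is an example)
  hdfs dfs -Dfusion.client.coordinate.read=true -ls /repl1/data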

Preferred Writer Selection

This release provides an API by which a preferred writer node can be specified for a given replicated path. The writer node is the Fusion server instance responsible for executing modifications to the local zone’s metadata via the file system API.

Grace Period for License Expiry

License expiration allows continued operation for a short grace period (by default one month for production licenses), during which notifications are presented to the administrator about the license expiry. This is in addition to the existing warnings provided prior to license expiration.

Additionally, license expiry does not halt operation of the Fusion server, which remains available to service activities that occur in non-replicated locations.

New Platform Support

WD Fusion has added support for the following new platforms since Fusion 2.9:

  • CDH 5.9 and 5.10

  • HDP 2.5

  • HDInsight 3.2 - 3.5

  • IBM BigInsights 3.0

  • Amazon EMR 5.3 and 5.4

  • MapR 5.2.0


1.9.3. Available Packages

This release of WANdisco Fusion supports the following versions of Hadoop:

  • CDH 5.2.0 - CDH 5.10.0

  • HDP 2.1.0 - HDP 2.5.0

  • MapR 4.0.1 - MapR 5.2.0

  • Pivotal HD 3.0.0 - 3.4.0

  • IOP (BigInsights) 2.1.2 - 4.2

The trial download includes the installation packages for CDH and HDP distributions only.


1.9.4. System Requirements

Before installing, ensure that your systems, software and hardware meet the requirements found in our online user guide at docs.wandisco.com/bigdata/wdfusion/2.10/#_deployment_guide

Certified Third-Party Components

WANdisco certifies the interoperability of Fusion with a wide variety of systems, including Hadoop distributions, object storage platforms, cloud environments, and applications.

  • Amazon S3

  • Amazon EMR 4.6, 4.7.1, 5.0

  • Ambari 1.6, 1.7, 2.0, 3.1

  • CDH 4.4, 5.2 - 5.10

  • EMC Isilon 7.2, 8.0

  • Google Cloud Storage

  • Google Cloud Dataproc

  • HDP 2.1.0 - 2.5.0

  • IBM BI 2.1.2 - 4.2

  • MapR M4.0 - M5.0

  • Microsoft Azure Blob Storage

  • Microsoft Azure HDInsight 3.2 - 3.5

  • MySQL (Hive Metastore)

  • Oracle BDA

  • Pivotal HD 3.0 - 3.4

Client Applications Supported

WANdisco Fusion is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with WANdisco Fusion, and will be treated as supported applications. Additionally, Fusion supports the replication of content with Amazon S3 and S3-compatible object stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.

1.9.5. Known Issues

Fusion 2.10 includes a small set of known issues with workarounds. In each case, resolution of the known issues is underway.

  • Renaming the parent directory of a location with in-progress file transfers may result in incomplete transfer - FUS-387.

In some circumstances, modification of the metadata for a parent directory within a replicated location can prevent the completion of content transfer that is underway for files underneath that directory. Fusion’s metadata consistency is unaffected, but file content may not be available in full. Consistency check and repair can be used to both detect and resolve any resulting missing content.

  • Metadata change following move of file from non-replicated to replicated location may be overwritten - FUS-3433

Under certain conditions, a metadata modification to a file that has recently been moved from a non-replicated to a replicated location may be lost. Consistency check and repair can be used to detect and resolve any resulting inconsistency.

  • Fusion does not support truncate command - FUS-3022

The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

  • Fusion does not support concat() operation - FUS-3714

The public void concat(Path trg, Path[] psrcs) operation in org.apache.hadoop.fs.FileSystem is not yet supported, and will result in filesystem inconsistency. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.

  • Consistency check will not be marked as done when initiated from a non-writer node - FUS-2675

While a consistency check initiated via the API at a non-writer node will execute and complete, its status will not be marked as complete. This issue is fixed in 2.10.3.

  • There are reports of a Linux kernel bug that may cause WD Fusion to hang. See our KB article.

1.9.6. Other Improvements

In addition to the highlighted features listed above, Fusion 2.10 includes a wide set of improvements in performance, functionality, scale, interoperability and general operation.

  • Parallel repair functionality avoids duplicate repair activity - FUS-3073

  • Correction to handling of specific path names to avoid issues with Hive replication - FUS-3543

  • Stack installer does not access non-initialized variables (fix for install on Oracle Enterprise Linux) - FUS-3551

  • Installation completes with WebHDFS disabled - FUS-3555

  • /fusion/fs no longer returns 500 response when adding removed replicated location - FUS-2148

  • Talkback does not attempt to ssh to KDC as root user - FUS-3192

  • Consistency check tasks can be canceled - FUS-3053

  • service fusion-server restart displays success - FUS-3193

  • Installer supports configuration changes needed for SOLR - FUS-3200

  • Client library no longer conflicts with user jars - FUS-3372, FUS-3407

  • CDH parcel upgrade performed for alternatives - FUS-3418

  • IHC SSL configuration no longer in core-site.xml - FUS-2828

  • MapR 5.2.0 support - FUS-2870

  • Fusion UI now applies auth_to_local setting when browsing HDFS - FUI-3995

  • Repair page redesigned to avoid unselectable source of truth - FUI-3759

  • Fusion handshake token directory installer input is pre-populated when adding node to an existing zone - FUI-3920

  • UI correctly displays size of replicated folder - FUI-3974/FUI-3995

  • Support for CDH 5.9 - FUI-4084

  • Support for Cloudera Manager 5.9 - FUI-4085

  • Support for CDH and Cloudera Manager 5.10 - FUI-4089

  • Consistency check marked as done when initiated from a non-writer node - FUI-3921/FUS-2675

  • Improved checks for Fusion client installation - FUI-3922

  • Install accommodates HIVE_AUX_JARS with single jar - FUS-3438

  • Allow operation with ambari-agent as non-root user - FUS-3211

  • Log proxy.user.name for requests - FUS-3154

  • Improve default exclusion paths for Hive tables - HIVE-310

  • Heap requirements for consistency check now independent of data volume - FUS-2402, FUS-3292

  • Avoid out of memory under failed socket connection scenario - DCO-683

  • Empty metadata content does not result in recursive delete - FUS-3190

  • Correct permission replication for Hive tables - FUS-3095, REPL-16

  • Allow cancellation of repair tasks that are underway - FUS-3052

  • Provide aggregate reporting of repair status across zones - FUS-2823, FUS-2948

  • Integrate with alternative native SSL libraries - FUS-2859

  • Talkback improves host resolution - FUS-3249

  • Service init functions allow AD groups with spaces in name - FUI-4278

  • RPM upgrades do not overwrite logging configuration - FUI-3894

  • Email alert interval polling defaults to 60s - FUI-3768

  • Metastore starts with DBTokenStore configured on CDH 5.5 - HIVE-384, HIVE-389

  • Support replication of external tables via default DSM - HIVE-225, HIVE-284

  • Correct Metastore configuration deployment with multiple nodes - HIVE-299

  • Bypass mechanism for replicated Metastore - HIVE-134

  • Metastore event listener replication - HIVE-222, HIVE-243, HIVE-234, REPL-2, REPL-7

  • WD Hive Metastore service status in Cloudera Manager - HIVE-257

  • Correct Hive installation on RHEL 7 - HIVE-261

  • Improve installation of Hive for HDP configuration - HIVE-296

  • Stack removal for Hive improved - HIVE-307

  • Standardized Java detection - FUS-2479, FUI-3165, HIVE-327

  • Hive support for CDH 5.9 - HIVE-356

  • Hive support for CDH 5.10 - HIVE-257

  • Correct permissions on /tmp/wd-hive-metrics.log et al. - HIVE-392

  • Sidelined DSMs no longer trigger re-elections - FUS-3083

  • fusion.ssl.enabled property renamed to fusion.client.ssl.enabled - FUS-3013

  • Additional properties for S3 configuration - FUS-3513

  • Client requests to sidelined DSM no longer retry - FUS-3003, FUS-2927, FUS-3051, FUS-3299

  • HttpFS classpath corrections - FUS-3201