Glossary

Technical guide and glossary for Hadoop and WANdisco Fusion terms.

Introducing WD Fusion

WD Fusion provides consistent, continuous data replication between file systems in Hadoop clusters. Client applications that use WD Fusion interact with a virtual file system that integrates the underlying storage across multiple clusters. When changes are made to files in one cluster, they are replicated immediately and consistently to the other clusters that WD Fusion spans.

  • Applications in any cluster can read from and write to the file system through WD Fusion at any time, with the guarantee that the file systems remain consistent across all participating clusters.
  • WD Fusion can span different versions and distributions of Hadoop, including CDH, HDP, EMC Isilon, Amazon S3/EMRFS and MapR. It presents the standard Hadoop-Compatible File System interface, so applications do not need to be modified.
  • Similarly, WD Fusion does not require any changes to the underlying Hadoop clusters or their file systems. It operates as a proxy that client applications use when working with replicated file system content.

WD Fusion Terms

To help you understand how WD Fusion operates, this documentation uses the terms Zone, Membership and Replication Rule. They each play a critical role in your configuration and use of WD Fusion. You should understand this terminology before installing WD Fusion.

Zones

A Zone represents the file system used in a standalone Hadoop cluster. Multiple Zones can belong to separate clusters in the same data center, or to distinct clusters operating in geographically separate data centers around the globe. WD Fusion operates as a distributed collection of servers. Each WD Fusion server belongs to exactly one Zone, but a Zone can have multiple WD Fusion servers (for load balancing and high availability). When you install WD Fusion, you should create a Zone for each cluster's file system.

DConE Terms

Memberships

A Membership is a defined group of WD Fusion servers that replicate data between their Zones. You can have as many WD Fusion servers in a Membership as you like, and each WD Fusion server can participate in multiple Memberships. WD Fusion allows you to define multiple Memberships, and WD Fusion servers can fulfil different roles in each Membership they participate in. This allows you to control exactly how your WD Fusion environment operates normally, and how it behaves when faced with network failures, host failures and other types of issues.

Replication Rules

File system content is replicated selectively by defining Replication Rules, which specify: the directory in the file system that will be replicated, the Zones that will participate in that replication, and the Membership associated with those Zones. Without any Replication Rules defined, each Zone's file system operates independently of the others. With the combination of Zones, Memberships and Replication Rules, WD Fusion gives you complete control over how data are replicated between the file systems of your Hadoop clusters.
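
The relationship between the three terms can be sketched as a small data model. This is an illustration only, not WD Fusion's API or configuration format: the class names, the `replicated_zones` helper, the host names and the zone names are all hypothetical. The check that every Zone in a rule has a server in the rule's Membership is also an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Zone:
    """One cluster's file system (hypothetical model, not a WD Fusion class)."""
    name: str


@dataclass(frozen=True)
class Server:
    """A WD Fusion server; it belongs to exactly one Zone."""
    host: str
    zone: Zone


@dataclass(frozen=True)
class Membership:
    """A named group of servers that replicate data between their Zones."""
    name: str
    servers: FrozenSet[Server]

    def zones(self) -> FrozenSet[Zone]:
        # The Zones reachable through the servers in this Membership.
        return frozenset(server.zone for server in self.servers)


@dataclass(frozen=True)
class ReplicationRule:
    """A replicated directory, the participating Zones, and their Membership."""
    directory: str
    zones: FrozenSet[Zone]
    membership: Membership

    def __post_init__(self) -> None:
        # Assumption for this sketch: each participating Zone must have
        # at least one server in the associated Membership.
        missing = self.zones - self.membership.zones()
        if missing:
            names = sorted(zone.name for zone in missing)
            raise ValueError(f"zones without a server in the membership: {names}")

    def covers(self, path: str) -> bool:
        # True if the path is the replicated directory or lies beneath it.
        root = self.directory.rstrip("/")
        return path == root or path.startswith(root + "/")


def replicated_zones(path: str, rules: List[ReplicationRule]) -> FrozenSet[Zone]:
    """Zones that receive changes to `path`; empty if no rule covers it."""
    result = set()
    for rule in rules:
        if rule.covers(path):
            result |= rule.zones
    return frozenset(result)


# Example setup with made-up names.
london = Zone("london")
tokyo = Zone("tokyo")
membership = Membership(
    "global",
    frozenset({Server("fusion1.ldn", london), Server("fusion1.tyo", tokyo)}),
)
rules = [ReplicationRule("/repl/sales", frozenset({london, tokyo}), membership)]

for path in ("/repl/sales/2024/q1.csv", "/tmp/scratch"):
    names = sorted(zone.name for zone in replicated_zones(path, rules))
    print(path, "->", names)
```

In this model a path outside every rule's directory maps to no Zones, which mirrors the statement that, without Replication Rules, each Zone's file system operates independently.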

Induction

Induction is the process of forming a Membership among a group of WD Fusion nodes.