logo

WANDISCO FUSION®
LIVE HIVE PLUGIN

1. Welcome

1.1. Product overview

The Fusion Plugin for Live Hive 2.0 enables WANdisco Fusion to replicate the Hive metastore, allowing WANdisco Fusion to maintain a replicated instance of Hive’s metadata and, in future, to support Hive deployments that are distributed between data centers.

1.2. Documentation guide

This guide contains the following:

Welcome

this chapter introduces this user guide and provides help with how to use it.

Release Notes

details the latest software release, covering new features, fixes and known issues to be aware of.

Concepts

explains how the Fusion Plugin for Live Hive 2.0 uses WANdisco’s LiveData platform through WANdisco Fusion.

Installation

covers the steps required to install and set up Fusion Plugin for Live Hive 2.0 into a WANdisco Fusion deployment.

Operation

covers the steps required to run, reconfigure and troubleshoot the Fusion Plugin for Live Hive 2.0.

Reference

provides additional Fusion Plugin for Live Hive 2.0 documentation, including documentation for the available REST API.

1.2.1. Admonitions

In this guide we highlight types of information using the following callouts:

The alert symbol highlights important information.
The STOP symbol cautions you against doing something.
Tips are principles or practices that you’ll benefit from knowing or using.
The KB symbol shows where you can find more information, such as in our online Knowledgebase.

1.3. Contact support

See our online Knowledgebase, which contains updates and more information.

If you need more help, raise a case on our support website.

1.4. Give feedback

If you find an error or if you think some information needs improving, raise a case on our support website or email docs@wandisco.com.

2. Release Notes

2.1. Live Hive Plugin 2.0.2 build 4

03 October 2018

The Fusion Plugin for Live Hive extends WANdisco Fusion by replicating Apache Hive metadata. With it, WANdisco Fusion maintains a LiveData environment including Hive content, so that applications can access, use and modify a consistent view of data everywhere, spanning platforms and locations, even at petabyte scale. WANdisco Fusion ensures the availability and accessibility of critical data everywhere.

Release 2.0.2 provides one fix and one enhancement that may prove to be crucial in applicable deployments.

2.1.1. Available Packages

This release of the Fusion Plugin for Live Hive 2.0 supports deployment into WANdisco Fusion 2.12 or greater for HDP and CDH Hadoop clusters:

  • CDH 5.12.0 - CDH 5.14.0

  • HDP 2.6.0 - HDP 2.6.4

2.1.2. Installation

The Fusion Plugin for Live Hive 2.0 supports an integrated installation process that allows it to be added to an existing WANdisco Fusion deployment. Consult the Installation Guide for details.

2.1.3. What’s new

This release is limited to one fix and one enhancement, noted below.

2.1.4. Resolved Issues

The following issues have been resolved with this release.

  • WD-LHV-981 - Fixed an issue where the proxy initialised the Thrift client protocol value from the wrong source, resulting in services being run under the wrong system account, i.e. not "hive".

  • WD-LHV-894 - Added a configurable property for enabling Hive transaction pass-through on replicated tables.

2.2. Live Hive Plugin 2.0.1 build 21

29 August 2018

The Fusion Plugin for Live Hive extends WANdisco Fusion by replicating Apache Hive metadata. With it, WANdisco Fusion maintains a LiveData environment including Hive content, so that applications can access, use and modify a consistent view of data everywhere, spanning platforms and locations, even at petabyte scale. WANdisco Fusion ensures the availability and accessibility of critical data everywhere.

The 2.0.1 release of WANdisco Fusion Plugin for Live Hive is a minor update, adding functionality and resolving some of the known issues with prior releases. We advise all customers using WANdisco Fusion Plugin for Live Hive 2.0 to apply this minor update.

2.2.1. Available Packages

This release of the Fusion Plugin for Live Hive 2.0 supports deployment into WANdisco Fusion 2.12 or greater for HDP and CDH Hadoop clusters:

  • CDH 5.12.0 - CDH 5.14.0

  • HDP 2.6.0 - HDP 2.6.4

2.2.2. Installation

The Fusion Plugin for Live Hive 2.0 supports an integrated installation process that allows it to be added to an existing WANdisco Fusion deployment. Consult the Installation Guide for details.

2.2.3. What’s new

This minor update to WANdisco Fusion Plugin for Live Hive includes the following new feature.

Optional automated restarts of external services during installation
Automatic restarts of external services during installation are now optional, allowing greater control over when restarts happen. You must indicate your preference before triggering the installation.

2.2.4. Resolved Issues

The following known issues have been resolved with this release.

  • WD-LHV-878 - Table repairs in the Fusion Plugin for Live Hive 2.0 also applied to the containing database, which introduced a risk of data loss. Repairs now apply only to the selected context and not to parent objects.

2.2.5. Known Issues

  • WD-LHV-654 - Consistency checks currently include non-replicated paths which should be excluded.

  • WD-LHV-238 - The Live Hive Plugin requires common Hadoop distributions and versions to be in place for all replicated zones.

  • WD-LHV-341 - The proxy for the Hive Metastore must be deployed on the same host as the Fusion server.

2.2.6. Other Improvements

  • Optional restarts for external services - WD-LHV-807

  • Support truncate operations - WD-LHV-555

  • Use WrappedMessageManager instance accessible through ServerContext - WD-LHV-827

  • FIX - Error message incorrect - WD-LHV-769

  • FIX - Duplicate databases in Hive replicated path status page - WD-LHV-828

  • FIX - Consistency checks never complete for heterogeneous Live Hive Plugin - WD-LHV-872

  • FIX - “Show more” doesn’t work with more than 10 databases or tables - WD-LHV-838

  • FIX - Databases not shown correctly if quantity more than 27 - WD-LHV-835

  • FIX - Details of other table shown instead of the table selected - WD-LHV-823

  • FIX - fusion_prefix not set after Live Hive installation - WD-LHV-764

  • FIX - Non-kerberized Live Hive installation fails - WD-LHV-881

  • FIX - LIVE_HIVE_PROXY failing to start during Live Hive installation - WD-LHV-842

  • FIX - Table inconsistency after checking and repairing Hive partition - WD-LHV-863

  • FIX - live.hive.proxy.keytab overwritten by Live Hive - WD-LHV-804

  • FIX - Support change for FUI-5233 - WD-LHV-857

  • FIX - Wizard step not active on Configuration page - WD-LHV-858

  • FIX - Remove tooltips from installer steps - WD-LHV-859

  • FIX - Proxy and HS2 template stack fail on reboot - WD-LHV-729

  • FIX - Proxy stack does not define correct dependencies for stale configuration restart - WD-LHV-739

  • FIX - Unable to delete LIVE_HIVE* services if installation broken - WD-LHV-697

  • FIX - Using manual kerberos parameters doesn’t persist past installer (CDH) - WD-LHV-695

  • FIX - Automatic restarts of services were unsuccessful, but were marked as complete on an Enterprise enabled CDH cluster - WD-LHV-681

  • FIX - Keytab missing from HA node - WD-LHV-834

  • FIX - Index inconsistent on database level but consistent on table level after repair - WD-LHV-923

2.3. Live Hive Plugin 2.0 build 1245

28 June 2018

WANdisco is pleased to present the first major revision of the Fusion Plugin for Live Hive. This release is specifically for use with the latest version of WANdisco Fusion, 2.12.

The Fusion Plugin for Live Hive 2.0 extends WANdisco Fusion by replicating Apache Hive metadata. With it, WANdisco Fusion maintains a LiveData environment including Hive content, so that applications can access, use and modify a consistent view of data everywhere, spanning platforms and locations, even at petabyte scale. WANdisco Fusion ensures the availability and accessibility of critical data everywhere.

The 2.0 release of the Fusion Plugin for Live Hive adds functionality and resolves some of the known issues with prior releases, ensuring that it covers the features available with the prior Fusion Hive Metastore Plugin. With this release, implementations of WANdisco Fusion that include Hive replication requirements should take advantage of the Fusion Plugin for Live Hive 2.0.

2.3.1. New Platform Support

The Fusion Plugin for Live Hive 2.0 has added support for the following new platforms since version 1.0:

  • CDH 5.14.0

2.3.2. Available Packages

This release of the Fusion Plugin for Live Hive 2.0 supports deployment into WANdisco Fusion 2.12 or greater for HDP and CDH Hadoop clusters:

  • CDH 5.12.0 - CDH 5.14.0

  • HDP 2.6.0 - HDP 2.6.4

2.4. Installation

The Fusion Plugin for Live Hive 2.0 supports an integrated installation process that allows it to be added to an existing WANdisco Fusion deployment. Consult the Installation Guide for details.

2.5. What’s New

This update to the Fusion Plugin for Live Hive 2.0 adds some key new features and addresses limitations of the previous release. Notable enhancements are:

Replication Rules

Hive Replication Rules no longer generate HCFS replication rules in response to operations on tables that may create new data locations that require replication. Instead, this release requires that Hive data are replicated with an existing HCFS replication rule. This allows for a greater scale of operation, because Hive operations such as CREATE TABLE can now re-use existing replication rules, and you have greater control over which content is replicated by combining Hive replication rules and HCFS replication rules.

Pattern Syntax

Replication rules can be created to match Hive tables using the same simple syntax that Hive uses for pattern matching, rather than more complex regular expressions. The only wildcards in replication rules are * for any character(s) and | for a choice between alternatives. For example, employees, emp* and emp*|*ees all match a table named employees.
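
Because this is the same pattern matching that Hive itself implements, you can preview which tables a pattern would match using Hive’s own SHOW TABLES command, for example via Beeline (the connection URL below is illustrative):

beeline -u jdbc:hive2://hiveserver2.example.com:10000 -e "SHOW TABLES 'emp*|*ees';"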

Database-level replication

Hive replication rules now apply to Hive databases as well as tables. Where previous versions of Live Hive Plugin replicated all Hive databases, in this release data replication can be enabled on a per-database basis.

Initial transfers and consistency checks

Initial transfers and Consistency Checks can be performed for a Hive replication rule, covering all elements that match the rule.

Rule management

Hive replication rules can be removed.

In-place upgrades

Future upgrades of the Fusion Plugin for Live Hive 2.0 will be able to offer a streamlined upgrade path.

Database-level checks and initial transfer

You can perform a consistency check for an entire database, and get results that indicate whether elements within the database are inconsistent. Similarly, the Fusion Plugin for Live Hive 2.0 allows you to perform an initial transfer of data for a database as a whole, rather than requiring you to perform that transfer for each table.

2.6. Resolved Issues

The following known issues have been resolved with this release.

  • WD-LHV-219 - Consistency checks and initial transfers could not be performed at the level of a Hive Regex rule, but had to be performed per table. Consistency checks and initial transfers can now be performed for databases or tables.

  • WD-LHV-342 - The Fusion Plugin for Live Hive 2.0 did not provide for the removal of a Hive Regex rule. Hive replication rules can now be removed.

  • WD-LHV-343 - Databases were replicated to all zones on creation regardless of Hive Regex rules. Databases are now replicated in accordance with established Hive replication rules.

  • WD-LHV-344 - Replication rules for Hive data locations generated by the Fusion Plugin for Live Hive 2.0 could not be edited. Replication rules for locations associated with Hive data are no longer controlled by the Fusion Plugin for Live Hive 2.0, and can be edited as regular HCFS rules.

  • WD-LHV-480 - It wasn’t possible to replicate metadata for tables created using the CTAS "create table as select" method. Table metadata is now governed by Hive replication rules and associated table data by regular HCFS replication rules.

  • WD-LHV-486 - Previously, it was not possible to trigger initial transfers for metadata on nodes where the data was undergoing initial transfer. Initial transfers can now be triggered from any node.

2.7. Known Issues

  • WD-LHV-654 - Consistency checks currently include non-replicated paths which should be excluded.

  • WD-LHV-238 - The Live Hive Plugin requires common Hadoop distributions and versions to be in place for all replicated zones.

  • WD-LHV-341 - The proxy for the Hive Metastore must be deployed on the same host as the Fusion server.

2.8. Other Improvements

  • Support regex rule removal - WD-LHV-342

  • FIX - Hive commands through beeline using anonymous user without Kerberos fail - WD-LHV-407

  • FIX - Unable to remove Hive Regex - WD-LHV-481

  • FIX - Unable to trigger repair of a table on a fusion node where the database does not exist - WD-LHV-486

  • FIX - metastore.service.host is no longer used - WD-LHV-488

  • Allow multiple LHV proxies in a single zone for proxy HA - WD-LHV-490

  • FIX - Hive Metastore Canary cannot pass through proxy - WD-LHV-500

  • Document deployment models - WD-LHV-523

  • Handle pre-existing kerberos principals on CDH - WD-LHV-552, WD-LHV-485

  • FIX - Creating a table with a location that matches existing replicated table broken - WD-LHV-554

  • CDH 5.13 Sentry / Metastore HA testing - WD-LHV-557

  • Remove auto-creation of a DSM on table creation - WD-LHV-561

  • Test if ConsistencyCheck and Initial Transfer work over DBs through REST API - WD-LHV-567

  • Trigger repair by regex for all databases, ignoring location - WD-LHV-571

  • Regex rule status tab - WD-LHV-572

  • FIX - Hive Table rules disappearing - WD-LHV-573

  • Hive Server 2 unable to connect to Hive Metastore through Hive Thrift Proxy with Kerberos - WD-LHV-574

  • Add ability to trigger CC via regex rule in the UI - WD-LHV-582

  • Use patterns as Hive uses them instead of general Regexes - WD-LHV-583

  • Support display of metadata replication status in FUI - WD-LHV-584

  • FIX - Installer fails on non-Kerberized cluster due to auto.kerberos.enabled option - WD-LHV-591

  • FIX - Consistency check and initial transfer open new connection for every thrift call - WD-LHV-599

  • FIX - Installer fails to install the Live Hive Plugin service in Cloudera Manager - WD-LHV-600

  • FIX - Consistency Check All option is incorrectly reporting databases as consistent - WD-LHV-606

  • FIX - Add if not exists partition fails with NPE if partition exists - WD-LHV-607

  • FIX - "Error connecting to Hive Metastore" within Cloudera Navigator Metadata Server log - WD-LHV-608

  • FIX - Adding LHV HA proxy to a node inducted to a zone post an existing activated LHV install breaks new LHV installation - WD-LHV-611

  • Regex rules can now be removed - WD-LHV-614

  • Provide the CC status in the WD-LHV-569 data - WD-LHV-616

  • Document how to manually deploy a proxy if only gateway nodes have been installed - WD-LHV-618

  • FIX - The stack upgrade process is incorrectly upgrading on every proxy restart - WD-LHV-620

  • Proxy not starting until Live Hive Slave is manually started - WD-LHV-622

  • Need single database/table variants of the new get databases/tables by ruleid - WD-LHV-623

  • Remove hardcoded user:group from postinstall scripts and tidy up folder permissions - WD-LHV-625

  • Fix silent installer for LHV - WD-LHV-627

  • Document LHV-628 known issue - WD-LHV-629

  • Document requirements for user running live hive proxy - WD-LHV-630

  • FIX - Regex rule status consistency check fails shortly after issuing a repair - WD-LHV-634

  • Rename CDH service from "live_hive_proxy" - WD-LHV-351

  • Typo fixes - WD-LHV-551

  • FIX - Create Rule screen doc link "Hive Pattern" anchor refers to "regex" - WD-LHV-633

  • Link to Fusion UI from live_hive_proxy configuration in CDH - WD-LHV-372

  • FIX - Kerberos ticket expiration through long run of the testing batch. - WD-LHV-528

  • FIX - Service.args leaks - WD-LHV-540

  • FIX - Dependency on fusion-server for non-repl operations, even with repl_exchange_dir - WD-LHV-289

  • Standardise logging to use SLF4J - WD-LHV-387

  • FIX - Wrong plugin status after installation - WD-LHV-497

  • Publish TypeScript definitions - WD-LHV-507

  • FIX - NoClassDefFoundError after LiveHive successful installation - WD-LHV-521

  • Complete solution to Metastore port config - WD-LHV-539

  • Retry thrift client - WD-LHV-241

  • FIX - [CDH-5.13] LiveHive plugin status unknown in UI - WD-LHV-392

  • FIX - Silent installer will attempt to install live-hive erroneously - WD-LHV-412

  • Investigate Hiveserver2 token issue described in WD-LHV-386 - WD-LHV-414

  • FIX - Unexpected error occurred message won’t vanish - WD-LHV-483

  • Unify live hive service name on HDP and CDH - WD-LHV-495

  • Final step - warn of outage - WD-LHV-503

  • FIX - Installation failed: Hive Install step Restart Hive Service failed - WD-LHV-510

  • Include live-hive logs in talkbacks - WD-LHV-530

  • Document validation section - WD-LHV-534

  • Rewrite the package removal instructions - WD-LHV-538

  • Document the properties added as part of the fix for LHV-414 - WD-LHV-541

  • Add support for CDH 5.14 - WD-LHV-548

  • New User Guide Section: Deployment Planning - WD-LHV-550

  • Better error message and retry button on parcel/stack download page - WD-LHV-445

  • FIX - Hiveserver2 couldn’t connect to LiveHive Proxy - WD-LHV-542

3. Concepts

3.1. Product concepts

Familiarity with product and environment concepts will help you understand how to use the Fusion Plugin for Live Hive. Learn the following concepts to become proficient with replication.

Apache Hive

Hive is a data warehousing technology for Apache Hadoop. It is designed to offer an abstraction that supports applications wanting to use data residing in a Hadoop cluster in a structured manner, allowing ad-hoc querying, summarization and other data analysis tasks to be performed using high-level constructs, including Hive SQL queries.

Hive Metadata

The operation of Hive depends on the definition of metadata that describes the structure of data residing in a Hadoop cluster. Hive metadata is itself organized with structure, including definitions of Databases, Tables, Partitions, and Buckets.

Apache Hive Type System

Hive defines primitive and complex data types that can be assigned to data as part of the Hive metadata definitions. These are primitive types such as TINYINT, BIGINT, BOOLEAN, STRING, VARCHAR, TIMESTAMP, etc. and complex types like Structs, Maps, and Arrays.
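
As a brief illustration, the following Beeline command (the connection URL, table and column names are illustrative) creates a table that combines primitive and complex types:

beeline -u jdbc:hive2://hiveserver2.example.com:10000 -e "CREATE TABLE employees (
  id      BIGINT,
  name    STRING,
  active  BOOLEAN,
  address STRUCT<street:STRING, city:STRING>,
  phones  ARRAY<STRING>,
  props   MAP<STRING,STRING>);"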

Apache Hive Metastore

The Apache Hive Metastore is a stateless service in a Hadoop cluster that presents an interface for applications to access Hive metadata. Because it is stateless, the metastore can be deployed in a variety of configurations to suit different requirements. In every case, it provides a common interface for applications to use Hive metadata.

The Hive Metastore is usually deployed as a standalone service, exposing an Apache Thrift interface by which client applications interact with it to create, modify, use and delete Hive Metadata in the form of databases, tables, etc. It can also be run in embedded mode, where the metastore implementation is co-located with the application making use of it.

WANdisco Fusion Live Hive Proxy

The Live Hive Proxy is a WANdisco service that is deployed with Live Hive, acting as a proxy for applications that use a standalone Hive Metastore. The service coordinates actions performed against the metastore with actions within clusters in which associated Hive metadata are replicated.

Hive Client Applications

Client applications that use Apache Hive interact with the Hive Metastore, either directly (using its Thrift interface), or indirectly via another client application such as Beeline or Hiveserver2.

Hiveserver2

is a service that exposes a JDBC interface for applications that want to use it for accessing Hive. This could include standard analytic tools and visualization technologies, or the Hive-specific CLI called Beeline.

Hive applications determine how to contact the Hive Metastore using the Hadoop configuration property hive.metastore.uris.
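
For example, with the Live Hive proxy in place this property typically points clients at the proxy rather than at the native metastore (the hostname below is illustrative; 9090 is the default Live Hive proxy port):

hive.metastore.uris=thrift://livehive-proxy.example.com:9090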

Hiveserver2 Template

A template service that amends the Hiveserver2 configuration so that it no longer uses the embedded metastore, and instead correctly references the hive.metastore.uris parameter that points to our "external" Hive Metastore server.

Hive pattern rules

A simple syntax used by Hive for matching database objects. This pattern system replaced the more complex regular expressions that were used prior to Live Hive Plugin 1.2.

WANdisco Fusion plugin

The Fusion Plugin for Live Hive is a plugin for WANdisco Fusion. Before you can install it, you must first complete the installation of the core WANdisco Fusion product. See the WANdisco Fusion user guide.

Get additional terms from the Big Data Glossary.

3.2. Product architecture

The native Hive Metastore is not replaced. Instead, the Live Hive Plugin runs as a proxy server that relays commands from connected clients (e.g. Beeline) to the original metastore, which remains on the cluster.

The Live Hive Plugin proxy passes read commands directly to the local Hive Metastore, while Fusion co-ordinates any write commands so that the metastores on all clusters perform the write operations, such as table creation. Live Hive also automatically starts to replicate Hive tables when their names match a user-defined rule.

Hive Plugin Architecture
Figure 1. Live Hive Plugin Architecture
1 Write access needs to be co-ordinated by Fusion before executing the command on the metastore.
2 Read commands are 'passed through' straight to the metastore, as they do not need to be co-ordinated via Fusion.
3 Makes connection to the metastore on the cluster.

3.2.1. Limitations

Membership changes

There is currently no support for dynamic membership changes. Once installed on all Fusion nodes, the Live Hive Plugin is activated. See Activate Live Hive Plugin. During activation, the membership for replication is set and cannot be modified later. For this reason, it’s not possible to add new Live Hive Plugin nodes at a later time, including a High Availability node running an existing Live Hive proxy that wasn’t part of your original membership.

Any change to membership in terms of adding, removing or changing existing nodes will require a complete reinstallation of Live Hive.

Where to install Live Hive Plugin
  • Install the Live Hive Plugin on all zones. While it is possible to install on only a subset of your zones, there are two potential problem scenarios:

    • Live Hive Plugin installed on all zones but a Hive replicated rule is on a membership spanning a subset of zones.

    • Live Hive Plugin not installed on all zones, but a replicated rule is on a membership spanning all zones.

Both situations result in unpredictable behaviour that may end up causing serious problems.

  • HDP/Ambari only: On HDP you cannot co-locate the Live Hive Plugin proxy on a node that is running the Hive metastore. This is because Ambari uses the value from hive.metastore.uris to determine what port the Metastore should listen on, which would clash with Live Hive Plugin.

  • You must install the Live Hive Plugin on all Fusion nodes within a zone. Note that while the plugin must be installed on all nodes within a zone, the plugin’s proxy need not be.

Hive must be running in all zones

All participating zones must be running Hive in order to support replication. We’re aware that this currently prevents the popular use case of replicating between on-premises clusters and S3/cloud storage, where Hive is not running. We intend to remove this limitation in a future release.

Support for Hive transactions

By default, Hive transactions will be rejected by the Live Hive Proxy. Where this is an absolute requirement, there is a method for enabling transactions to pass through.

Don’t enable pass-through of Hive transactions when they are used on tables that are under replication, as it will cause inconsistency in Hive data across zones.
Enabling Hive transaction pass-through on replicated tables

Add the block.txn.services property to the live-hive-site.xml file and set it to false. Note that this property is not exposed by default; you need to add it yourself.
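
The entry in live-hive-site.xml follows the standard Hadoop XML property format; a minimal sketch:

<property>
  <name>block.txn.services</name>
  <value>false</value>
</property>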

Alternatively, you may set the property via the proxy stack/parcel and it will be written to the live-hive-site.xml. Add the following configuration property to the Live Hive Proxy Config in Ambari or Cloudera Manager.

block.txn.services = false

3.3. Deployment models

The following deployment models illustrate some of the common use cases for running Live Hive.

Hive Plugin Architecture
Figure 2. Live Hive Plugin Deployment Model

3.4. Analytic off-loading

In a typical on-premises Hadoop cluster, data ingest and analytic jobs all run through the same infrastructure, where some activities impose a load on the cluster that can impact other activities. The Fusion Plugin for Live Hive 2.0 allows you to divide the workflow across separate environments, which lets you isolate the overheads associated with some events. You can ingest in one environment while using a different environment, provisioned with adequate capacity, to run the analytic jobs. You get more control over each environment’s performance.

  • You can ingest data from anywhere and query it at scale within the environment.

  • You can ingest data on premises (or wherever the data is generated) and query it at scale in another optimized environment, such as a cloud environment with elastic scaling that can be spun up only when query jobs are queued. In this model, you may ingest data continuously but you don’t need to run a large cluster 24 hours per day for query jobs.

3.5. Multi-stage jobs across multiple environments

A typical Hadoop workflow might involve a series of activities: ingesting data, cleaning data and then analyzing the data in a short series of steps. You may be generating intermediate output to be run against end-stage reporting jobs that perform analytical work. Running all these work streams on a single cluster could require a lot of careful coordination between the different types of workloads that make up multi-stage jobs. This is a common chain of query activities for Hive, where you might ingest raw data, refine and augment it with other information, then eventually run analytic jobs against your output on a periodic basis, for reporting purposes, or in real time.

In a replicated environment, however, you can control where those job stages are run. You can split this activity across multiple clusters to ensure that the query jobs needed for reporting purposes have access to the capacity necessary to run within SLAs. You can also run different types of clusters to make more efficient use of the overall chain of work that occurs in a multi-stage job environment. You could have one cluster tweaked and tuned for the most efficient ingest, while running a completely different kind of environment that is tuned for another task, such as the end-stage reporting jobs that run against processed and augmented data. Running with LiveData across multiple environments allows you to run each type of activity in the most efficient way.

3.6. Migration

Live Hive allows you to move both the Hive data stored in HCFS and the associated Hive metadata from an on-premises cluster over to cloud-based infrastructure. There’s no need to stop your cluster activity; the migration can happen without impact to your Hadoop operations.

3.7. Disaster Recovery

As data is replicated between nodes on a continuous basis, Live Hive is an ideal solution for protecting your data from loss. If a disaster occurs, there’s no complicated switchover as the data is always operational.

3.8. Application integration

This section covers what you need to know in order to use the Live Hive Plugin in various environments, using different Hadoop applications.

4. Installation

4.1. Pre-requisites

An installation should only proceed if the following prerequisites are met on each Live Hive Plugin node:

  • Hadoop cluster: CDH (5.12.0 - 5.14.0) or HDP (2.6.0 - 2.6.4)

  • Hive installed, configured and running on the cluster

  • WANdisco Fusion 2.12.0 or later

It’s extremely useful to complete some preparatory work before you begin a Live Hive Plugin deployment. The following tasks and checks will make installation easier and reduce the chance of an unexpected roadblock causing a deployment to stall or fail.

It’s important to make sure that the following elements meet the requirements that are set in the Pre-requisites.

4.1.1. Server OS

One common requirement that runs through much of the deployment is the need for strict consistency between the Fusion nodes running the Live Hive Plugin. Your nodes should, as a minimum, be running the same versions of:

  • Hadoop/Manager software.

  • Linux.

    • Check to see if you are running a niche variant, e.g. Oracle Linux is compiled from RHEL but it is not identical to a RHEL installation.

  • Java.

    • Ensure you are running the same version, on consistent paths.

4.1.2. Hadoop Environment

Confirm that your Hadoop clusters are working:

  • All nodes must have a "fusion" system user account for running Fusion services.

  • Check the Hadoop daemon log files for any errors that might cause problems with your installation.
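
For example, you can confirm that the service account exists on a node with:

# confirm the "fusion" system user account is present
id fusion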

Folder Permissions

When installing the Live Hive proxy or plugin, the permissions of /etc/wandisco/fusion/plugins/hive/ are set to match the Fusion user (FUSION_SERVER_USER) and group (FUSION_SERVER_GROUP), which are set during the Fusion node installation procedure.

Permissions on the folder are also set such that processes can write new files to that location as long as the user associated with the process is the FUSION_SERVER_USER or is a member of the FUSION_SERVER_GROUP.

No automatic fix for permissioning

Changes to the fusion user/group are not automatically applied to their directories. You need to fix these issues manually, following the above guidelines.

Live Hive Plugin dependencies

Generally, the Live Hive Plugin relies on distribution artefacts being available for it to load. Proxy init scripts, plugin cp-extra scripts, etc. load the relevant items from the cluster they are installed on.

For Cloudera deployments, these functions will work as expected because all managed nodes have the CDH parcel regardless of the role of the node, so the libraries are available. On Ambari deployments, however, you need to load in the required items yourself.

Hive Metastore Libraries (Ambari)

Check that the Hive Metastore libraries are available on the Fusion node. These instructions depend on the availability of the binaries and on any local policies governing access to and use of the native system packages or repositories. The following commands check whether the required packages/libraries are already installed:

RPM
rpm -qa "hive*-metastore"
DPKG
dpkg -l "hive*-metastore"
Platform independent
ls /usr/*/current/hive-metastore/lib/

If running any of these commands shows that hive-metastore is installed, no further action should be required. If the packages are not in place, you should run the appropriate packager:

RHEL / CentOS
# can be made non-interactive with -y flag
yum install "hive*-metastore"
Suse
# can be made non-interactive with --non-interactive flag
zypper install "hive*-metastore"
Ubuntu
# can be made non-interactive with -y flag
apt-get install "hive*-metastore"

4.1.3. Firewalls and Networking

  • If iptables or SELinux are running, you must confirm that any rules that are in place will not block Live Hive Plugin communication.

  • If any nodes are multi-homed, ensure that you account for this when setting which interfaces will be used during installation.

  • Ensure that you have hostname resolution between clusters; if not, add suitable entries to your hosts files.

  • Check your network performance to make sure there are no unexpected latency issues or packet loss.

Kerberos Configuration (CDH)

Prepare a Kerberos principal for each Fusion node and place this in a keytab with read/write permissions for user “fusion” on the relevant node.

The keytab/principal that you specify for the Live Hive service must refer to the same principal that is used by the rest of the Hive stack. Usually it appears in the form hive/_HOST@DOMAIN.COM. Other values are likely to cause proxied requests to fail at the proxy-to-metastore step.

If a non-superuser principal is used, it also needs sufficient permission to impersonate all users. Setting the permissions is done by adding the following parameters to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml on all clusters:

hadoop.proxyuser.fusion.groups=*

and

hadoop.proxyuser.fusion.hosts=<FQDN of local Fusion nodes>
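
Before installation, you can also confirm that the keytab contains the expected principal, e.g. (the keytab path below is illustrative):

# list the principals held in the keytab
klist -kt /etc/security/keytabs/live-hive-proxy.keytab
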
Kerberos Configuration (HDP)

Under HDP, these properties are added to the following location:

HDFS Config --> Advanced Core-site
For more information, see Secure Impersonation.
  1. (HDP) Set up the KDC server and create admin user and database.

  2. (HDP) Create principals for Hadoop in Kerberos database.

  3. Edit Hadoop config files to reference Keytabs and principals.

  4. Restart Cluster.

SSL

To enable SSL-encrypted communication between Fusion nodes (optional), Java KeyStore and TrustStore files must be generated or available for all Fusion nodes.

We don’t recommend using self-signed certificates, except for proof-of-concept/testing.

Confirm which components will need SSL encryption, e.g.

Connections that may be SSL secured
  • Live Hive Plugin server ←→ Live Hive Plugin server

  • Fusion server, IHC ←→ Live Hive Plugin server

  • client ←→ Live Hive Plugin server

  • browser ←→ UI server

Server utilisation
  • Will Live Hive Plugin be running on a dedicated server or sharing resources with other applications?

  • Check that you will be running with sufficient disk space, and note whether you will be installing to non-default paths.

  • Use "ulimit -a" to check that the open process limits are sufficient.

  • Use netstat to review the connections being made to the server. Verify that any ports required by Live Hive Plugin are not in use.

  • Consider using SCP to push large files across the WAN to ensure that no data transfer problems occur.
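
For example, you can make some of these checks with standard tools (9090 is the default Live Hive proxy Thrift port; adjust for your configuration):

# review resource limits for the user that runs the Fusion services
ulimit -a
# check whether the default proxy port is already in use
netstat -tlnp | grep 9090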

4.2. Installation

4.2.1. Installer Options

The following section provides additional information about running the Live Hive installer.

Installer Help

The bundled installer provides some additional functionality that lets you install selected components, which may be useful if you need to restore or replace a specific file. To review the options, run the installer with the --help option, i.e.

[user@gmart01-vm1 ~]# ./live-hive-installer.sh --help
Verifying archive integrity... All good.
Uncompressing WANdisco Hive Live.......................

This usage information describes the options of the embedded installer script. Further help, if running directly from the installer is available using '--help'. The following options should be specified without a leading '-' or '--'. Also note that the component installation control option effects are applied in the order provided.

Installation options
General options:
  help                             Print this message and exit

Component installation control:
  only-fusion-ui-client-plugin     Only install the plugin's fusion-ui-client component
  only-fusion-ui-server-plugin     Only install the plugin's fusion-ui-server component
  only-fusion-server-plugin        Only install the plugin's fusion-server component
  only-user-installable-resources  Only install the plugin's additional resources
  skip-fusion-ui-client-plugin     Do not install the plugin's fusion-ui-client component
  skip-fusion-ui-server-plugin     Do not install the plugin's fusion-ui-server component
  skip-fusion-server-plugin        Do not install the plugin's fusion-server component
  skip-user-installable-resources  Do not install the plugin's additional resources
Standard help parameters
# ./live-hive-installer.sh --help
Makeself version 2.1.5
 1) Getting help or info about ./live-hive-installer.sh :
  ./live-hive-installer.sh --help   Print this message
  ./live-hive-installer.sh --info   Print embedded info : title, default target directory, embedded script ...
  ./live-hive-installer.sh --lsm    Print embedded lsm entry (or no LSM)
  ./live-hive-installer.sh --list   Print the list of files in the archive
  ./live-hive-installer.sh --check  Checks integrity of the archive

 2) Running ./live-hive-installer.sh :
  ./live-hive-installer.sh [options] [--] [additional arguments to embedded script]
  with following options (in that order)
  --confirm             Ask before running embedded script
  --noexec              Do not run embedded script
  --keep                Do not erase target directory after running the embedded script
  --nox11               Do not spawn an xterm
  --nochown             Do not give the extracted files to the current user
  --target NewDirectory Extract in NewDirectory
  --tar arg1 [arg2 ...] Access the contents of the archive through the tar command
  --                    Following arguments will be passed to the embedded script

 3) Environment:
  LOG_FILE              Installer messages will be logged to the specified file
Silent installation

Instead of installing through the UI, you can install using the silent (scripted) installer. These steps need to be repeated on each node you want the Live Hive plugin installed on.

  1. Obtain the Live Hive Plugin installer from WANdisco and open a terminal session on your WANdisco Fusion node.

  2. Ensure the downloaded file is executable e.g.

    # chmod +x live-hive-installer.sh
  3. Run the Live Hive Plugin installer e.g.

    # sudo ./live-hive-installer.sh
  4. Now place the parcels or stacks in the relevant directory. They can be found in the directory /opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive-<version>. The steps are the same as in the UI installer. For more information see Parcels if you are using Cloudera, or Stacks if using Ambari. Ensure that you restart your Cloudera or Ambari server.

  5. Now edit the live_hive_silent_installer.properties file, located in /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-<your version>/properties.
    The following fields are required:

    • live.hive.proxy.thrift.host - is defunct in favor of live.hive.proxy.thrift.uris.

    • live.hive.proxy.keytab - the keytab the live hive proxy will use.

    • live.hive.proxy.principal - the principal the live hive proxy will use. This must be in the form user/HOST@REALM.

    • metastore.service.principal - the user portion of the vanilla metastore’s service principal e.g. user. Note this may not be the same as the user/HOST@REALM entered above.

    • remote.thrift.host - the original Hive Metastore thrift host and port. This must be in the form host:port.
      If vanilla metastore HA is configured, this should be a comma separated list of all existing metastore host:ports

      Optional fields:

    • live.hive.proxy.thrift.port - is the port the live hive proxy binds to and runs on. Default=9090.

    • plugin.hive.metastore.heap.size - the maximum Java heap size of the metastore. Default = 1GB.
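
      For reference, a completed properties file might look like the following minimal sketch; the hostnames, realm and keytab path are illustrative and must be replaced with your own values:

      live.hive.proxy.thrift.port=9090
      live.hive.proxy.keytab=/etc/security/keytabs/live-hive-proxy.keytab
      live.hive.proxy.principal=hive/fusion01.example.com@EXAMPLE.COM
      metastore.service.principal=hive
      remote.thrift.host=metastore01.example.com:9083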

  6. To start the silent installation, go to /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-<version> and run:

    # ./scripts/silent_installer_live_hive.sh ./properties/LIVE_HIVE_PROXY-silent-installer.properties
  7. Repeat these steps on each node.

  8. Once the plugin is installed on all nodes that will replicate Hive metadata, activate the plugin.

Silent installation Known issues
Important workaround for release 1.0
After running a silent installation, the Live Hive plugin will still appear as ready to install on the Plugin screen.

You must restart the Fusion node.

If you change any properties in the UI before performing the restart, you will impair the UI server’s ability to see that the Live Hive Plugin is activated, due to the loss of the following ui.properties property:

plugin.installed.LiveHiveFusionPlugin=true

This can be fixed by restoring the property and then restarting fusion-ui-server.
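
For example, assuming the default installation layout (the ui.properties path and service name below are assumptions; adjust them for your installation):

# restore the missing property
echo "plugin.installed.LiveHiveFusionPlugin=true" >> /opt/wandisco/fusion-ui-server/properties/ui.properties
# restart the UI server so the change is picked up
service fusion-ui-server restart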

4.2.2. Cloudera-based steps

Run the installer

Obtain the Live Hive Plugin installer from WANdisco. Open a terminal session on your WANdisco Fusion node and run the installer as follows:

  1. Run the Live Hive Plugin installer on each host required:

    # sudo ./live-hive-installer.sh

    You will see the following output.

    # sudo ./live-hive-installer.sh
    Verifying archive integrity... All good.
    Uncompressing WANdisco Live Hive.......................
    
    
        ::   ::  ::     #     #   ##    ####  ######   #   #####   #####   #####
       :::: :::: :::    #     #  #  #  ##  ## #     #  #  #     # #     # #     #
      ::::::::::: :::   #  #  # #    # #    # #     #  #  #       #       #     #
     ::::::::::::: :::  # # # # #    # #    # #     #  #   #####  #       #     #
      ::::::::::: :::   # # # # #    # #    # #     #  #        # #       #     #
       :::: :::: :::    ##   ##  #  ## #    # #     #  #  #     # #     # #     #
        ::   ::  ::     #     #   ## # #    # ######   #   #####   #####   #####
    
    
    
    You are about to install WANdisco Live Hive version 2.0.0
    
    Do you want to continue with the installation? (Y/n)
      wd-live-hive-plugin-2.0.0.tar.gz ... Done
      live-hive-fusion-core-plugin-2.0.0-1233.noarch.rpm ... Done
      storing user packages in '/opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive' ... Done
      live-hive-ui-server-2.0.0-dist.tar.gz ... Done
    All requested components installed.
    Go to your WANdisco Fusion UI Server to complete configuration.
Installer options
View the Installer Options section for details on additional installer functions, including the ability to install selected components.
IMPORTANT: Once you run this installer script, do not restart the Fusion node until you have fully completed the installation steps (up to activation) for this node.
Configure the Live Hive Plugin
  1. Open a session to your WANdisco Fusion UI. You will see a message confirming that the Live Hive Plugin has been detected. Click the Plugins link to review the Plugins page.

    Live Hive Plugin Architecture
    Figure 3. Live Hive Plugin Architecture
  2. The Fusion Plugin for Live Hive now appears in the list. Click the button labelled Install Now.

    Live Hive Plugin Architecture
    Figure 4. Live Hive Plugin Architecture
  3. The installation process runs through four steps that handle the placement of parcel files onto your Cloudera Manager server.

    Live Hive Plugin Architecture
    Figure 5. Live Hive Plugin Architecture
    Parcels
    Parcels need to be placed in the correct directory to make them available to the manager. To do this:

    Copy the paths for the .parcel and .parcel.sha files for your corresponding platform type,
    e.g. el6 (Enterprise Linux version 6).

    1. Download the Parcel packages to the Cloudera service directory (/opt/cloudera/parcel-repo/) on your node, e.g.

      ssh user@docs-cm.fusion.domain-name.com
      user@docs-cm.fusion.domain-name.com password 
      [user@docs-cm ~]$ sudo -i
      [user@docs-cm ~] cd /opt/cloudera/parcel-repo
      [user@docs-cm ~] wget <your-fusion-node.hostname>:8083/ui/downloads/core_plugins/live-hive/parcel_packages/LIVE_HIVE_PROXY-1.0.0-SNAPSHOT-el6.parcel
      [user@docs-cm ~] wget <your-fusion-node.hostname>:8083/ui/downloads/core_plugins/live-hive/parcel_packages/LIVE_HIVE_PROXY-1.0.0-SNAPSHOT-el6.parcel.sha
    2. Change the ownership of the parcel files to match up with Cloudera Manager, e.g.

      chown cloudera-scm:cloudera-scm LIVE_HIVE_PROXY-*
      # ls -l
      total 1492884
      -rw-r--r-- 1 cloudera-scm cloudera-scm 1520997979 Jun 16  2017 CDH-5.11.0-1.cdh5.11.0.p0.34-el6.parcel
      -rw-r--r-- 1 cloudera-scm cloudera-scm         41 Aug 24 15:38 CDH-5.11.0-1.cdh5.11.0.p0.34-el6.parcel.sha
      -rw-r----- 1 cloudera-scm cloudera-scm      58207 Feb  7 14:41 CDH-5.11.0-1.cdh5.11.0.p0.34-el6.parcel.torrent
      -rw-r--r-- 1 cloudera-scm cloudera-scm    7087088 Feb  7 14:37 FUSION-2.12.example-cdh5.11.0-el6.parcel
      -rw-r--r-- 1 cloudera-scm cloudera-scm         41 Feb  7 14:37 FUSION-2.12.example-cdh5.11.0-el6.parcel.sha
      -rw-r----- 1 cloudera-scm cloudera-scm        454 Feb  7 14:41 FUSION-2.12.0.0.example-el6.parcel.torrent
      -rw-r--r-- 1 cloudera-scm cloudera-scm     544587 Feb  6 11:51 LIVE_HIVE_PROXY-1.0.0-el6.parcel
      -rw-r--r-- 1 cloudera-scm cloudera-scm         41 Feb  6 11:51 LIVE_HIVE_PROXY-1.0.0-el6.parcel.sha
    3. Copy the Custom Service Descriptor (LIVE_HIVE_PROXY-x.x.x.jar) file to the Local Descriptor Repository (normally /opt/cloudera/csd/) on your node, e.g.

      # cd ...
      # cd csd
      wget http://<your-fusion-node.hostname>:8083/ui/downloads/core_plugins/live-hive/parcel_packages/LIVE_HIVE_PROXY-2.x.y.jar
      Resolving <your-fusion-node.hostname>... 10.0.0.1
      Connecting to <your-fusion-node.hostname>.com|10.10.0.1|:8083... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 4041 (3.9K) [application/java-archive]
      Saving to: LIVE_HIVE_PROXY-2.x.y.jar
      
      100%[=============================================================================================>] 4,041       --.-K/s   in 0s
      
      2018-02-07 16:23:42 (279 MB/s) - LIVE_HIVE_PROXY-2.x.y.jar saved [4041/4041]
    4. Restart the Cloudera server so that Cloudera can see the new parcel and jar, e.g.

      [user@docs-cm ~] service cloudera-scm-server restart
      After restarting the Cloudera Server, the Cloudera Manager Service (CMS) will report a stale config, which requires a restart via Cloudera Manager.

      e.g. log in to Cloudera Manager and click the stale config spinner.

    Live Hive Plugin installation
    Figure 6. Live Hive Plugin installation - CMS restart
  4. The second installer screen handles Configuration. The first section validates existing configuration to ensure that Hive is set up correctly. Click the Validate button.

    Live Hive Plugin installation
    Figure 7. Live Hive Plugin installation - validation (Screen 2)
    Install a Live Hive Proxy on this host

    The installer lets you choose not to install the Live Hive proxy onto this node. While you must install Live Hive on all nodes, if you don’t wish to use a node to store Hive metadata, you can choose to exclude the Live Hive proxy from the installation. If you do this, the node still plays its part in transaction coordination, without keeping a local copy of the replicated data.

    If you deselect the Live Hive proxy on ALL nodes, then replication will not work. You must install at least one proxy in each zone. Should you have a cluster that doesn’t have a single Live Hive proxy, you will need to perform the following procedure to enable Hive metadata replication.
    Live Hive Proxy Port

    The Thrift port used by the plugin. Default: 9090

    Hive Metastore URI

    The metastore(s) which the Live Hive proxy will send requests to.
    Add additional URIs by clicking the + Add URI button and entering additional URI / port information.

    If you add additional URIs, you must complete the necessary information or remove them. You cannot have an incomplete line.
    Live Hive Plugin installation
    Figure 8. Live Hive Plugin installation - Additional URIs

    Click on Next step to continue.

  5. Step 3 of the installation covers security. If you have not enabled Kerberos on your cluster, you will pass through this step without adding any additional configuration.

    Live Hive Plugin installation
    Figure 9. Live Hive Plugin installation - security disabled (Screen 3)

    If you enable Kerberos, you will need to supply your Kerberos credentials.

    Live Hive Plugin installation
    Figure 10. Live Hive Plugin installation - security enabled (Screen 3)

    Hive Proxy Security

    User

    System user used for Hive Proxy

    Group

    System group for secure access

    Principal name

    The name of the Kerberos principal name for access

    Ensure that you use the same principal as is used for the Hive stack. If you use a different principal then Live Hive will not work due to basic security constraints.
    Manual Kerberos setup

    Tick the manual Kerberos setup checkbox.

    Provide KDC credentials

    Tick the checkbox to configure KDC credentials

    KDC Credentials

    KDC admin principal

    Admin principal of your KDC, required by the Hadoop manager in order to deploy keytabs for the Live Hive Proxy.

    Password

    Password for the KDC admin principal.

    The above credentials are stored using the Hadoop Manager’s temporary credential mechanism, and as such will be destroyed if either the Hadoop manager is restarted or 90 minutes (by default) have passed.
Keytab file path

The installer now validates that there is read access to the keytab that you specify here.

Metastore Service Principal Name

The installer validates whether there are valid principals in the keytab.

Metastore Service Hostname

Enter the hostname of your Hive Metastore service.

  1. The final step is to complete the installation. Click Start Install.

    Live Hive Plugin installation
    Figure 11. Live Hive Plugin installation summary - screen 4

    The following steps are carried out:

    Cloudera parcel distribution and activation

    Distribute and activate the Fusion Hive Plugin parcels in Cloudera Manager

    Update cluster HDFS configuration and redeploy

    Restarts the HDFS service and distributes client configurations for Fusion and Kerberos RPC privacy (if Kerberos is enabled)

    Install Fusion Hive Plugin service descriptor in Cloudera

    Installs the Fusion Hive Plugin service in Cloudera Manager

    Configure Impala (if installed)

    Configures Cloudera Impala to use Fusion Hive Plugin proxy

    Configure Hive

    Configures Cloudera Hive to use the Fusion Hive Plugin proxy

    Restart Hive service

    Restarts the Hive service in Cloudera Manager to distribute updated configurations

    Restart Fusion Server

    Completes the plugin installation and restarts the Fusion Server

  2. The installation will complete with a message "Live Hive installation complete!"

    Live Hive Plugin installation
    Figure 12. Live Hive Plugin installation - Completion

    Click Finish to close the Plugin installer screens.

Now advance to the Activation steps.

4.2.3. Ambari-based steps

Important HDP/Ambari requirement
On HDP you cannot co-locate the Live Hive Plugin proxy on a node that is running the Hive metastore. This is because Ambari uses the value from hive.metastore.uris to determine what port the Metastore should listen on, which would clash with Live Hive Plugin.
Run the installer

Obtain the Live Hive Plugin installer from WANdisco. Open a terminal session on your WANdisco Fusion node and run the installer as follows:

  1. Run the Live Hive Plugin installer on each host required:

    # sudo ./live-hive-installer.sh
  2. The installer will check for components that are necessary for completing the installation:

    # sudo ./live-hive-installer.sh
    Verifying archive integrity... All good.
    Uncompressing WANdisco Live Hive.......................
    
    
        ::   ::  ::     #     #   ##    ####  ######   #   #####   #####   #####
       :::: :::: :::    #     #  #  #  ##  ## #     #  #  #     # #     # #     #
      ::::::::::: :::   #  #  # #    # #    # #     #  #  #       #       #     #
     ::::::::::::: :::  # # # # #    # #    # #     #  #   #####  #       #     #
      ::::::::::: :::   # # # # #    # #    # #     #  #        # #       #     #
       :::: :::: :::    ##   ##  #  ## #    # #     #  #  #     # #     # #     #
        ::   ::  ::     #     #   ## # #    # ######   #   #####   #####   #####
    
    
    
    You are about to install WANdisco Live Hive version 2.0.0
    
    Do you want to continue with the installation? (Y/n)
      wd-live-hive-plugin-2.0.0.tar.gz ... Done
      live-hive-fusion-core-plugin-2.0.0-1233.noarch.rpm ... Done
      storing user packages in '/opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive' ... Done
      live-hive-ui-server-2.0.0-dist.tar.gz ... Done
    All requested components installed.
    Go to your WANdisco Fusion UI Server to complete configuration.
Installer options
View the Installer Options section for details on additional installer functions, including the ability to install selected components.
IMPORTANT: Once you run this installer script, do not restart the Fusion node until you have fully completed the installation steps for this node.
Configure the Live Hive Plugin
  1. Open a session to your WANdisco Fusion UI. You will see a message confirming that the Live Hive Plugin has been detected. Click the Plugins link to review the Plugins page.

    Live Hive Plugin Architecture
    Figure 13. Live Hive Plugin Architecture
  2. The live-hive-plugin now appears in the list. Click the button labelled Install Now.

    Live Hive Plugin Architecture
    Figure 14. Live Hive Plugin Parcel installation
    Live Hive Plugin installation
    Figure 15. Live Hive Plugin installation - Clients (Step 1)
    Stacks

    Stacks need to be placed in the correct directory to make them available to the manager. To do this:

    1. Download the service from the installer client download panel

    2. The services are .gz files that will expand to the directories /LIVE_HIVE_PROXY and /LIVE_HIVESERVER2_TEMPLATE.

    3. For HDP, place these directories in /var/lib/ambari-server/resources/stacks/HDP/<version>/services (see the example after this list).

    4. Restart the Ambari server.
      Note: If using CentOS 6/RHEL 6, we recommend using the following command to restart:

      initctl restart ambari-server
    5. Check on your Ambari manager that the services are present.

      Stacks
      Figure 16. Stacks present
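
    For example, to extract a downloaded stack archive directly into the Ambari services directory (the archive name and HDP version below are illustrative):

    # extract the stack into the Ambari resources directory
    tar -xzf live-hive-proxy-stack.tar.gz -C /var/lib/ambari-server/resources/stacks/HDP/2.6/services/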
  3. The second installer screen handles Configuration.

    Live Hive Plugin installation
    Figure 17. Live Hive Plugin installation - Configuration (Step 2)
    Install a Live Hive Proxy on this host

    The installer lets you choose not to install the Live Hive proxy onto this node. While you must install Live Hive on all nodes, if you don’t wish to use a node to store Hive metadata, you can choose to exclude the Live Hive proxy from the installation. If you do this, the node still plays its part in transaction coordination, without keeping a local copy of the replicated data.

    If you deselect the Live Hive proxy on ALL nodes, then replication will not work. You must install at least one proxy in each zone. Should you have a cluster that doesn’t have a single Live Hive proxy, you will need to perform the following procedure to enable Hive metadata replication.
    Live Hive Proxy Port

    The Thrift port used by the plugin. Default: 9090

    Hive Metastore URI

    The metastore(s) which the Live Hive proxy will send requests to.
    Add additional URIs by clicking the + Add URI button and entering additional URI / port information.

    If you add additional URIs, you must complete the necessary information or remove them. You cannot have an incomplete line.
    Live Hive Plugin installation
    Figure 18. Live Hive Plugin installation - Additional URIs

    Click on Next step to continue.

  4. Step 3 of the installation covers security. If you have not enabled Kerberos on your cluster, you will pass through this step without adding any additional configuration.

    Live Hive Plugin installation
    Figure 19. Live Hive Plugin installation - security disabled (Step 3)

    If you enable Kerberos, you will need to supply your Kerberos credentials.

    Live Hive Plugin installation
    Figure 20. Live Hive Plugin installation - security enabled (Step 3)
    Hive Proxy Security

    Kerberos settings for the Hive Proxy.

    User

    The system user for Hive.

    Group

    The system group for Hive.

    Principal name

    The Principal name for the Hive user.

    Ensure that you use the same principal as is used for the Hive stack. If you use a different principal, Live Hive will not work due to basic security constraints.
    Manual Kerberos setup (checkbox)

    Tick this checkbox to provide the Kerberos details for Hive Proxy Kerberos.

    Hive Proxy Kerberos
    Live Hive Plugin installation
    Figure 21. Live Hive Plugin installation - security enabled (Step 3)
    Keytab file path

    The installer now validates that there is read access to the keytab that you specify here.

    Validate first
    You must validate the keytab file before you choose the principal.
    Principal

    Select from the available principals. This is the principal that will be used to connect to the original Hive metastore. Validation checks that the principal is valid.
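
    If you want to inspect which principals a keytab contains before validating it, you can list its entries from the command line (a quick check; the keytab path below is an assumption for illustration):

      klist -kt /etc/security/keytabs/hive.service.keytab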

    Provide KDC credentials (Checkbox)

    Tick the checkbox to provide details for a KDC’s admin principal and password.

    Live Hive Plugin installation
    Figure 22. Live Hive Plugin installation - security enabled (Step 3)

    KDC Credentials

    If Ambari is managing the cluster’s Kerberos implementation, you must provide the following KDC credentials or the plugin installation will fail.
    KDC admin principal

    Admin principal of your KDC, required by the Hadoop manager in order to deploy keytabs for the Live Hive Plugin Proxy.

    Password

    Password for the KDC admin principal.

    The above credentials are stored using the Hadoop Manager’s temporary credential mechanism, and as such will be destroyed if either the Hadoop manager is restarted or 90 minutes (by default) have passed.
  5. The final step is to complete the installation. Click Start Install.

    Live Hive Plugin installation
    Figure 23. Live Hive Plugin installation summary - (Step 4)

    The following steps are carried out:

    Hive Metastore Template Install

    Install Live Hive Metastore Service Template on Ambari.

    Live Hive Proxy Service Install

    Install the Live Hive Proxy Service on Ambari.

    Update Hive Configuration

    Updates the URIs for Hive connections in Ambari.

    Restart HDFS and Hive Service

    Restarts the Hive Service in Ambari. Note: this process can take several minutes to complete.

    Restart Live Hive Proxy Service

    Restarts the Live Hive Proxy Service in Ambari. Note: this process can take several minutes to complete.

    Restart Fusion Server

    Complete the plugin installation and restart Fusion Server.

  6. The installation will complete with the message "Live Hive installation complete".

    Live Hive Plugin installation
    Figure 24. Live Hive Plugin installation - Completion

    Click Finish to close the Plugin installer screens. You must now activate the plugin.

Instead of installing through the UI, you can install using the silent (scripted) installer. These steps need to be repeated on each node you want the Live Hive plugin installed on.

  1. Obtain the Live Hive Plugin installer from WANdisco and open a terminal session on your WANdisco Fusion node.

  2. Ensure the downloaded file is executable e.g.

    # chmod +x live-hive-installer.sh
  3. Run the Live Hive Plugin installer e.g.

    # sudo ./live-hive-installer.sh
  4. Now place the parcels or stacks in the relevant directory. They can be found in the directory /opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive-<version>. The steps are the same steps as in the UI installer. For more information see Parcels if you are using Cloudera, or Stacks if using Ambari. Ensure that you restart your Cloudera or Ambari server.
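
    For Cloudera, a sketch of the placement (the filenames are assumptions based on the installer contents; the paths match the equivalent steps in the Upgrade section):

      cp LIVE_HIVE_PROXY-<version>-el6.parcel* /opt/cloudera/parcel-repo/
      cp LIVE_HIVE_PROXY-<version>.jar /opt/cloudera/csd/
      service cloudera-scm-server restart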

  5. Now edit the live_hive_silent_installer.properties file, located in /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-<your version>/properties.

    Required fields:
    • live.hive.deploy.proxy - controls whether the installation includes the Live Hive proxy. Default is true; only enter false if at least one other Fusion node in the zone is going to run the proxy.

    • live.hive.proxy.thrift.host - deprecated in favor of live.hive.proxy.thrift.uris.

    • live.hive.proxy.keytab - the keytab (and the principal within it) that the Live Hive proxy will use.

    • live.hive.proxy.principal - the principal the live hive proxy will use. This must be in the form user/HOST@REALM.

    • metastore.service.principal - the user portion of the vanilla metastore’s service principal, e.g. user. Note that this may not be the same as the user in the user/HOST@REALM entered above.

    • live.hive.proxy.remote.thrift.uris - the original Hive Metastore Thrift host and port. This must be in the form host:port.
      If vanilla metastore HA is configured, this should be a comma-separated list of all existing metastore host:port pairs.

    Optional fields:
    • live.hive.proxy.thrift.port - the port the Live Hive proxy server will use. Default = 9090.

    • plugin.hive.metastore.heap.size - maximum Java heap size of the metastore (gigabytes). Default = 1.

    Kerberized - using cluster manager to generate keytab and principals:
    • live.hive.proxy.kerberos.user - default is hive

    • live.hive.proxy.kerberos.group - default is hadoop

    • live.hive.proxy.kerberos.principal.short.name - default is hive

    • live.hive.proxy.kadmin.principal - no default, KDC principal

    • live.hive.proxy.kadmin.password - no default, KDC password

    Kerberized and you don’t want Live Hive Plugin to use the cluster manager for keytab/principal generation:
    • live.hive.kerberos.manual.setup - Set to true

    • live.hive.proxy.keytab - The keytab (and the principal within it) that the Live Hive proxy will use

    • live.hive.proxy.principal - Principal must take the form of user/HOST@REALM
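
    For illustration, a minimal sketch of a completed properties file for a non-Kerberized deployment (the hostnames and the metastore port are assumptions):

      live.hive.deploy.proxy=true
      live.hive.proxy.thrift.port=9090
      live.hive.proxy.remote.thrift.uris=metastore1.example.com:9083,metastore2.example.com:9083
      plugin.hive.metastore.heap.size=1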

  6. To start the silent installation, go to /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-<version> and run:

    # ./scripts/silent_installer_live_hive.sh ./properties/LIVE_HIVE_PROXY-silent-installer.properties
  7. Repeat these steps on each node.

  8. Once the plugin is installed on all relevant nodes, activate the plugin.

Silent installation known issues
Important workaround for release 1.0
After running a silent installation, the Live Hive plugin will still appear as ready to install on the Plugins screen.

You must restart the Fusion node.

If you change any properties in the UI before performing the restart, the UI server will lose the following ui.properties property and will no longer be able to see that the Live Hive Plugin is activated:

plugin.installed.LiveHiveFusionPlugin=true

This can be fixed by restoring the property and then restarting fusion-ui-server, as sketched below.
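
A minimal sketch of the fix, run on the affected Fusion node (the path and property are as noted above):

echo 'plugin.installed.LiveHiveFusionPlugin=true' >> /opt/wandisco/fusion-ui-server/properties/ui.properties
service fusion-ui-server restart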


4.3. Activate Live Hive Plugin

After completing the installation on all applicable Fusion nodes, you will need to activate the Live Hive Plugin before you can use it. Use the following steps to complete the plugin activation.

  1. Log into the WANdisco Fusion UI. On the Settings tab, go to the Plugin Activation link in the Live Hive section of the side menu. The Live Hive Plugin Activation screen will appear.

    Live Hive Plugin installation
    Figure 25. Live Hive Plugin activation - Start
    Ensure that your clusters have been inducted before activating.
    The plugin will not work if you activate before completing the induction of all applicable zones.

    Tick the checkboxes that correspond with the zones that you want to replicate Hive metadata between, then click Activate.

  2. A message will appear at the bottom of the screen that confirms that the plugin is active.

    Live Hive Plugin installation
    Figure 26. Live Hive Plugin activation - Completion
    The plugin is active and cannot be modified.
    You can’t change membership once activated. See Membership changes.

4.4. Validation

The following section offers an optional set of steps for setting up test LiveData replication of Hive metadata and HDFS data.

The procedure also assumes Fusion, your Hadoop manager (in this example, Ambari), and all the (Ambari) services are running and showing up green.

If ‘Live Hiveserver2 Template’ is stopped, it may need to be started from the host screen by clicking on the service instance.

4.4.1. Replication

Hive replication uses the full potential of WANdisco’s LiveData: metadata DDL commands such as CREATE and ALTER TABLE are replicated, in addition to inserts and the data stored in HDFS.
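
For instance, DDL statements of the following kind (the table name is illustrative) are captured and replayed in the other zones, as demonstrated later in this section:

CREATE TABLE demo (id int);
ALTER TABLE demo RENAME TO demo2;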

4.4.2. Live Hive Plugin configuration

In order to enable Hive replication, two rules are needed:

  • HCFS rule that covers the location of Hive data in the Hadoop file system.

  • Hive rule that defines patterns to match the database name and table names.

In the following steps we will configure Hive replication for the directory database, which will be located in the default /apps/hive/warehouse HDFS path used by Hive in the HDP distribution.

  1. Log in to the Fusion UI in a browser.

  2. Click the Replication tab, then click Create.

  3. For the rule Type, ensure Hive is selected.

  4. For the Hive Pattern, enter:

    1. Database name: directory

    2. Table name: *

  5. Click the "Create Rule" button.

    Live Hive validation

  6. The resultant Hive rule will be displayed under the Replication Tab:

    Live Hive validation

  7. Next create the HCFS rule by clicking on the "Create" button a second time.

  8. For the rule Type, ensure HCFS is selected.

  9. Under the Paths heading, enter ‘/apps/hive/warehouse’ (or select the path from the HDFS File Tree), then click the "Add" button.

  10. For the Priority Zone ensure PROD is selected.

  11. Click the "Create 1 rule" button.

    Live Hive validation

  12. You should now see both the Hive and HCFS rule in the Replication Tab:

    Live Hive validation

4.4.3. Creating and Replicating New Hive Databases and Tables

  1. Open a Terminal window on PROD-FUS.FUSIONLAB.TEST

  2. Obtain the beeline connection URL by logging in to Ambari on PROD.FUSIONLAB.TEST, navigating to the Hive Service and clicking on the clipboard icon on the HiveServer2 JDBC URL line:

    Live Hive validation

  3. Log into the beeline shell using the datauser account.

    Live Hive validation

  4. Create a new database and table to use for testing.

    beeline
    !connect <paste URL obtained from previous step>
    create database directory;
    show databases;
    use directory;
    CREATE TABLE IF NOT EXISTS names (id int, fname string, lname string, email string, social string, age string, secret string, license string, ip string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
    show tables;

    Live Hive validation

  5. Obtain the beeline connection URL by logging in to Ambari on DR.FUSIONLAB.TEST, navigating to the Hive Service and clicking on the clipboard icon on the HiveServer2 JDBC URL line.

  6. Open a Terminal on DR-FUS.FUSIONLAB.TEST as datauser and verify replication has occurred by running the following commands:

    Live Hive validation

    beeline
    !connect <paste URL obtained from DR cluster Ambari, specify hdfs user when prompted>
    use directory;
    select count(*) from names where id >=0;
    select * from names where fname='TestUser' order by lname;
  7. If the names table shows up in the show tables output, continue to the next step. If not, check that your replication rules have been added per the “Live Hive Plugin configuration” section, then repeat the steps using a unique database name, e.g. directory1.

4.4.4. Adding and replicating data to the Hive table

  1. These next steps will demonstrate the Active-Active nature of Fusion’s replication engine by adding data into the “names” table on both PROD and DR clusters. Each data file contains 1 million records and the Hive count query should reflect that.

  2. Open a Terminal on PROD-FUS.FUSIONLAB.TEST and DR-FUS.FUSIONLAB.TEST simultaneously.

  3. On PROD-FUS.FUSIONLAB.TEST cluster download the sample data set.

    wget https://s3-us-west-1.amazonaws.com/fusion-demo/datastream1.txt
  4. On DR-FUS.FUSIONLAB.TEST download the sample data set.

    wget https://s3-us-west-1.amazonaws.com/fusion-demo/datastream2.txt
  5. From PROD-FUS.FUSIONLAB.TEST - Put the data file into HDFS in the PROD cluster, so that it can be replicated to the DR cluster and queried from Hive in the next step.

    hdfs dfs -put datastream1.txt \
    /apps/hive/warehouse/directory.db/names
  6. From DR-FUS.FUSIONLAB.TEST - Open a beeline shell and run a query on the data set.

    Whereas previously no username or password was specified after the !connect statement, this time specify the hdfs username when prompted:
    beeline
    !connect <paste URL obtained from DR cluster Ambari, specify hdfs user when prompted>
    use directory;
    select count(*) from names where id >=0;
    select * from names where fname='TestUser' order by lname;
    !quit

    Live Hive validation

    The above output does not include the output from the second select statement, but at the end the following total number of rows should be seen:

    Live Hive validation

  7. From DR-FUS.FUSIONLAB.TEST - Put the data file into HDFS in the DR cluster, so that it can be replicated to the PROD cluster and queried from Hive in the next step.

    hdfs dfs -put datastream2.txt \
    /apps/hive/warehouse/directory.db/names
  8. From PROD-FUS.FUSIONLAB.TEST - Open a beeline shell and run a query on the data set.

    Whereas previously no username or password was specified after the !connect statement, this time specify the hdfs username when prompted:
    beeline
    !connect <paste URL obtained from PROD cluster Ambari, specify hdfs user when prompted>
    use directory;
    select count(*) from names where id >=0;
    select * from names where fname='TestUser' order by lname;
    !quit

    Live Hive validation

    The above output does not include the output from the second select statement, but at the end the following total number of rows should be seen:

    Live Hive validation

  9. From DR.FUSIONLAB.TEST verify the result sets match by running the same queries from a Hive shell.

    (Result should be 2m records)

    (The results on PROD.FUSIONLAB.TEST and DR.FUSIONLAB.TEST should match even when using alternative fname values)

    !quit
  10. If the result sets match (count shows 2 million total records on each side), you have successfully demonstrated Active-Active Hive replication.

4.4.5. Verify HDFS and Hive Data

  1. Log into Fusion Admin UI on PROD-FUS.FUSIONLAB.TEST

    http://prod-fus.fusionlab.test:8083/
  2. Perform a new HDFS consistency check on the /apps/hive/warehouse directory.

    1. Click on the Replication tab and then click on the link for /apps/hive/warehouse in the HCFS replication rule.

      Live Hive validation

    2. Click the Consistency Check tab, and then press the Trigger Check button.

      Live Hive validation

    3. Contents should show Consistent. If not, click on the Inconsistent link and review the differences.

      Live Hive validation

  3. Perform a new Hive consistency check on the /apps/hive/warehouse directory.

    1. Click on the Replication tab and then click on the link for the directory database in the Hive replication rule.

      Live Hive validation

    2. Click the “Status” tab, then press the “Trigger check” button.

    3. Contents should show Consistent when hovering over the directory database entry (if not, drill into the directory database and review the differences).
      Live Hive validation

4.4.6. Alter and Drop Table Examples

These next steps will demonstrate replicating the Alter and Drop commands.

  1. On PROD-FUS.FUSIONLAB.TEST open a Terminal window.

    beeline
    !connect <paste URL obtained from PROD cluster Ambari, specify hdfs user when prompted>
    use directory;
    alter table names rename to people;
  2. On DR-FUS.FUSIONLAB.TEST open a Terminal window to verify the alter command has replicated and run Drop table.

    beeline
    !connect <paste URL obtained from DR cluster Ambari, specify hdfs user when prompted>
    use directory;
    show tables;
    drop table people;

    Live Hive validation

  3. On PROD.FUSIONLAB.TEST verify table people no longer exists.

    show tables;

    Live Hive validation

4.5. Upgrade

Fusion Plugin for Live Hive 2.0 is upgraded by uninstalling the plugin on all nodes, followed by a re-installation, using the standard installation steps for your platform type.

4.5.1. Upgrade from Hive Metastore Plugin

Use the following procedure to upgrade to Live Hive from Hive Metastore Plugin (the precursor to Live Hive).

Uninstall Hive Metastore Plugin (WD Hive)
In summary
  1. Decouple Fusion from the Hive service

  2. Restart Hive service

  3. Stop WD Hive service(s) in the manager

  4. Remove WD Hive services from the manager

  5. Remove WD Hive manager elements

  6. Stop UI, Stop Fusion servers

  7. Remove WD Hive elements from Fusion

  8. Start UI, Fusion servers

On Hadoop Manager

  1. Reset all mentions of the hive.metastore.uris parameter in the Hive service config to their default values.

  2. Restart the Hive service to deploy the config change.

Cloudera Specific Steps
  1. Stop WD Hive Metastore.

  2. Deactivate the WD_HIVE_METASTORE parcel (Deactivate Only), then 'Remove from hosts'; you may then delete the WD Hive Metastore service from the Cloudera UI.

    Upgrade from WDHive
    Figure 27. Live Hive Plugin Upgrade
  3. Remove WD Hive service parcel, e.g.

    find /opt/cloudera/ ! -readable -prune -o -name WD_HIVE* -print

    Remove the files found, e.g.

    rm -rf /opt/cloudera/csd/WD_HIVE_METASTORE-2.11.2.4-cdh5.11.0.jar
    rm -rf /opt/cloudera/parcels/.flood/WD_HIVE_METASTORE-2.11.2.4-cdh5.11.0-el6.parcel.torrent
    rm -rf /opt/cloudera/parcels/.flood/WD_HIVE_METASTORE-2.11.2.4-cdh5.11.0-el6.parcel
    rm -rf /opt/cloudera/parcels/.flood/WD_HIVE_METASTORE-2.11.2.4-cdh5.11.0-el6.parcel/WD_HIVE_METASTORE-2.11.2.4-cdh5.11.0-el6.parcel
Ambari Specific Steps
  1. Stop WD HS2 Template, WD Hive Metastore and WD Hive Metastore Slave services.

  2. Delete the WD HS2 Template service.

  3. Delete the WD Hive Metastore service.

  4. Remove WD Hive service stack, e.g.

    find /var/lib/ambari-server/ ! -readable -prune -o -name WD_HIVE* -print

    Remove the files found, e.g.

    /var/lib/ambari-server/resources/stacks/HDP/2.6/services/WD_HIVE_METASTORE
    /var/lib/ambari-server/resources/stacks/HDP/2.6/services/WD_HIVESERVER2_TEMPLATE

On Fusion Node

  1. Bring the Fusion server and UI server to a stop:

    service fusion-server stop
    service fusion-ui-server stop
  2. Remove the WD Hive package, e.g.

    rpm -qa | grep wd-hive
    
    yum remove -y wd-hive-plugin-hdp-x.y.z.-2.11.a.b-c.noarch
    yum remove -y wd-hive-metastore-hdp-x.y.z-2.11.a.b-c.noarch
  3. Remove package files from the node’s file system:

    rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/plugins/wd-hive-plugin
    rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/wd-hive/
    rm -rf /opt/wandisco/fusion-ui-server/plugins/hive-plugin-ui-server-<version>
  4. Start the Fusion servers.

    service fusion-server start
    service fusion-ui-server start

Now that the Hive Metastore plugin has been removed, review your replication rules in the Fusion UI and remove any Hive-specific ones which are no longer required.

Live Hive may now be installed by following the instructions located in the Installation section. See Installation.

4.5.2. Upgrade from an earlier version of Live Hive

Use this procedure if you are already running with Live Hive (not WD Hive).

This is not a complete upgrade: only the Live Hive Plugin proxy and the Live Hive Fusion plugin are upgraded; the UI elements remain untouched. The following procedure upgrades a Live Hive Plugin 2.0.0 installation to 2.0.2.1. Contact support if you intend to follow a different upgrade path.
  1. Download the Live Hive Plugin installer and unpack it on all Fusion nodes:

    live-hive-installer.sh --noexec --keep
  2. The unpacked contents which will be used are:

    installer/
    |- defaults
    |- installer
    |   |- functions.sh
    |   |- installer.sh
    |- resources
        |- additional
        |   |- parcel_packages
        |   |   |- LIVE_HIVE_PROXY-2.0.2.1-SNAPSHOT-el6.parcel
        |   |   |- LIVE_HIVE_PROXY-2.0.2.1-SNAPSHOT-el6.parcel.sha
        |   |   |- LIVE_HIVE_PROXY-2.0.2.1-SNAPSHOT.jar
        |   |- stack_packages
        |       |- live-hive-proxy-2.0.2.1_SNAPSHOT.stack.tar.gz
        |       |- live-hiveserver2-template-2.0.2.1_SNAPSHOT.stack.tar.gz
        |- core
        |   |- live-hive-fusion-core-plugin_2.0.2.1-SNAPSHOT-1162_all.deb
        |   |- live-hive-fusion-core-plugin-2.0.2.1_SNAPSHOT-1162.noarch.rpm
        |- logo.txt
        |- RPM-GPG-KEY-WANdisco
        |- ui-client
        |   |- wd-live-hive-plugin-2.0.2.1-SNAPSHOT.tar.gz
        |- ui-server
            |- live-hive-ui-server-2.0.2.1-SNAPSHOT-dist.tar.gz
Upgrade Live Hive service on CDH with parcels
  1. Copy the .parcel and .sha files to /opt/cloudera/parcel-repo on the manager node.

    Parcels will have to be renamed for an EL7 setup as the installer normally sets up hard links to them.
  2. Copy the CSD jar to /opt/cloudera/csd on the manager node.

  3. Restart Cloudera Manager to pick up the new CSD:

    service cloudera-scm-server restart


  4. The new parcel will now appear on the CDH Parcels page. Distribute and Activate it, resolving any config issues.

  5. The old parcel may now be removed from hosts and deleted.

Upgrade Live Hive service on HDP with stacks
  1. Remove the old stacks and unpack the new stacks as per install:

    rm -rf /var/lib/ambari-server/resources/stacks/HDP/2.6/services/LIVE_HIVE*
    tar -C /var/lib/ambari-server/resources/stacks/HDP/2.6/services/ -zxf installer/resources/additional/stack_packages/live-hiveserver2-template-2.0.2.1.stack.tar.gz
    tar -C /var/lib/ambari-server/resources/stacks/HDP/2.6/services/ -zxf installer/resources/additional/stack_packages/live-hive-proxy-2.0.2.1.stack.tar.gz
  2. Restart Ambari Server to pick up the new stacks:

    ambari-server stop && ambari-server start

    Restart the Live Hive proxy and Live Hiveserver2 Template services. This will trigger Ambari to perform the stack upgrades - check versions in the manager UI.

Upgrade Live Hive Plugin on Fusion Nodes
  1. For Fusion Server, perform an RPM upgrade; for UI Server and Client, unpack relevant tarballs:

    service fusion-server stop
    service fusion-ui-server stop
    rpm -Uvh installer/resources/core/live-hive-fusion-core-plugin-2.0.2.1-1293.noarch.rpm
    rm -rf /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-2.0.0.0
    rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/plugins/wd-live-hive-plugin
    tar -zxf installer/resources/ui-client/wd-live-hive-plugin-2.0.2.1.tar.gz -C /opt/wandisco/fusion-ui-server/ui-client-platform/plugins
    tar -zxf installer/resources/ui-server/live-hive-ui-server-2.0.2.1-dist.tar.gz -C /opt/wandisco/fusion-ui-server/plugins
    service fusion-server start
    service fusion-ui-server start
    
    rpm -qa '*live*'
    live-hive-fusion-core-plugin-2.0.2.1-1293.noarch
    live-hive-fusion-core-proxy-2.0.2.1-1293.noarch

4.6. Uninstallation

Use the following section to guide the removal.

Ensure that you contact WANdisco support before running this procedure. The following procedures currently require a lot of manual editing and should not be used without calling upon WANdisco’s support team for assistance.

4.6.1. Service removal

If removing Live Hive from a live cluster (rather than just removing Live Hive from a fusion server for re-installation / troubleshooting purposes), the following steps should be performed before removing the plugin:

  1. Remove the hive.metastore.uris parameter from the General config, and set the hive.metastore.uris instance in the Hive-2-site config section back to its recommended value.

  2. Restart the cluster to deploy the changed config. No Hive clients will now be replicating to the proxy.

  3. Stop and delete the proxy service. On Cloudera, deactivate the LIVE_HIVE_PROXY parcel.

4.6.2. Package removal

Currently there is no programmatic method for removing components, although you can use the following commands to delete the plugin components, one at a time:

# yum erase -y -q live-hive-fusion-core-plugin
Repository cloudera-manager-5.10.0 is listed more than once in the configuration
Warning: RPMDB altered outside of yum.

  WANdisco Hive Metastore Plugin uninstalled successfully.

rm -rf /opt/wandisco/fusion-ui-server/plugins/live-hive-server-2.12/
rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/plugins/wd-live-hive-plugin/
rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive/
Revert the HS2 template
  1. Log in to Ambari and go to the HS2 template component. Go to the config tab.

  2. Inside Custom live-hive-template-config, add the property live.hive.hiveserver2.template.revert and set any value (1 or yes are perfectly fine; the value is not important, but it must not be empty).

  3. Save the config and restart the HS2 template. This will undo the template changes.

  4. Restart any Hiveserver2 instances to allow them to revert to Embedded values. The HS2 template will now go into a stopped state on its own.

The above steps are provided to ensure a complete and safe removal process. In most cases they may not be required, as a replacement HS2 template will be installed anyway.
Remove stacks (Ambari)

These commands are correct for HDP 2.5.3.0 with Ambari 2.4.1.0, and HDP 2.6.0.3 with Ambari 2.5.0.3. If you are using a different version then they may differ slightly.

In the commands below you will need to replace the following:

  • login:password - your details to log in to the Ambari UI

  • AMBARI_SERVER_HOST - the host URL of your Ambari server

  • CLUSTER_NAME - The name of the applicable cluster, e.g. HVLV-01

    1. Run the following curl command to show existing services.

      curl -v -u login:password -X GET
      http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services
    2. Stop the WD Hive Metastore.

      curl -u login:password -H "X-Requested-By: ambari" -X PUT -d
      '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'
      http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/LIVE_HIVE_PROXY
    3. Stop the WD Hiveserver2 template.

      curl -u login:password -H "X-Requested-By: ambari" -X PUT -d
      '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'
      http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/LIVE_HIVESERVER2_TEMPLATE
    4. Remove the Hiveserver2 Template.

      curl -u login:password -H "X-Requested-By: ambari" -X DELETE
      http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/LIVE_HIVESERVER2_TEMPLATE
    5. Now delete the Stacks from /var/lib/ambari-server/resources/stacks/HDP/x.y/services/ and restart Ambari Server.

There is also an entry in /opt/wandisco/fusion-ui-server/properties/ui.properties that needs removing to tell the UI the plugin is no longer installed.

Put together, the following commands (run perhaps in a quick bash script) will clear all Live Hive components from a fusion-server node.

The following commands remove the /dcone/db directory, which resets the ecosystem: all nodes will be left un-inducted, with no replicated folders, and will need to be re-inducted. Remove the "rm -rf /opt/wandisco/fusion/server/dcone/db" command if you do not wish to redeploy from scratch.
service fusion-ui-server stop
service fusion-server stop
rm -rf /opt/wandisco/fusion/server/dcone/db
yum remove -y live-hive-fusion-core-plugin.noarch
rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive/
rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/plugins/wd-live-hive-plugin/
rm -rf /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-2.x.x/
sed -i '/plugin.installed.LiveHiveFusionPlugin=true/d' /opt/wandisco/fusion-ui-server/properties/ui.properties
service fusion-server start
service fusion-ui-server start
  1. Remove any replicated paths related to the plugin (i.e. auto-generated paths for tables), and default/hive. You may need to use the REST API to complete this. See Remove a directory from replication.

  2. Check for tasks, wait 2 hours, check again. If the /tasks directory is now empty on ALL nodes, proceed with the following:

  3. Stop Fusion Plugin for Live Hive 2.0, e.g.

    # service fusion-ui-server stop
    # service fusion-server stop
  4. Remove installation components with the following commands,

    yum remove -y live-hive-fusion-core-plugin.noarch
    rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/downloads/core_plugins/live-hive/
    rm -rf /opt/wandisco/fusion-ui-server/ui-client-platform/plugins/wd-live-hive-plugin/
    rm -rf /opt/wandisco/fusion-ui-server/plugins/live-hive-ui-server-1.0.0-SNAPSHOT/
    sed -i '/LiveHive/d' /opt/wandisco/fusion-ui-server/properties/ui.properties
  5. Now restart, e.g.

    # service fusion-server start
    # service fusion-ui-server start

    The servers will come back, still inducted and with non-Hive replication folders still in place.

4.7. Installation Troubleshooting

This section covers any additional settings or steps that you may need to take in order to successfully complete an installation.

If you encounter problems, make sure that you re-check the known issues and pre-requisites before raising a Support request.

4.7.1. Ensure hadoop.proxyuser.hive.hosts is properly configured

The following Hadoop property needs to be checked when running with the Live Hive plugin. While the settings apply specifically to HDP/Ambari, it may also be necessary to check the property for Cloudera deployments.

Configuration placed in core-site.xml

<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>host1.domain.com,host-LIVE_HIVE_PROXY.organisation.com,host2.domain.com</value>
  <description>
     Hostname from where superuser hive can connect. This
     is required only when installing hive on the cluster.
  </description>
</property>
Proxyuser property
Name

Hive hostname from which the superuser "hive" can connect.

Value

Either a comma-separated list of your nodes or a wildcard. The hostnames should be included for Hiveserver2, Metastore hosts and LiveHive proxy.
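
To check the effective value on a cluster node, you can query the live configuration from the command line (a quick sanity check, assuming the HDFS client is installed on the node):

hdfs getconf -confKey hadoop.proxyuser.hive.hosts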

Some cluster changes can modify this property
Changes to system properties such as hadoop.proxyuser.hive.hosts should be made with great care. If the configuration is not present, impersonation will not be allowed and connection will fail.

There are a number of changes that can be made to a cluster that might impact configuration, e.g.

  • adding a new Ambari component

  • adding an additional instance of a component

  • adding a new service using the Add Service wizard

These additions can result in unexpected configuration changes, based on installed components, available resources or configuration changes. Common changes might include (but are not limited to) changes to Heap setting or changes that impact security parameters, such as the proxyuser values.

Handling configuration changes

If any of the changes listed in the previous section trigger a system change recommendation, there are two options:

  1. A checkbox (selected by default) that tells Ambari to apply the recommendation. You can uncheck this (or use the bulk uncheck at the top) for these properties.

    Live Hive Plugin Architecture
    Figure 28. Stopping a system change from altering hadoop.proxyuser.hive.hosts
  2. Manually adjust the recommended value yourself, as you can specify additional properties that Ambari may not be aware of.

The Proxyuser property values should include hostnames for Hiveserver2, Metastore hosts and the LiveHive proxy. Accepting recommendations that do not contain these (or the all-encompassing wildcard *) will more than likely result in service loss for Hive.

A Hive bug (HIVE-16708) causes the renew_delegation_token call to fail. The issue will result in the following error message:
Exception in thread "main" MetaException(message:hdfs@WANDISCO.HADOOP tries to renew a token with renewer hdfs)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$renew_delegation_token_result$renew_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$renew_delegation_token_result$renew_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$renew_delegation_token_result.read(ThriftHiveMetastore.java)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_renew_delegation_token(ThriftHiveMetastore.java:3841)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.renew_delegation_token(ThriftHiveMetastore.java:3828)

This issue is fixed in Hive 2.3 and 3.0.

5. Operation

This section covers the steps required for setting up and configuring Live Hive Plugin after installation has been completed.

5.1. Setting up Hive Metadata Replication

This section covers the essential steps that you need to take to start replicating Hive metadata between zones.

Live Hive Plugin can only replicate transactional data; it isn’t intended to sync large blocks of existing data.

Live Hive Plugin requires that you create two kinds of rules in order to replicate Hive metadata.

HCFS Rule

Create a rule that matches the location of your underlying Hive data on the HDFS file-system. This rule handles the actual data replication; without it, a corresponding Hive rule will not work. See Creating a HCFS rule.

Hive Rule

Create a rule that uses Hive’s pattern syntax to describe Hive databases and tables. This rule applies to any matching HCFS rule, contextualizing Hive metadata. See Create Hive rule.

5.1.1. Create a HCFS rule

Before you can replicate Hive metadata, you need to set up a replication rule for Hive’s resource location in the underlying file system.

  1. Go to the Fusion UI and click on the Replication tab.

    Live Hive Plugin configuration
    Figure 29. Live Hive Plugin - Replication
  2. Select Type "HCFS". Use the HDFS File Tree to navigate and select a file system resource. For the purpose of Hive metadata replication, this resource should correspond with the location of the Hive data to be replicated.

    Live Hive Plugin configuration
    Figure 30. Live Hive Plugin - Replication
  3. Selected resources will appear on the Paths line. Click Add to use the selected resource.

    Live Hive Plugin configuration
    Figure 31. Live Hive Plugin - Replication
    Default database Locations
    • CDH: /user/hive/warehouse

    • HDP: /apps/hive/warehouse

  4. Click Create 1 rule.

    Live Hive Plugin configuration
    Figure 32. Live Hive Plugin - Replication

    The rule will now appear on the Replication Rules table.

Example 1. Advanced Options

Click on the Advanced Options tab to see further options for setting up the rule:

Live Hive Plugin configuration
Figure 33. Live Hive Plugin - Replication

For non-cloud based deployments the advanced options are:

Shared Encrypted KMS

In deployments where multiple zones share a common KMS server, enable this parameter to specify a virtual prefix path.

Preserve Origin Block Size

The option to preserve the block size from the originating file system is required when Hadoop has been set up to use a columnar storage solution such as Apache Parquet. If you are using a columnar storage format in any of your applications then you should enable this option to ensure that each file sits within the same HDFS block.

Preserve Replication Factor

By default, data that is shared between clusters will follow the local cluster’s replication rules rather than preserve the replication rules of the originating cluster. When this option is enabled, the replication factor of the originating cluster is preserved.

For all deployments the advanced options are:

Schedule Consistency Checks

If you select this option you can set a consistency check interval that is specific to the rule and overrides the default value set in the Consistency Check section of the Settings tab. The consistency check can be set hourly, weekly or daily.

Live Hive Plugin configuration
Figure 34. Live Hive Plugin - Scheduled Consistency Check
Exclude from replication

This lets you set an "exclude pattern" to indicate files and directories in your Replication Rules that you don’t want to be replicated. See Exclude from replication? for more information.

5.1.2. Create Hive rule

To replicate Hive metadata between zones, a Hive pattern-based rule must be created. This rule uses patterns from Hive’s own DDL (data definition language), which deals with schema (structure) and describes how the data should reside in Hive.

For more information about Hive Patterns, used in creating replication rules, see Hive LanguageManual
  1. Go to the Fusion UI and click on the Replication tab.

    Live Hive Plugin configuration
    Figure 35. Live Hive Plugin - Replication
  2. Click on Create.

    Live Hive Plugin configuration
    Figure 36. Live Hive Plugin - Create
    No Hive rule option?
    After Live Hive Plugin activation, if you don’t see the option to create Hive-based replication rules, ensure that you have refreshed your browser session.
  3. From the Type dropdown, select Hive.

    Live Hive Plugin installation
    Figure 37. Live Hive Plugin activation - Completion
  4. Enter the criteria for the Hive pattern that will be used to identify which metadata will be replicated between nodes.

    Live Hive Plugin configuration
    Figure 38. Live Hive Plugin activation - Completion

    Hive Pattern
    Replication rules can be created to match Hive tables based on the same simple syntax used by Hive for pattern matching, rather than more complex regular expressions. Wildcards in replication rules can only be * for any character(s) or | for a choice.

Examples: employees, emp*, and emp*|*ees all match a table named employees. The asterisk wildcard selects all names beginning with emp, and the pipe "|" matches all names that begin with emp or end with ees.
Database

Name of a Hive database.

Table name

Name of a table in the above database.

Description

A description that will help identify or distinguish the Hive-pattern replication rule.

Click Create to apply the rule.

  1. Once created, Hive data that matches the Hive pattern will automatically have a replication rule created.

    Live Hive Plugin installation
    Figure 39. Replication rules
File system location is currently fixed

In Live Hive 1.x/2.x the File system location is locked to the wildcard .*

This value ensures that the file system location is always found. In a future release the File system location will be opened up for the user to edit.

Description

A description that you provide during the setup of the rule.

Zones

A list of the zones that take part in the replication of this resource.

It’s not currently possible to change the zones of an existing rule.

5.1.3. Delete Hive Rule

You can delete unwanted Hive rules, through the Fusion web UI, using the following procedure.

  1. Navigate to the Fusion UI. Click on the Replication Tab.

  2. Click on the Rule that you want to delete.

    Live Hive Plugin installation
    Figure 40. Live Hive Plugin Delete rule
  3. Click on the Delete Rule button at the bottom of the panel.

    Live Hive Plugin installation
    Figure 41. Live Hive Plugin Delete rule
  4. A warning message will appear: "Are you sure you want to delete this rule? Metadata that matches this Hive Pattern rule will stop replicating after deletion." Click Confirm only if you are sure you wish to proceed.

    Live Hive Plugin installation
    Figure 42. Live Hive Plugin Delete rule
  5. The Replication screen will refresh. You can confirm that the deletion was successful if the rule no longer appears on the screen.

    Live Hive Plugin installation
    Figure 43. Live Hive Plugin Delete rule

5.1.4. Review Hive Rule

Review the status of existing Hive rules through the Replication tab of the Fusion UI.

  1. Click on the Hive Rule that you wish to review.

    Live Hive Plugin installation
    Figure 44. Live Hive Plugin Consistency Check
  2. The View screen of the selected Hive pattern will appear.

    This Hive Pattern rule will replicate all databases and tables matching the pattern below as long as their location is already replicated in a HCFS rule.
    Live Hive Plugin installation
    Figure 45. Live Hive Plugin Consistency Check
    Database name

    The name of the Hive database that is getting its Hive metadata replicated.

    Table name

    A table within the above named Hive database for which Hive metadata is replicated.

    Description

    A description of the rule that you provided during its creation to help identify what it does later.

    Delete Rule

    This button is used to remove the rule. For more details, see Delete Hive Rule.

  3. Click on the Status tab.

    Live Hive Plugin installation
    Figure 46. Live Hive Plugin Consistency Check

    The status provides an up-to-date view of the status of the metadata being replicated. All databases and tables are listed on the screen, along with their latest consistency check results.

We will only replicate objects which match the pattern and already have their location replicated in a HCFS rule. When a database is checked for consistency, its non-replicated tables are not considered.
Trigger check

This button triggers a rule-wide consistency check.

Database name

The name of a Hive database that will be matched against the databases that exist in your Hive deployment.

Table name

The name of a table stored in the above database that you intend to replicate.

File system location

The location of the data in the local file system.

File system location is currently fixed

In Live Hive 1.x/2.x the File system location is locked to the wildcard .*

This value ensures that the file system location is always found. In a future release the File system location will be opened up for the user to edit.

Description

A description that you provide during the setup of the rule.

Zones

A list of the zones that take part in the replication of this resource.

It’s not currently possible to change the zones of an existing rule.

5.1.5. Running a Consistency Check

Live Hive Plugin installation
Figure 47. Live Hive Plugin Consistency Check

Live Hive Plugin provides a tool for checking that replica metadata is consistent between zones. Consistency is checked on the Consistency Check tab.

When to complete a consistency check?
  • After adding new metadata into a replication group

  • Periodically, as part of your platform monitoring

  • As part of a system repair/troubleshooting.

Limitation: Consistency checks must be triggered from the Writer node.
The Writer for this zone is noted at the top of the view/Edit tab of each Replication Rule.
This limitation is removed in Live Hive Plugin 2.1.

To complete a check:

  1. Click on the Replication tab.

  2. Click on the status of the applicable Replication Rule. In this example, an unchecked Hive rule.

    Live Hive Plugin installation
    Figure 48. Live Hive Plugin Consistency Check 2
  3. The Consistency Check tab will appear. You can select Trigger check or Trigger blocking check, or Reload results if you’ve already completed an earlier check.

    Live Hive Plugin installation
    Figure 49. Live Hive Plugin Consistency Check 1
  4. A completed check will show a list of any objects that have inconsistent states between zones. You can review and compare the differences and then choose whether to proceed with a Repair.

    Live Hive Plugin installation
    Figure 50. Live Hive Plugin Consistency Check 1

5.1.6. Running a repair

In the event that metadata is found to be inconsistent, you can use the repair function to fix the issue.

  1. Identify the nature of the inconsistency from the Detailed View panel. Select the zone that contains the correct version of the metadata, then select what actions you want the repair tool to take.

    Live Hive Plugin installation
    Figure 51. Live Hive Plugin Repair 1
    Recursive

    If the checkbox is ticked, this option will cause the selected context (i.e. database/table/index/partition) and all the child objects under it to be made consistent. The default is true, but it is ignored if the selected context represents an index or partition.

    Add Missing

    Tick to create any database/table/index/partitions that are missing from a zone depending on the context selected.

    Remove Extra

    Database/Tables/Index/Partitions that exist in the target zone will be removed if they do not exist in the zone selected as your source of truth. Parents of the selected context will never be removed, e.g. if a table is selected, its parent database will not be removed even if it is missing from the source of truth.

    Update Different

    Database/Tables/Index/Partitions that exist on both the source and target zones will be overwritten to match the source zone. Database/Tables/Index/Partitions that already exist on the target zone will not be modified if this option is left unchecked.

Now click Repair.

  1. You will get a report "Repair successfully triggered". Click on Close.

    Live Hive Plugin installation
    Figure 52. Live Hive Plugin Repair 2
  2. To check if the repair has been successful, re-run the Consistency Check and review the status.

    Live Hive Plugin installation
    Figure 53. Live Hive Plugin Repair 3

5.2. Security

This section explains how you can secure your Live Hive Plugin deployment.

5.2.1. Kerberos

See the WANdisco Fusion User Guide for information about Setting up Kerberos

Known issue (LHV-414)
There’s currently an issue with HDP distributions and the use of an external Hive metastore within Hiveserver2, which causes the beeline connection closure to take 2 minutes.

Workaround
Add the following configuration property to the Live Hive Proxy Config in Ambari or Cloudera Manager.

live.hive.cluster.delegation.token.delayed.removal.interval.in.seconds=5


The setting must be more than zero to enable the delay. We recommend a value of 5 seconds.

Secure Impersonation

Normally the Hive user has superuser permissions on the hiveserver2 and hive metastore nodes. If you are installing onto different nodes, corresponding proxyuser parameters should also be updated in core-site.xml and kms-site.xml.

Set up a proxy user on the NameNode, adding the following properties to core-site.xml on the applicable NameNode(s).

<property>
    <name>hadoop.proxyuser.$USERNAME.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.$USERNAME.groups</name>
    <value>*</value>
</property>
hadoop.proxyuser.$USERNAME.hosts

Defines the hosts from which clients can be impersonated. $USERNAME, the superuser who wants to act as a proxy for the other users, is usually set as the system user “hdfs”. These values are captured by the installer, which can apply them automatically.

hadoop.proxyuser.$USERNAME.groups

A list of groups whose users the superuser is allowed to impersonate. Including a wildcard (*) means that impersonation of any user is allowed. For the superuser to act as proxy for another user, the proxy actions must be completed on one of the listed hosts, and the user must be included in the list of groups. Note that this can be a comma-separated list or the noted wildcard (*).

5.3. Troubleshooting

The following tips should help you to understand any issues you might experience with Live Hive Plugin operation:

5.3.1. Check the Release notes

Make sure that you check the latest release notes, which may include references to known issues that could impact Live Hive Plugin.

5.3.2. Check log files

Observe information in the log files, generated for the WANdisco Fusion server and the Fusion Plugin for Live Hive 2.0 to troubleshoot issues at runtime. Exceptions or log entries with a SEVERE label may represent information that can assist in determining the cause of any problem.

As a distributed system, Fusion Plugin for Live Hive 2.0 will be impacted by the operation of the underlying Hive database with which it communicates. You may also find it useful to review log or other information from these endpoints.

Table 1. Log Locations

Log Type         Default Location
Metastore        /var/log/hive
Live Hive Node   /var/log/live-hive-proxy
Hive Server      /var/log/hive

Change the timezone

You can ensure that logging between zones is consistent by updating the logging configuration manually. By default, logs use the UTC timezone, but this can be altered through log4j configuration.

To alter the timezone, the xxx.layout.ConversionPattern property needs to be overwritten.

log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601}{UTC} %p %c - %t:[%m]%n
{UTC} can be replaced with, for example, {GMT} or {GMT+1:30}. If offsetting from a timezone, + or - can be used; hours must be between 0 and 23, and minutes must be between 00 and 59.

This property is located in several properties files. For an example setup these are listed below, but the exact paths may differ for your setup:

  • /var/log/live-hive-proxy

  • /etc/wandisco/fusion/server/log4j.properties

  • /etc/wandisco/fusion/ihc/server/hdp-2.6.0/log4j.properties

  • /opt/wandisco/fusion-ui-server/lib/fusion_ui_log4j.xml

After updating all the relevant files, Live Hive Plugin needs to be restarted for the changes to take effect.

Manager configuration

Cloudera

  1. Login to Cloudera Manager.

  2. In Live Hive Metastore Proxy Logging Advanced Configuration Snippet (Safety Valve) add the pattern where the timezone can be specified, i.e. GMT+1 in the following example:

log4j.appender.RFA.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601}{GMT+1} %p %c - [%t]: %m%n

5.3.3. Connection issues

Any metastore connection issues will show in the logs, usually caused by an issue with SASL negotiation / delegation tokens. Start your investigation "from the outside, going inwards", i.e. first metastore, then proxy, then Hive server. It’s always worth trying a restart of the proxy/Fusion server before looking elsewhere.

5.3.4. Plugin initialization failure

Various errors that might occur have been shown to be caused by the Live Hive Plugin not starting properly.

This has the effect of blanking configuration; where no configuration is available, the fusion-server’s request for the property will fail. If any process removes old configs, this can cause a race condition, as the proxy needs to come up before the fusion-server, or at least before plugin initialisation, to ensure that config is discoverable.

The plugin status may appear as "unknown" on the plugins screen, under Settings.

Solution

Under such circumstances, the workaround would be:

  • Confirm that the configuration (i.e. /etc/wandisco/fusion/plugins/hive/live-hive-site2.xml) is not broken (see the sketch after this list).

  • Ensure that the proxy service is started.

  • Restart fusion-server.
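
One quick way to cover the first and last points is to parse the configuration file before restarting (a sketch, assuming xmllint is available on the node):

xmllint --noout /etc/wandisco/fusion/plugins/hive/live-hive-site2.xml
service fusion-server restart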


6. Reference Guide

6.1. API

Fusion Plugin for Live Hive 2.0 offers increased control and flexibility through a RESTful (REpresentational State Transfer) API.

Below are listed some example calls that you can use to guide the construction of your own scripts and API driven interactions.

API documentation is still in development:
Note that this API documentation continues to be developed. Contact our support team if you have any questions about the available implementation.

Note the following:

  • All calls use the base URI:

    http(s)://<server-host>:8082/plugin/hive/
  • The internet media type of the data supported by the web service is application/xml.

  • The API is hypertext driven, using the following HTTP methods:

Type     Action
POST     Create a resource on the server
GET      Retrieve a resource from the server
PUT      Modify the state of a resource
DELETE   Remove a resource
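
As a hedged example of putting these together (the hostname is an assumption; the endpoint and parameters are taken from the WADL below), a metadata consistency check could be triggered and its report fetched with:

curl -X POST "http://livehive-host.com:8082/plugin/hive/consistencyCheck?dbName=directory&simpleCheck=true"
curl "http://livehive-host.com:8082/plugin/hive/consistencyCheck/<taskIdentity>?withDiffs=true"

where <taskIdentity> is the identity of the check task (see the {taskIdentity} resource in the WADL).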

6.1.1. Unsupported operations

As part of Fusion’s replication system, we capture and replicate some "write" operations to an underlying DistributedFileSystem/FileSystem API. However, the truncate command is not currently supported. Do not run this command, as your Hive metadata will become inconsistent between clusters.

6.1.2. Example WADL output

You can query the API using the following string:

http://example-livehive-node.domain.com:8082/fusion/application.wadl
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<application xmlns="http://wadl.dev.java.net/2009/02">
    <doc xmlns:jersey="http://jersey.java.net/" jersey:generatedBy="Jersey: 2.25.1 2017-01-19 16:23:50"/>
    <doc xmlns:jersey="http://jersey.java.net/" jersey:hint="This is simplified WADL with user and core resources only. To get full WADL with extended resources use the query parameter detail. Link: http://livehive-host.com:8082/plugin/hive/application.wadl?detail=true"/>
    <grammars>
        <include href="application.wadl/xsd0.xsd">
            <doc title="Generated" xml:lang="en"/>
        </include>
    </grammars>
    <resources base="http://livehive-host.com:8082/plugin/hive/">
        <resource path="/">
            <resource path="consistencyCheck">
                <method id="startConsistencyCheck" name="POST">
                    <request>
                        <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="dbName" style="query" type="xs:string"/>
                        <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="tableName" style="query" type="xs:string" default=""/>
                        <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="simpleCheck" style="query" type="xs:boolean" default="true"/>
                    </request>
                    <response>
                        <representation mediaType="/"/>
                    </response>
                </method>
                <resource path="{taskIdentity}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="taskIdentity" style="template" type="xs:string"/>
                    <method id="getConsistencyCheckReport" name="GET">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="withDiffs" style="query" type="xs:boolean" default="false"/>
                        </request>
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="{taskIdentity}/diffs">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="taskIdentity" style="template" type="xs:string"/>
                    <method id="getConsistencyReportPage" name="GET">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="pageSize" style="query" type="xs:int" default="2147483647"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="page" style="query" type="xs:int" default="0"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="dbName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="tableName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="partitions" style="query" type="xs:boolean" default="false"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="indexes" style="query" type="xs:boolean" default="false"/>
                        </request>
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="{ruleIdentity}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleIdentity" style="template" type="xs:string"/>
                    <method id="startConsistencyCheckForRule" name="POST">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="dbName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="tableName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="simpleCheck" style="query" type="xs:boolean" default="true"/>
                        </request>
                        <response>
                            <representation mediaType="/"/>
                        </response>
                    </method>
                </resource>
            </resource>
            <resource path="repair">
                <resource path="{ruleIdentity}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleIdentity" style="template" type="xs:string"/>
                    <method id="repair" name="PUT">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="truthZone" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="dbName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="tableName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="partName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="indexName" style="query" type="xs:string"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="recursive" style="query" type="xs:boolean" default="true"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="addMissing" style="query" type="xs:boolean" default="true"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="removeExtra" style="query" type="xs:boolean" default="true"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="updateDifferent" style="query" type="xs:boolean" default="true"/>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="simpleCheck" style="query" type="xs:boolean" default="true"/>
                        </request>
                        <response>
                            <representation mediaType="/"/>
                        </response>
                    </method>
                </resource>
            </resource>
            <resource path="hiveRegex">
                <method id="getReplicationRules" name="GET">
                    <response>
                        <representation mediaType="application/xml"/>
                        <representation mediaType="application/json"/>
                    </response>
                </method>
                <resource path="{id}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="id" style="template" type="xs:string"/>
                    <method id="getReplicationRuleById" name="GET">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                    <method id="deleteReplicationRuleById" name="DELETE">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="addHiveRule">
                    <method id="addReplicationRule" name="PUT">
                        <request>
                            <ns2:representation xmlns:ns2="http://wadl.dev.java.net/2009/02" xmlns="" element="hiveRule" mediaType="application/xml"/>
                            <ns2:representation xmlns:ns2="http://wadl.dev.java.net/2009/02" xmlns="" element="hiveRule" mediaType="application/json"/>
                        </request>
                        <response>
                            <representation mediaType="/"/>
                        </response>
                    </method>
                </resource>
                <resource path="active">
                    <method id="getActiveRules" name="GET">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
            </resource>
            <resource path="hiveReplicatedDirectories">
                <method id="getReplicationRules" name="GET">
                    <response>
                        <representation mediaType="application/xml"/>
                        <representation mediaType="application/json"/>
                    </response>
                </method>
                <resource path="{regexRuleId}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="regexRuleId" style="template" type="xs:string"/>
                    <method id="getReplicationRuleById" name="GET">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="path">
                    <method id="getReplicationRuleByPath" name="GET">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="path" style="query" type="xs:string"/>
                        </request>
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
            </resource>
            <resource path="hiveMetastore">
                <resource path="{ruleId}/{databaseName}/tables/{tableName}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="databaseName" style="template" type="xs:string"/>
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleId" style="template" type="xs:string"/>
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="tableName" style="template" type="xs:string"/>
                    <method id="getTable" name="GET">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="{ruleId}/databases">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleId" style="template" type="xs:string"/>
                    <method id="getDatabases" name="GET">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="databaseFilter" style="query" type="xs:string"/>
                        </request>
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="{ruleId}/databases/{databaseName}">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="databaseName" style="template" type="xs:string"/>
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleId" style="template" type="xs:string"/>
                    <method id="getDatabase" name="GET">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="{ruleId}/{databaseName}/tables">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="databaseName" style="template" type="xs:string"/>
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleId" style="template" type="xs:string"/>
                    <method id="getTables" name="GET">
                        <request>
                            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="tableFilter" style="query" type="xs:string"/>
                        </request>
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
                <resource path="{ruleId}/summary">
                    <param xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ruleId" style="template" type="xs:string"/>
                    <method id="getRuleSummary" name="GET">
                        <response>
                            <representation mediaType="application/xml"/>
                            <representation mediaType="application/json"/>
                        </response>
                    </method>
                </resource>
            </resource>
        </resource>
    </resources>
</application>

6.1.3. Example REST calls

The following examples illustrate some simple use cases. Most are direct calls that can be made through a web browser, although for deeper or interactive examples a curl client may be used.

Optional query params
?dbName= &tableName= &path=

Get a single Hive replication rule:
GET /hiveRegex/{hiveRegexRuleId} > returns a HiveReplicationRuleDTO

The DTO is annotated with @XmlRootElement(name = "hiveRule") and @XmlType(propOrder = …). Permitted values:

  • private String dbNamePattern = "";

  • private String tableNamePattern = "";

  • private String tableLocationPattern = "";

  • private String membershipIdentity = "";

  • private String ruleIdentity;

  • private String description = "";

List Hive Replication Rule DTOs
GET /hiveRegex/ > HiveReplicationRulesListDTO
  • (all known rules) list of HiveReplicationRuleDTO (see above for the format)

PUT /hiveRegex/addHiveRule/ PAYLOAD HiveReplicationRuleDTO >
{"dbNamePattern":"mbdb.*", "tableNamePattern":"tabl.*", "tableLocationPattern":".*", "membershipIdentity":"824ce758-641c-46d6-9c7d-d2257496734d", "ruleIdentity":"6a61c98b-eaea-4275-bf81-0f82b4adaaef", "description":"mydbrule"}
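
As a sketch, the same rule could be added with a curl client; the host is a placeholder and the membership and rule identities must match your own deployment:

curl -X PUT "http://<server-host>:8082/plugin/hive/hiveRegex/addHiveRule/" \
  -H "Content-Type: application/json" \
  -d '{"dbNamePattern":"mbdb.*","tableNamePattern":"tabl.*","tableLocationPattern":".*","membershipIdentity":"824ce758-641c-46d6-9c7d-d2257496734d","ruleIdentity":"6a61c98b-eaea-4275-bf81-0f82b4adaaef","description":"mydbrule"}'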
GET /hiveRegex/active/ >
  • Returns HiveReplicationRulesListDTO of all ACTIVE hiveRegex rules
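
For example, the rule listing calls can be made with curl (the host is a placeholder):

curl "http://<server-host>:8082/plugin/hive/hiveRegex/"
curl "http://<server-host>:8082/plugin/hive/hiveRegex/active/"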

GET /hiveReplicatedDirectories > HiveReplicatedDirectoriesDTOList<HiveReplicatedDirectoryDTO>:
  • Gets all HCFS replicated directories that were created automatically via a Hive pattern rule upon table creation. Returns JSON in the format:

{"hiveReplicatedDirectoryDTOList":[{"rd":"ReplicatedDirectoryDTO","propertiesDTO":{"properties":"Map<String, String>"},"consistencyTaskId":"str","consistencyStatus":"str","lastConsistencyCheck":0,"consistencyPending":true}]}
GET /hiveReplicatedDirectories/{regexRuleId} >
  • Returns the same as above, but only the directories created via the regex rule Id supplied as a path parameter.

GET /hiveReplicatedDirectories/path?path=/some/location >
  • Returns the same as above, but filtered by a query parameter giving the path of the HCFS location.
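
A sketch of these three calls with curl (the host and rule Id are placeholders):

curl "http://<server-host>:8082/plugin/hive/hiveReplicatedDirectories"
curl "http://<server-host>:8082/plugin/hive/hiveReplicatedDirectories/<regexRuleId>"
curl "http://<server-host>:8082/plugin/hive/hiveReplicatedDirectories/path?path=/some/location"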

Consistency Checks

Perform a consistency check on the database specified. The response will contain the location of the Platform task that is coordinating the consistency check. This task will exist on all nodes in the membership and, at completion, each task will be in the same state. The taskIdentity can be used to view the consistency check report using the /hive/consistencyCheck/{taskIdentity} API.

Start a Consistency Check on a particular Hive Database and Table:
 POST /consistencyCheck?dbName=test_db&tableName=test_table1&simpleCheck=true
  • Both tableName and simpleCheck are optional query parameters; if omitted, they default to tableName="" and simpleCheck=true.
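
For example, with curl (the host is a placeholder):

curl -X POST "http://<server-host>:8082/plugin/hive/consistencyCheck?dbName=test_db&tableName=test_table1&simpleCheck=true"

The response contains the location of the coordinating Platform task; note the taskIdentity for use with the report calls below.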

Get the Consistency Check report for a Consistency Check task previously requested via the API above:
GET /consistencyCheck/{taskid}?withDiffs=true
  • The withDiffs query parameter is optional and defaults to false if not supplied.
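
For example (the host and task Id are placeholders):

curl "http://<server-host>:8082/plugin/hive/consistencyCheck/<taskIdentity>?withDiffs=true"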

Get part of the consistency check report, depending on the query parameters set:
 GET /consistencyCheck/{taskId}/diffs?pageSize=10&page=0&dbName=test_db&tableName=test_table&partitions=true&indexes=true
  • Returns part of the diff from the consistency check. The hierarchy is: dbName / tableName / one of [partitions=true or indexes=true].

    dbName

    Name of the database to check.

    tableName

    Name of the database table to check; the default of "" will check all tables. If specified, then either partitions or indexes must also be specified.

    pageSize

    Optional. Will default to pageSize = 2,147,483,647.

    page

    Optional. Will default to page=0.
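
A curl sketch of a paged diff request (the host and task Id are placeholders):

curl "http://<server-host>:8082/plugin/hive/consistencyCheck/<taskIdentity>/diffs?pageSize=10&page=0&dbName=test_db&tableName=test_table&partitions=true"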

Repair
Start a repair of the specified database and table. Note that the WADL above scopes repair to a rule identity:
 PUT /repair/{ruleIdentity}?truthZone=zone1&dbName=test_db&tableName=test_table&partName=testPart&indexName=test_index&recursive=true&addMissing=true&removeExtra=true&updateDifferent=true&simpleCheck=true
truthZone (required)

Zone which is the source of truth.

dbName (required)

Database to repair in. Note that this database must already exist in the zone where this API call is made.

tableName (optional)
partName (optional)
indexName (optional)
recursive (optional)

Defaults to true.

addMissing (optional)

Defaults to true. If true, objects that are missing will be created.

removeExtra (optional)

Defaults to true. If true, objects that do not exist in truthZone will be removed.

updateDifferent (optional)

Defaults to true. If true, objects that are different will be fixed.

simpleCheck (optional)

Defaults to true. If true, the repair operation will only involve a simple check and will not include any extended parameters of the objects being repaired.
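
A curl sketch of a repair call, assuming the rule-scoped path shown in the WADL above (the host, rule Id, zone and object names are placeholders):

curl -X PUT "http://<server-host>:8082/plugin/hive/repair/<ruleIdentity>?truthZone=zone1&dbName=test_db&tableName=test_table&addMissing=true&removeExtra=true&updateDifferent=true&simpleCheck=true"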