
WANDISCO FUSION®
PLUGIN FOR LIVE SENTRY

1. Welcome

1.1. Product overview

Use the Fusion Plugin for Live Sentry to extend the WANdisco Fusion server with the ability to replicate policies among Apache Sentry Policy Provider instances. Coordinate activities that modify Sentry policy definitions among multiple instances of the Sentry Policy Provider across separate clusters to maintain common policy enforcement in each cluster. The Fusion Plugin for Live Sentry uses WANdisco Fusion for coordination and replication.

1.2. Documentation guide

This guide contains the following:

Welcome

This chapter introduces this user guide and provides help with how to use it.

Release Notes

Details the latest software release, covering new features, fixes and known issues to be aware of.

Concepts

Explains how the Fusion Plugin for Live Sentry, through WANdisco Fusion, uses WANdisco’s LiveData platform.

Installation

Covers the steps required to install and set up Fusion Plugin for Live Sentry into a WANdisco Fusion deployment.

Operation

The steps required to run, reconfigure and troubleshoot Fusion Plugin for Live Sentry.

1.2.1. Symbols in the documentation

In the guide we highlight types of information using the following call outs:

The alert symbol highlights important information.
The STOP symbol cautions you against doing something.
Tips are principles or practices that you’ll benefit from knowing or using.
The i symbol shows where you can find more information, such as in our online Knowledgebase.

1.3. Contact support

See our online Knowledgebase, which contains updates and more information.

If you need more help, raise a case on our support website.

1.4. Give feedback

If you find an error or if you think some information needs improving, raise a case on our support website or email docs@wandisco.com.

2. Release Notes

2.1. Live Sentry 5.0.0 Build 317

20 December 2019

For the release notes and information on known issues, please visit the Knowledge base - Fusion Plugin for Live Sentry Release 5.0.0.

3. Concepts

3.1. Product Concepts

Familiarity with the following concepts will improve your use of the Fusion Plugin for Live Sentry.

WANdisco Fusion Plugin

A plugin is used by WANdisco Fusion to extend its functionality. Plugins are loaded by the WANdisco Fusion server on startup.

Apache Sentry

Sentry is a system for defining and enforcing fine-grained authorization against Hadoop resources. Use Sentry to control and enforce privileges on data for authenticated users and applications in a Hadoop cluster. It supports different data models with a modular architecture.

Sentry Server

The Sentry Server manages authorization metadata. It offers a Thrift interface to allow clients to retrieve and manipulate that metadata.

Sentry Authorization

Sentry limits user access to specific resources. Sentry policies are enforced by Sentry Plugins that are specific to the system for which a policy is enforced. Plugins obtain metadata from the Sentry Server to make authorization decisions.

Sentry Role

A set of privileges that combine multiple access rules.

Sentry Privilege

A rule that allows access to an object.

3.2. Product Architecture

WANdisco Fusion provides a LiveData architecture, where data are stored and used in multiple locations, while data are replicated with guaranteed consistency across them all.

The Fusion Plugin for Live Sentry extends that LiveData architecture to metadata managed by Apache Sentry to allow policy changes made in any location to apply consistently across all.

The Fusion Plugin for Live Sentry is a distributed network proxy for the Thrift interface exposed by the Apache Sentry Server. It coordinates and replicates changes made via that interface to ensure that regardless of where or when changes to Sentry policies occur, they result in the same set of policies across multiple environments.

Figure 1. Fusion Plugin for Live Sentry Architecture

By implementing this coordination and replication via a proxy to the Sentry server, the Fusion Plugin for Live Sentry provides this capability without any change to the underlying Sentry services. Sentry provides a simple, standard means of directing clients to interact with the Sentry server via the proxy, and the proxy is configured to use the existing Sentry server.

3.3. Deployment Models

3.3.1. Use Cases for the Fusion Plugin for Live Sentry

Replicate policy definitions between multiple Apache Sentry instances in different clusters using the Fusion Plugin for Live Sentry. Change Sentry policies in any cluster to enforce access to cluster resources with the same authorization rights in each environment.

4. Installation

4.1. Pre-requisites

4.1.1. System Requirements

Along with the standard product requirements that you can find on the WANdisco Fusion Deployment Checklist, you also need to ensure that your clusters:

  • Use Cloudera - see the release notes for your Fusion Plugin for Live Sentry version for details of which CDH versions are supported (Note that builds for alternative CDH versions can be made available).

  • Have configured CDH to use Kerberos or LDAP for user authentication.

    The installation steps defined here are for a Kerberized environment. Please contact WANdisco support for information on installation to a cluster that uses LDAP for user authentication.
  • Use Apache Sentry for policy enforcement.

  • Have Fusion servers that are inducted between zones before you start installing the Fusion Plugin for Live Sentry.

4.1.2. Sentry Configuration Requirements

If you are using CDH 5.13 or higher, the properties mentioned in this section are updated automatically during installation when using Cloudera Manager.

Sentry does not support impersonation/delegation tokens for Thrift authorization so WANdisco Fusion and Live Sentry must be allowed to authorize directly with the Sentry service. This allows WANdisco Fusion to carry out requests from the underlying user.

Ensure these configuration properties for Sentry in the sentry-site.xml file are equivalent in replicated zones:

sentry.service.allow.connect

A comma-separated list of identities that are allowed to connect to the Sentry service.
Example: hive,impala,hue,hdfs,solr,sentry,live_sentry,fusionuser

The live_sentry user is created when installing the Live Sentry Proxy through Cloudera Manager.

sentry.service.admin.group

A comma-separated list of identities that have administrative privileges for the Sentry service.
Example: hive,impala,hue,hdfs,solr,sentry,live_sentry,fusionuser

The live_sentry user is created when installing the Live Sentry Proxy through Cloudera Manager.

Both of these properties must include the user identities assigned to the Live Sentry Proxy and WANdisco Fusion (fusionuser in this example).
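
As a quick sanity check, the following shell sketch tests whether a comma-separated value, such as the examples above, contains a given identity as a whole entry. The function name list_has_identity is hypothetical and not part of the product. The sketch assumes the list has no spaces around the commas, and it does not read sentry-site.xml itself, so you would paste in the value shown in your own configuration.

```shell
# Hypothetical helper, not part of the product.
# Succeeds (exit status 0) if the comma-separated list in $1 contains
# the identity in $2 as a complete entry; fails (status 1) otherwise.
list_has_identity() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example value copied from this section.
allow_connect="hive,impala,hue,hdfs,solr,sentry,live_sentry,fusionuser"

for id in live_sentry fusionuser; do
  if list_has_identity "$allow_connect" "$id"; then
    echo "$id: present"
  else
    echo "$id: MISSING"
  fi
done
```

Wrapping both the list and the identity in commas means that a partial match, such as live_sentry inside a longer name, is not reported as present.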

4.2. Installation

The installation of the Fusion Plugin for Live Sentry is a three-step procedure: parcel installation, service installation and CLI installation.

4.2.1. Parcel installation

Rename parcel if using RHEL 7/CentOS 7
By default, the parcels are el6. If using RHEL 7/CentOS 7, rename the parcels to el7. This prevents Cloudera Manager from reporting an error about the expected parcel name.
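
A minimal shell sketch of the rename, to be run once the parcel tarball has been unpacked. It assumes the unpacked file names end in -el6.parcel and -el6.parcel.sha. That suffix layout is an assumption rather than something this guide states, so check the actual names in your download first. The function name rename_parcels_to_el7 is hypothetical.

```shell
# Hypothetical helper, not part of the installer.
# Renames every *-el6.parcel and *-el6.parcel.sha file in directory $1
# so that the "el6" part of the name becomes "el7".
rename_parcels_to_el7() {
  dir="$1"
  for f in "$dir"/*-el6.parcel "$dir"/*-el6.parcel.sha; do
    [ -e "$f" ] || continue                 # skip patterns that matched nothing
    prefix="${f%-el6.parcel*}"              # everything before "-el6.parcel"
    suffix="${f##*-el6.parcel}"             # "" or ".sha"
    mv -- "$f" "${prefix}-el7.parcel${suffix}"
  done
}

# Usage (the directory is whichever one holds your unpacked parcel files):
#   rename_parcels_to_el7 /path/to/unpacked/parcels
```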
  1. Open a terminal session in the location of your parcels repository. This is often your Cloudera Manager server, although the location may have been customized. Ensure that you have suitable permissions for handling files.

  2. Download the relevant installer from customer.wandisco.com.

  3. Make the downloaded installer file executable, e.g.

    chmod +x live-sentry-installer.<version>.sh
  4. Run the installer using an account with appropriate permissions to extract the parcel:

    ./live-sentry-installer.<version>.sh extract-parcel
    If you have not extracted this on the Cloudera Manager (CM) node, you should transfer the tarball to that server.
    
    Perform the following steps on the CM node:
     * unpack the parcel tarball package
     * move the parcel and parcel.sha files to the local repository path for CM
     * use the CM UI to discover the location of the Custom Service Descriptors and copy the extracted CSD to this location
     * restart the cloudera-scm-server and cloudera-scm-agent services
     * using the CM parcel interface, ‘Check for New Parcels’ and then distribute and activate the LIVE_SENTRY service
    
    If the parcel installation was successful, and you have not already done so, you should install the final Fusion UI plugin components on the fusion node(s).
    
    For further guidance and clarifications, go to https://docs.wandisco.com/
  5. Unpack the parcel tarball package, for example:

    tar xvzf LIVE_SENTRY-cdh-<version>_<version>.parcel.tar.gz
  6. Change the ownership of the .parcel and .parcel.sha files so that they match the system account that runs Cloudera Manager:

    chown cloudera-scm:cloudera-scm LIVE_SENTRY*.parcel LIVE_SENTRY*.parcel.sha
  7. Move the files into the server’s local repository, normally /opt/cloudera/parcel-repo.

  8. On your Cloudera Manager UI, navigate to Settings ▸ Custom Service Descriptors. Find the Local Descriptor Repository Path.

  9. Copy the extracted Custom Service Descriptor file (LIVE_SENTRY-cdhxxx.jar) to the Local Descriptor Repository location.

  10. Restart the cloudera-scm-server and cloudera-scm-agent services.

  11. Open Cloudera Manager and navigate to the Parcels screen by clicking on the Parcel icon.

  12. Click Check for New Parcels.

    Figure 2. Check for parcels
  13. The LIVE_SENTRY package is now ready to distribute. Click on the Distribute button to install LIVE_SENTRY from the parcel.

    Figure 3. Distribute the parcel
  14. Click on the Activate button to activate LIVE_SENTRY from the parcel.

    Figure 4. Activate the parcel

4.2.2. Service installation

Now install Live Sentry as a service.

  1. Click Add service.

    Figure 5. Add Service
  2. Choose LIVE SENTRY and click continue.

    Figure 6. Choose Live Sentry
  3. Choose the hosts on which to install the service; at least one host is required. You can also select the host for the gateway. If you select a gateway host, it should be the node where the plugin will be installed, and proxy-plugin-site.xml is then generated in /etc/live-sentry/conf on that host. If you do not select that host for the gateway, you must generate proxy-plugin-site.xml manually.
    Note: The gateway host should be the same as the Fusion node.

    Figure 7. Assign Roles
  4. Configure the service parameters.

  5. You can now see Live Sentry on the Cloudera Manager homepage.

    Figure 8. View service on homepage

4.2.3. CLI Installation

  1. Open a terminal session on your WANdisco Fusion node.

  2. Download the installer as above and run the installer using an account with appropriate permissions:

    ./live-sentry-installer.<version>.sh

    The installer will now start.

    Verifying archive integrity... All good.
    Uncompressing WANdisco Live Sentry.......................................
    
        ::   ::  ::     #     #   ##    ####  ######   #   #####   #####   #####
       :::: :::: :::    #     #  #  #  ##  ## #     #  #  #     # #     # #     #
      ::::::::::: :::   #  #  # #    # #    # #     #  #  #       #       #     #
     ::::::::::::: :::  # # # # #    # #    # #     #  #   #####  #       #     #
      ::::::::::: :::   # # # # #    # #    # #     #  #        # #       #     #
       :::: :::: :::    ##   ##  #  ## #    # #     #  #  #     # #     # #     #
        ::   ::  ::     #     #   ## # #    # ######   #   #####   #####   #####
    
    You are about to install WANdisco Live Sentry version 5.0.0.0
    
    Do you want to continue with the installation? (Y/n) Y

    The installer will perform an integrity check and confirm the product version that will be installed. Enter Y to continue the installation.

    Full installation of this plugin currently requires that the appropriate
    'parcel' files are installed on your Cloudera Manager node.
    
    This installer package includes all the currently supported parcels for this.
    
    If you have not already done so, you should run this installer with the
    'extract-parcel' sub-command and follow the instructions it gives. You may
    wish to do this on the Cloudera Manager server itself.
  3. Now navigate to the Cloudera Manager UI and check that the properties listed in Sentry Configuration Requirements, including sentry.service.allow.connect and sentry.service.admin.group, and the settings for any relevant auto-configured services have been configured. If they have, restart the affected services.
    If these properties have not been configured, configure them manually; see the relevant configuration sections for details.

  4. Once Fusion Plugin for Live Sentry installation is complete, restart the WANdisco Fusion server.

4.3. Uninstallation

If you wish to uninstall Live Sentry, please contact WANdisco support. The uninstall procedure currently requires manual editing and should not be done without calling WANdisco’s support team for assistance.

5. Operation

Once the Fusion Plugin for Live Sentry is installed, restart the WANdisco Fusion server:

service fusion-server restart

You then need to configure your cluster to access the Sentry server via the WANdisco Sentry Proxy. The instructions below are specific to each type of cluster service that can use Sentry for authorization. Your environment may have one or more of these services in use. Apply the instructions below selectively based on the services operating in your clusters.

5.1. Configuration

HDFS and Hue need to be manually configured. Hive, Impala and Solr are auto-configured during installation but instructions are given below on how to manually configure if required.

5.1.1. Services requiring manual configuration

Configure HDFS
  1. Open the Cloudera Manager Administration Console and access the HDFS service configuration tab.

  2. Select Scope ▸ HDFS (Service-Wide).

  3. Locate the Enable Sentry Synchronization property.

  4. Enable Sentry synchronization.

  5. Save these changes.

  6. Restart affected services.

Configure Hue
  1. Open the Cloudera Manager Administration Console and access the Hue service configuration tab.

  2. Select Scope ▸ Hue (Service-Wide).

  3. Locate the Sentry Service property and ensure that "sentry" is enabled.

  4. Locate the Hue Service Advanced Configuration Snippet (Safety Valve) for the hue_safety_valve.ini property file and add the properties:

    [libsentry]
      hostname={wd.sentry.proxy.thrift.host}
      port={wd.sentry.proxy.thrift.port}
  5. Locate the Hue Service Advanced Configuration Snippet (Safety Valve) for the sentry-site.xml property file and add the properties:

    1. sentry.service.client.server.rpc-address → The WANdisco Sentry Proxy host

    2. sentry.service.client.server.rpc-port → The WANdisco Sentry Proxy port

    3. sentry.service.server.principal → The WANdisco Sentry Proxy principal
      Note: Sentry Proxy and Sentry need to have the same principal.

  6. Save these changes.

  7. Restart affected services.

5.1.2. Auto-configured services

Hive, Impala and Solr are auto-configured during installation. If the automatic configuration script fails then they will need to be manually configured.

Configure Hive
  1. Open the Cloudera Manager Administration Console and access the Hive service configuration tab.

  2. Select Scope ▸ Hive (Service-Wide).

  3. Locate the Sentry Service property and ensure that "sentry" is enabled.

  4. Locate the Hive Advanced Configuration Snippet (Safety Valve) for the sentry-site.xml property file and add the properties:

    1. sentry.service.client.server.rpc-address → The WANdisco Sentry Proxy host

    2. sentry.service.client.server.rpc-port → The WANdisco Sentry Proxy port

      If using CDH 5.13.x or later, the sentry.service.client.server.rpc-address and sentry.service.client.server.rpc-port settings are replaced with a single sentry.service.client.server.rpc-addresses entry with a value in the form <proxy host>:<proxy thrift port>.
    3. sentry.service.server.principal → The WANdisco Sentry Proxy principal

  5. Locate the Server Name for Sentry Authorization setting, which sets the hive.sentry.server property.

  6. Set this property to the same name in all Fusion-enabled zones (e.g. sentry).

  7. Save these changes.

  8. Restart affected services.

The hive.sentry.server property must have the same value for all Fusion-enabled zones.

Configure Impala
  1. Open the Cloudera Manager Administration Console and access the Impala service configuration tab.

  2. Select Scope ▸ Impala (Service-Wide).

  3. Locate the Sentry Service property and ensure that "sentry" is enabled.

  4. Locate the Impala Service Advanced Configuration Snippet (Safety Valve) for the sentry-site.xml property file and add the properties:

    1. sentry.service.client.server.rpc-address → The WANdisco Sentry Proxy host

    2. sentry.service.client.server.rpc-port → The WANdisco Sentry Proxy port

      If using CDH 5.13.x or later, the sentry.service.client.server.rpc-address and sentry.service.client.server.rpc-port settings are replaced with a single sentry.service.client.server.rpc-addresses entry with a value in the form <proxy host>:<proxy thrift port>.
    3. sentry.service.server.principal → The WANdisco Sentry Proxy principal

  5. Save these changes.

  6. Restart affected services.

Configure Solr
  1. Open the Cloudera Manager Administration Console and access the Solr service configuration tab.

  2. Select Scope ▸ Solr (Service-Wide).

  3. Locate the Sentry Service property and ensure that "sentry" is enabled.

  4. Locate the Solr Service Advanced Configuration Snippet (Safety Valve) for the sentry-site.xml property file and add the properties:

    1. sentry.service.client.server.rpc-address → The WANdisco Sentry Proxy host

    2. sentry.service.client.server.rpc-port → The WANdisco Sentry Proxy port

      If using CDH 5.13.x or later, the sentry.service.client.server.rpc-address and sentry.service.client.server.rpc-port settings are replaced with a single sentry.service.client.server.rpc-addresses entry with a value in the form <proxy host>:<proxy thrift port>.
    3. sentry.service.server.principal → The WANdisco Sentry Proxy principal

  5. Save these changes.

  6. Restart affected services.

To connect the solrctl shell to the Sentry Proxy:

Create a sentry-site.xml file in /tmp/wd-sentry-conf and set the Sentry Proxy server values:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>sentry.service.client.server.rpc-address</name>
      <value>{wd.sentry.proxy.thrift.host}</value>
   </property>
   <property>
      <name>sentry.service.client.server.rpc-port</name>
      <value>{wd.sentry.proxy.thrift.port}</value>
   </property>
   <property>
      <name>sentry.service.server.principal</name>
      <value>{wd.sentry.proxy.server.principal}</value>
   </property>
   <property>
      <name>sentry.service.security.mode</name>
      <value>kerberos</value>
   </property>
</configuration>
  1. wd.sentry.proxy.thrift.host → The WANdisco Sentry Proxy host

  2. wd.sentry.proxy.thrift.port → The WANdisco Sentry Proxy port

  3. wd.sentry.proxy.server.principal → The WANdisco Sentry Proxy principal

Export SENTRY_CONF_DIR to point to /tmp/wd-sentry-conf so that solrctl loads the custom sentry-site.xml instead of the default one located in /etc/sentry/conf.

export SENTRY_CONF_DIR=/tmp/wd-sentry-conf

Now run the command solrctl sentry <cmd>:

solrctl sentry <cmd>
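
The file creation step above can be sketched as a small shell function that writes the same four properties from arguments. The function name write_solrctl_sentry_site and the host name in the usage comment are hypothetical. The port and principal in the usage comment are taken from the HA example in this guide and may differ in your environment. Values are written as-is, so the sketch assumes they contain no XML special characters.

```shell
# Hypothetical helper, not part of the product.
# Writes sentry-site.xml into directory $1, pointing solrctl at the
# Sentry Proxy host ($2), Thrift port ($3) and principal ($4).
write_solrctl_sentry_site() {
  conf_dir="$1"
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/sentry-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>sentry.service.client.server.rpc-address</name>
      <value>$2</value>
   </property>
   <property>
      <name>sentry.service.client.server.rpc-port</name>
      <value>$3</value>
   </property>
   <property>
      <name>sentry.service.server.principal</name>
      <value>$4</value>
   </property>
   <property>
      <name>sentry.service.security.mode</name>
      <value>kerberos</value>
   </property>
</configuration>
EOF
}

# Usage (host name is a placeholder):
#   write_solrctl_sentry_site /tmp/wd-sentry-conf proxy.example.com 8073 'live_sentry/_HOST@REALM'
#   export SENTRY_CONF_DIR=/tmp/wd-sentry-conf
```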

5.1.3. Enable Sentry HA in Cloudera

Fusion Plugin for Live Sentry can be used in a Sentry High Availability (HA) enabled environment.

In Cloudera, at most two Sentry server instances can be active. For more information, see the Cloudera documentation on Sentry high availability.

5.1.4. Manual Live Sentry Proxy HA Configuration

If you have configured more than one Live Sentry Proxy during installation via Cloudera Manager, then HA will have already been set up for Live Sentry.

To configure HA for the Live Sentry Proxy manually, see the example below for the properties you will need:

Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml
sentry.service.server.principal=live_sentry/_HOST@REALM
sentry.service.client.server.rpc-port=8073
sentry.service.client.server.rpc-address=fusion_node1
sentry.service.client.server.rpc-addresses=fusion_node1:8073,fusion_node2:8073

A restart of designated services (including Live Sentry) will be required afterwards.
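
The rpc-addresses value in the example is a comma-separated list of host:port pairs. As an illustration only, the following shell sketch builds that value from a port and a list of proxy hosts. The function name build_rpc_addresses is hypothetical, and the sketch assumes every proxy listens on the same port, as in the example above.

```shell
# Hypothetical helper, not part of the product.
# $1 is the proxy Thrift port; the remaining arguments are proxy hosts.
# Prints "host1:port,host2:port,..." on one line.
build_rpc_addresses() {
  port="$1"
  shift
  out=""
  for host in "$@"; do
    out="${out:+$out,}${host}:${port}"   # add a comma only between entries
  done
  printf '%s\n' "$out"
}

build_rpc_addresses 8073 fusion_node1 fusion_node2
# prints: fusion_node1:8073,fusion_node2:8073
```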

In order to support Sentry HA, Live Sentry Proxy has a pool of transport connections that are implemented using the Apache Commons Pool library.

To override the default values, add properties to the Live Sentry configuration. Add them to the Live Sentry service configuration in Cloudera Manager:

LIVE SENTRY PROXY SERVICE Advanced Configuration Snippet (Safety Valve) for proxy-server-site.xml

They also need to be added to the Live Sentry plugin config on all WANdisco Fusion nodes in the zone:

/etc/wandisco/fusion/plugins/live-sentry/proxy-plugin-site.xml

Properties and default values
  • sentry.service.client.server.rpc-connection-timeout

    • Socket connection timeout in milliseconds (default = 200000).

  • sentry.service.client.server.rpc.retry-total

    • Number of retry attempts to connect to the server (default = 5).

  • sentry.service.client.rpc.retry.interval.msec

    • Time in milliseconds that the thread waits before retrying (default = 3000).

  • sentry.service.client.connection.pool.max-total

    • The maximum number of client instances in connection pool (default = -1).

  • sentry.service.client.connection.pool.max-idle

    • The maximum number of idle client instances in the connection pool (default = 100).

  • sentry.service.client.connection.pool.min-idle

    • The minimum number of idle client instances in the connection pool (default = 10).

The properties below are based on the eviction policy of Apache Commons Pool:

  • sentry.service.client.connection.pool.eviction.mintime.sec

    • Minimum time, in seconds, that a client instance stays idle in the pool before it can be removed (default = 120).

  • sentry.service.client.connection.pool.eviction.interval.sec

    • Time to wait between eviction runs, in seconds (default = 60).

If you only have one Sentry server, and do not want to use Apache Commons Pool, then you can disable it by setting the following property to false:

  • sentry.service.client.connection.pool.enabled = false
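
As an illustration, the shell sketch below prints property entries in the same <property> layout used by the sentry-site.xml example earlier in this guide. That proxy-plugin-site.xml uses this layout is an assumption, so compare it against your existing file before pasting anything in. The function name print_property is hypothetical, and the retry value of 10 is an arbitrary example override, not a recommendation.

```shell
# Hypothetical helper, not part of the product.
# Prints one Hadoop-style <property> element for name $1 and value $2.
print_property() {
  printf '   <property>\n      <name>%s</name>\n      <value>%s</value>\n   </property>\n' "$1" "$2"
}

# Disable the connection pool (single Sentry server case described above).
print_property sentry.service.client.connection.pool.enabled false

# Example override of a default listed in this section.
print_property sentry.service.client.server.rpc.retry-total 10
```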

5.1.6. Changing the timezone

Logs use UTC timezone by default but this can be manually altered through log4j configuration if required. To alter the timezone the xxx.layout.ConversionPattern property needs to be overwritten.

log4j.appender.xxxxxlog.layout.ConversionPattern=%d{ISO8601}{UTC} %p %c - %t:[%m]%n

{UTC} can be replaced with, for example, {GMT} or {GMT+1:30}. When offsetting from a timezone, use + or -; hours must be between 0 and 23, and minutes must be between 00 and 59.

This property is located in /etc/wandisco/live-sentry-proxy/log4j.properties. After updating the file, the sentryproxy-server needs to be restarted for the changes to take effect.
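
The following shell sketch checks a timezone value against the limits described above and prints the resulting property line. It assumes offsets are written in the GMT+h:mm form that Java time zone IDs use. The function names are hypothetical, and the appender name is passed as an argument because this guide does not state it.

```shell
# Hypothetical helpers, not part of the product.
# Succeeds if $1 is UTC, GMT, or a GMT offset such as GMT+1:30 or GMT-05:00,
# with hours between 0 and 23 and minutes between 00 and 59.
valid_log_timezone() {
  case "$1" in
    UTC|GMT)                     return 0 ;;
    GMT[+-][0-9]:[0-5][0-9])     return 0 ;;
    GMT[+-][01][0-9]:[0-5][0-9]) return 0 ;;
    GMT[+-]2[0-3]:[0-5][0-9])    return 0 ;;
    *)                           return 1 ;;
  esac
}

# Prints the ConversionPattern line for appender $1 using timezone $2,
# or fails with a message on stderr if the timezone is not valid.
conversion_pattern_line() {
  valid_log_timezone "$2" || { echo "invalid timezone: $2" >&2; return 1; }
  printf 'log4j.appender.%s.layout.ConversionPattern=%%d{ISO8601}{%s} %%p %%c - %%t:[%%m]%%n\n' "$1" "$2"
}

# Usage (replace <appender> with the appender name from your log4j.properties):
#   conversion_pattern_line <appender> GMT+1:30
```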

5.2. Replication

5.2.1. View replication rule

Once Fusion Plugin for Live Sentry is installed, the All Sentry Rules replication rule is visible on the Replication tab of the WANdisco Fusion UI.

Figure 9. Replication rules

Click on All Sentry Rules to see more details.

Figure 10. View rule
Type

The type of replication rule, in this case the type is "Sentry".

Sentry Policies

All Sentry policies are included in this single rule so that CDH clusters replicate Apache Sentry policy definitions. The rule controls how the data is replicated between zones and does not have any impact on the policies themselves.

Zones

Lists the zones between which this rule’s associated path is replicated. Note that the "local" label identifies the zone to which the currently viewed node belongs.

Go back to Rule list - click this button to return to the Replication Rules screen.

Live Sentry Replication Rules

System critical rules, such as the Live Sentry plugin’s default rules are not displayed in the UI due to their sensitive nature. These rules are critical to the working of the plugin and should never be modified.

5.2.2. Consistency check

When to perform a consistency check?
  • After adding new data into replication group

  • Periodically, as part of your platform monitoring

  • As part of a system repair/troubleshooting

To perform a consistency check follow the steps below.

  1. On the Replication tab, click on All Sentry Rules.

    Figure 11. Select All Sentry Rules
  2. On the Status tab you can see the results of the previous consistency check. Click Check now to trigger a new check.

    Figure 12. Trigger consistency check
  3. The results of the consistency check are now displayed. The bars turn yellow if the result is inconsistent. A more detailed report can also be downloaded.

    Figure 13. Consistency check result

    If the result of the consistency check is inconsistent, see the make consistent section for what to do next.

Consistency check results

The consistency check lists the results of 5 items.

Groups

A set of users, maintained by the authentication system, who have been granted one or more authorization roles.

Group Roles

Groups can be granted access to a role to provide a set of users with certain privileges.

Roles

A set of privileges to perform applicable actions and any associated resources.

Privileges

An instruction that allows access to an object. Privileges are associated with a role. The value shown is the total number of privileges in a zone.

Role Privileges

This value is the total number of privileges assigned to roles. For example, if all privileges are assigned to 2 roles, this value will be double the number of privileges.

5.2.3. Make consistent

If you have performed a consistency check and the result is inconsistent, follow the steps below to make the zones become consistent.

  1. Select the zone which you want to be the Source of Truth by clicking on the relevant graph.

    Figure 14. Make consistent

    The differences between the zones will now be highlighted.

There are 2 methods to make data consistent.

  • The default method does not delete any data in the target zone. When a Source of Truth zone is selected, the bars highlight what the outcome will be. Because no data is removed, the final result can still be inconsistent with this method; this is highlighted in orange.

    • If you select the option Do not delete any data in target zones then data will not be removed.

  • The alternative method is to allow data to be deleted from the target zone. This method will always provide a consistent outcome.

    Zone may not become consistent
    If you have selected Do not delete any data in target zones then no data will be removed, however the graphs will not update to reflect this. A consistency check after the Make Consistent has been performed may still return a result of inconsistent as data will not have been removed.
    1. Now click Make Consistent.

    2. The zones are now consistent, depending on the option selected. You can run another consistency check to show this if required.

5.3. Troubleshooting

Observe information in the log files generated for the WANdisco Fusion server and the Fusion Plugin for Live Sentry to troubleshoot issues at runtime. Exceptions or log entries with an ERROR label may represent information that can assist in determining the cause of any problem.

5.3.1. Operational known issues

  • Only the All (*) action is assigned to privileges of type URI in Sentry.
    The underlying Sentry only supports the '*' action for URIs, as per SENTRY-862.

  • See the Knowledge base for the release notes and any other known issues.