WANDISCO FUSION®
PLUGIN FOR LIVE S3

1. Welcome

1.1. Product overview

Use the Fusion Plugin for Live S3 to replicate data among S3 buckets. Fusion Plugin for Live S3 provides continuous, active replication of LiveData across multiple buckets, which can span regions in AWS, or can be a mix of on-premise S3-compatible storage systems and cloud-hosted services. Applications can modify and access S3 data in any of these buckets while the Fusion Plugin for Live S3 ensures data are available across all of them.

1.2. Documentation guide

This guide contains the following:

Welcome

This chapter introduces this user guide and provides help with how to use it.

Release Notes

Details the latest software release, covering new features, fixes and known issues of which you should be aware.

Concepts

Explains how Fusion Plugin for Live S3 uses WANdisco’s LiveData platform through WANdisco Fusion.

Installation

Covers the steps required to install and set up Fusion Plugin for Live S3 into a WANdisco Fusion deployment.

Operation

The steps required to run, reconfigure and troubleshoot Fusion Plugin for Live S3.

Developer

A guide for developers looking to develop and incorporate their own software for Fusion Plugin for Live S3.

Reference

Additional Fusion Plugin for Live S3 documentation, including documentation for the available REST API.

1.2.1. Symbols in the documentation

In this guide we highlight types of information using the following callouts:

The alert symbol highlights important information.
The STOP symbol cautions you against doing something.
Tips are principles or practices that you’ll benefit from knowing or using.
The i symbol shows where you can find more information, such as in our online Knowledgebase.

1.3. Contact support

See our online Knowledgebase, which contains updates and more information.

If you need more help, raise a case on our support website.

1.4. Give feedback

If you find an error or if you think some information needs improving, raise a case on our support website or email docs@wandisco.com.

2. Release Notes

2.1. Live S3 2.0.1 Build 328

24 August 2018

WANdisco is pleased to present the first major revision to the Fusion Plugin for Live S3. The 2.0 release of the Fusion Plugin for Live S3 supports the latest version of WANdisco Fusion, 2.12. It includes a handful of new features, issue resolutions, platform support and other enhancements. These release notes include specific information about the product improvements, and should be read in conjunction with the product documentation.

2.1.1. Highlighted New Features

  • Support for Live S3 installation using Debian installer.

  • Support for SELECT Object Content operation.

  • Support for Bucket ENCRYPTION operation.

  • API support for dynamic configuration for specific virtual bucket.

  • API support for selective replication with inclusion and exclusion pattern.

  • Support for non-replicated bucket accessed via Live S3.

  • Support for S3 Compatible storage providers:

    • IBM Cloud Object Storage (COS)

    • Alibaba Cloud Object Storage Service (OSS)

    • Oracle Cloud Object Storage

    • Oracle Cloud Object Storage Classic

    • Minio Gateway for Azure Blob Storage

    • Ceph Object Storage (v0.94.x Hammer Release)

    • Pure FlashBlade Object Storage

    • Dell EMC Elastic Cloud Storage

    • Scality S3 Server Object Storage

2.1.2. System Requirements

Before installing or upgrading, ensure that your systems, software, and hardware meet the requirements. The requirements for WANdisco Fusion are found in the User Guide at http://docs.wandisco.com/bigdata/wdfusion/2.12/#_prerequisites_checklist.

Fusion Plugin for Live S3 is tested on a more limited number of operating systems than the main product. These are:

  • RHEL 6 x86_64

  • RHEL 7 x86_64

  • Oracle Linux 6 x86_64

  • Oracle Linux 7 x86_64

  • CentOS 6 x86_64

  • CentOS 7 x86_64

  • Ubuntu 16.04LTS

  • SLES 11 x86_64

  • SLES 12 x86_64

Unsupported:

  • Ubuntu 14.04LTS

2.1.3. Known Issues

Some S3 concepts do not map directly to an environment that replicates buckets—e.g., bucket creation or deletion, and operations that act on objects by unique identifier provided by the S3 endpoint. There are also some aspects of the S3 API that are not yet supported—e.g., browser-based object upload/creation.

  1. Deletion of specific object versions is not supported because independent S3 buckets assign different version identifiers to replicated instances of an object.

  2. Deletion of tagging for a specific object version is not supported because independent buckets assign different version identifiers to replicated instances of an object.

  3. POST is an alternative to PUT for browser-based uploads. The Fusion Plugin for Live S3 does not support replication of POST Object requests made to a virtual bucket.

  4. POST Object restore allows restoration of a temporary copy of an archived object. It is not supported by the Fusion Plugin for Live S3.

  5. Multi-chunk payloads are not yet supported.

  6. Authentication via an IAM role is not yet supported.

  7. Buckets configured with MFA serial security are not yet supported.

  8. AWS KMS SSE is not yet supported.

  9. Configuration changes made through the REST API take effect only when the proxy server is restarted.

  10. The non-replicated bucket has to be created under the default-endpoint-url storage.

3. Concepts

3.1. Product concepts

Familiarity with the following concepts will improve your use of the Fusion Plugin for Live S3.

Virtual Bucket

A virtual bucket is a bucket accessible through the Fusion Plugin for Live S3 that retains and shares data across multiple underlying buckets. Access your data with automatic, continuous and consistent replication using the Fusion Plugin for Live S3. Applications use the virtual bucket when interacting with S3 via the Fusion Plugin for Live S3.

S3 Proxy

Applications use the Fusion Plugin for Live S3 via one or more S3-compatible endpoints that it provides in the S3 proxy, which is the runtime component that proxies access to underlying S3-compatible storage services on behalf of the Fusion Plugin for Live S3.

WANdisco Fusion Plugin

The Fusion Plugin for Live S3 extends the WANdisco Fusion server to support the operation of the S3 proxy. Coordinate activities among multiple S3 proxy instances with the WANdisco Fusion Plugin that is installed for each WANdisco Fusion server.

3.2. S3 Plugin Architecture

Fusion Plugin for Live S3 provides a LiveData architecture, where data are stored and used in multiple locations, while data are replicated with guaranteed consistency across them all.

The Fusion Plugin for Live S3 is a distributed network proxy for the S3 API that uses WANdisco Fusion to replicate data. Replication is performed selectively for S3 buckets, allowing any bucket to have replicas in other locations or S3 providers, including Amazon S3, Dell EMC Elastic Cloud Storage, IBM Cloud Object Storage, Amazon Snowball, Snowball Edge, Virtustream Storage Cloud and more. Requests made from client applications to the S3 endpoints provided by the proxy are coordinated so that activities performed against a single S3 bucket make content consistent across multiple buckets. These buckets can span separate AWS regions, or even be provided by alternative S3 implementations.

Multiple applications can use any of the replicated S3 endpoints at the same time, while WANdisco Fusion ensures that all activities are replicated with active-active consistency across all environments. Each application need only communicate with its local replicated S3 endpoint for objects that it uses to be made consistent across all S3 buckets.

Figure 1. Live S3 Architecture

Build distributed systems that use S3 with the Fusion Plugin for Live S3. Applications can operate against the same set of S3 objects in multiple locations, reading and writing against their local endpoint, while WANdisco Fusion ensures that the objects are accessible in every location.

Unlike cross-region replication that is native to Amazon S3, the Fusion Plugin for Live S3 supports the use of object replicas:

  • across multiple regions without the use of Object Versioning

  • between buckets within the same region if needed

  • in a multi-directional manner among as many buckets as you need

  • without granting Amazon S3 IAM roles just for the purpose of replication

  • without any need for maintaining metadata information that is specific to replication

  • while preventing conflicting modifications to the bucket

  • with S3-compatible systems other than Amazon S3

  • between different S3 providers

3.3. Supported Functionality

3.3.1. S3 Features

The Fusion Plugin for Live S3 supports a broad range of S3-compatible features, including:

Virtual-hosted style URLs

Where the virtual bucket name is part of the domain name. For example: http://virtualbucket.s3proxyhost.yourdomain.com.

Path style URLs

Where the bucket name appears in the URL path and is not part of the domain name. For example: http://s3proxyhost.yourdomain.com/virtualbucket.

Client request signature validation

Requests are validated for correct credentials before being executed against an underlying S3-compatible storage. Versions 2 and 4 of AWS signatures are supported:

  1. AWS V4 Path Style

  2. AWS V4 Virtual Hosted Style

  3. AWS V4 Pre-signed URLs

  4. AWS V2 Path Style

  5. AWS V2 Virtual Hosted Style

  6. AWS V2 Pre-signed URLs

The Fusion Plugin for Live S3 supports versions 2 and 4 of AWS signatures in Path Style. To enable virtual hosted style, follow the steps below:

  1. Add the tag <path-style-access>false</path-style-access> to /etc/wandisco/fusion/plugins/live-s3/proxy-plugin-site.xml

  2. Add a DNS entry for the virtual-hosted-style URL to /etc/hosts, e.g. 0.0.0.0 virtualbucket.s3proxyhost.yourdomain.com
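
To illustrate the difference between the two addressing styles, the following minimal Python sketch builds both URL forms for the same virtual bucket. The function names are hypothetical and the host is the placeholder used in this section; it is an illustration only, not part of the product.

```python
from urllib.parse import quote


def path_style_url(proxy_host, bucket, key="", scheme="http"):
    """Path style: the bucket is the first segment of the URL path."""
    base = f"{scheme}://{proxy_host}/{bucket}"
    return f"{base}/{quote(key)}" if key else base


def virtual_hosted_url(proxy_host, bucket, key="", scheme="http"):
    """Virtual-hosted style: the bucket is a prefix of the host name."""
    base = f"{scheme}://{bucket}.{proxy_host}"
    return f"{base}/{quote(key)}" if key else base


if __name__ == "__main__":
    host = "s3proxyhost.yourdomain.com"
    print(path_style_url(host, "virtualbucket"))
    print(virtual_hosted_url(host, "virtualbucket"))
```

Note that the virtual-hosted form only works if the combined host name resolves to the proxy, which is why the DNS entry in step 2 is needed.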

Default config values in AWS CLI

Signature type: v4
Path addressing style: auto

Run the commands below to set the required signature type and addressing style:

Signature type v4: aws configure set default.s3.signature_version s3v4
Signature type v2: aws configure set default.s3.signature_version s3
Enable Path Style: aws configure set default.s3.addressing_style path
Enable Virtual Style: aws configure set default.s3.addressing_style virtual

Payload Options

S3 Payloads can be signed or unsigned in a single chunk. Multi-chunk payloads are not yet supported.

Server-side Encryption

AES-256 server-side encryption is supported. AWS KMS SSE is not yet supported.

IAM role authentication

IAM role-based authentication is not yet supported.

MFA Serial authentication

Buckets configured with MFA serial security are not yet supported.

3.3.2. WANdisco LiveData Features

The Fusion Plugin for Live S3 provides support for additional features beyond those of standard S3 endpoints as a result of supporting LiveData functionality. These include:

Consistency Check

Determine and report on differences in content between replicated buckets.

Repair

Resolve any differences in the content among multiple buckets automatically.

Unidirectional networking not supported
Unidirectional networking is not currently supported for Live S3. A workaround is available using SSH tunneling and port forwarding. Contact WANdisco support for more information.

3.3.3. S3 API Support

Broad support for the S3 API is offered by the Fusion Plugin for Live S3. Details of the specific operations that clients of the Fusion Plugin for Live S3 can use are provided below. Of note are:

  • Operations that use object identifiers (not keys) being only partially supported

  • POST operations that are partially supported

Operations on Services
Operations on Buckets
Operations on Objects

3.4. Deployment models

3.4.1. Use Cases for the Fusion Plugin for Live S3

Use the Fusion Plugin for Live S3 for a variety of reasons, including:

Heterogeneous storage

Your applications may benefit from accessing S3 data in different storage systems, perhaps both on-premises and in the cloud, or with multiple cloud providers to take advantage of cost arbitrage.

Multi-geo applications

Applications that operate in multiple, geographically-separate locations can work with a local S3 endpoint in each location, and ensure that each location has access to the same data.

Improving performance

By having a local replica of a bucket, applications can operate more efficiently than if they need to work with data that are not physically close.

Improved availability

The impact of the failure of a single source of S3 objects can be eliminated by having a strongly-consistent replica of those objects in another source.

Regulatory compliance

Your compliance needs may require that you store multiple copies of data in different locations, or with different service providers.

The Fusion Plugin for Live S3 automates the replication of data across S3 buckets and ensures that they store exactly the same information, even when applications change content in any of the replicated buckets.

4. Installation

4.1. Pre-requisites

Along with the standard product requirements that you can find on the WANdisco Fusion Deployment Checklist, you also need to ensure that you have available:

  • WANdisco Fusion 2.12.x for Local File System or ASF Hadoop 2.7.0

  • Java 1.8

  • One or more compatible providers of an S3 endpoint: AWS S3, AWS Snowball, Virtustream Storage Cloud or HGST Activescale. Note that other providers may be fully compatible with WANdisco Fusion, and WANdisco will continue to test and validate functionality for a broad range of S3 implementations.

  • Credentials for accessing the S3 endpoints among which replication is required. For AWS S3, this will be in the form of an Access Key and Secret Access Key.

  • Details of the endpoint URL by which applications access the S3 service normally, e.g. s3-us-west-1.amazonaws.com

  • The name of each bucket used

  • Access to the hosts on which each WANdisco Fusion server is operating for the purpose of installation

  • The names of the WANdisco Fusion zones across which replication will occur.

Also note that when using the Live S3 proxy, authentication can be limited depending on your configuration:

  1. For buckets with an assigned virtual bucket, only the accessKey/secretKey pair that is defined in the vbucket configuration can be used to access the proxy. Any other accessKey/secretKey pair is refused as invalid credentials (even though it might actually be valid with the underlying storage).

  2. For non-replicated buckets, any accessKey/secretKey pair that appears in any vbucket configuration can be used to access the proxy. The same pair is used when the request is passed to the underlying storage.
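
The two rules above can be sketched in a few lines of Python. This is a simplified model for illustration, not the proxy's actual implementation: it assumes a hypothetical mapping with a single key pair per virtual bucket, whereas a real vbucket configuration holds credentials per zone.

```python
def credentials_accepted(vbuckets, bucket, access_key, secret_key):
    """Return True if the proxy would accept this key pair under the two rules.

    vbuckets maps a virtual bucket name to the (access_key, secret_key) pair
    from its configuration; this flat shape is an assumption for illustration.
    """
    pair = (access_key, secret_key)
    if bucket in vbuckets:
        # Rule 1: only the pair defined for this virtual bucket is accepted.
        return vbuckets[bucket] == pair
    # Rule 2: a non-replicated bucket accepts any pair from any vbucket config.
    return pair in vbuckets.values()


if __name__ == "__main__":
    config = {"vbucket": ("AK1", "SK1"), "other": ("AK2", "SK2")}
    print(credentials_accepted(config, "vbucket", "AK2", "SK2"))
    print(credentials_accepted(config, "plainbucket", "AK2", "SK2"))
```

The first call models a key pair that is valid elsewhere but refused for the virtual bucket; the second models the same pair being accepted for a non-replicated bucket.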

The Live S3 plugin must be installed in all zones
All WANdisco Fusion servers that participate in S3 replication need to have the plugin installed. While you can be selective about which WANdisco Fusion zones will have S3 objects replicated for each virtual bucket, every WANdisco Fusion server in the network needs the plugin in order to function.

4.2. Installation

Install the Fusion Plugin for Live S3 using a standard RPM- or DEB-based installation process. Configure the plugin with simple command-line tools or manual changes to configuration files that are specific to the plugin.

Ensure you have read all known issues and pre-requisites before beginning installation.

4.2.1. Locate installation components

There are RPM and DEB files that provide installable components for CentOS and Ubuntu respectively:

  • fusion-s3-plugin-localfs-2.7.0-2.0.1.0-328.noarch.rpm

  • fusion-s3-proxy-localfs-2.7.0-2.0.1.0-328.noarch.rpm

  • fusion-s3-plugin-localfs-2.7.0_2.0.1.0-328_all.deb

  • fusion-s3-proxy-localfs-2.7.0_2.0.1.0-328_all.deb

Obtain the files so that you can distribute them to the appropriate hosts in your deployment for WANdisco Fusion. The plugin needs to be installed on each WANdisco Fusion server host in your deployment, while the proxy needs to be installed on each machine where you intend to operate an S3 Proxy.

Install as many S3 Proxy instances as you need in each WANdisco Fusion zone. Improve scalability and availability by operating more than one S3 Proxy instance per zone.

4.2.2. Install the plugin

Install fusion-s3-plugin-localfs-2.7.0-2.0.1.0-328.noarch.rpm or fusion-s3-plugin-localfs-2.7.0_2.0.1.0-328_all.deb on each WANdisco Fusion server host as the superuser:

Install the plugin on each WANdisco Fusion server:
# rpm -i fusion-s3-plugin-localfs-2.7.0-2.0.1.0-328.noarch.rpm Enter

or

# dpkg -i fusion-s3-plugin-localfs-2.7.0_2.0.1.0-328_all.deb Enter

4.2.3. Install the proxy

Install fusion-s3-proxy-localfs-2.7.0-2.0.1.0-328.noarch.rpm or fusion-s3-proxy-localfs-2.7.0_2.0.1.0-328_all.deb on each host where you want to operate an S3 Proxy.

Install the proxy on each host required:
# rpm -i fusion-s3-proxy-localfs-2.7.0-2.0.1.0-328.noarch.rpm Enter

or

# dpkg -i fusion-s3-proxy-localfs-2.7.0_2.0.1.0-328_all.deb Enter

4.2.4. Create a replication rule for each virtual bucket

The Fusion Plugin for Live S3 uses a replication rule to coordinate activities against each virtual bucket. Use the WANdisco Fusion UI or API to create a replication rule, e.g. /repl. The path and vbucket name do not need to be the same.
Note: The Fusion user must have ownership of this path.

4.2.5. Configure the plugin

These steps need to be repeated on all Fusion servers, and the configuration needs to be the same on each.

Change current directory to /etc/wandisco/fusion/plugins/live-s3:

# cd /etc/wandisco/fusion/plugins/live-s3 Enter

Execute the configuration script configure-proxy-plugin and provide details of how the plugin will operate:

Virtual bucket name

Choose a name that will be the single identifier for the proxy’s virtual bucket. This name will be used by client applications when interacting with replicated S3 objects, and will be available at the endpoints offered by each of the S3 proxy instances.

Number of zones

The configuration script will prompt for further information for each zone:

Zone name

The name of the zone as defined by the WANdisco Fusion configuration.

Bucket name

The name of the underlying S3 bucket for this zone.

Bucket access key

Credentials for the underlying bucket.

Bucket secret access key

Further credentials for the underlying bucket.

S3 Provider

Underlying storage provider.

Bucket region

The region in which the bucket is located.

Bucket endpoint URL

The endpoint by which the bucket can be accessed. This can be in any of the forms: <hostname>, <hostname>:<port>, http(s)://<hostname>, or http(s)://<hostname>:<port>.
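
As an aside, the four accepted endpoint forms can be told apart programmatically. The sketch below is an illustration only: the function name is hypothetical, and it makes no assumption about which scheme or port the plugin applies when one is omitted, so missing parts are returned as None.

```python
from urllib.parse import urlsplit


def parse_endpoint(endpoint):
    """Split an endpoint into (scheme, host, port); missing parts are None."""
    has_scheme = "://" in endpoint
    # urlsplit only recognises the host when "//" precedes it, so add one
    # for the bare <hostname> and <hostname>:<port> forms.
    parts = urlsplit(endpoint if has_scheme else "//" + endpoint)
    scheme = parts.scheme if has_scheme else None
    return scheme, parts.hostname, parts.port


if __name__ == "__main__":
    for form in ("s3.amazonaws.com", "localhost:9000",
                 "https://s3-eu-west-1.amazonaws.com", "http://localhost:9000"):
        print(form, "->", parse_endpoint(form))
```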

An example session with the configuration script:

# ./configure-proxy-plugin Enter
Enter the virtual bucket name: vbucket Enter
Number of zones associated with this virtual bucket: 2 Enter
Enter zone-1 values:
Enter the zone name: zone1 Enter
Enter bucket name: zone1bucket Enter
Enter the bucket access key: <zone1bucket access key> Enter
Enter the bucket secret access key: <zone1bucket secret access key> Enter
Enter the bucket region: us-east-1 Enter
Enter the bucket endpoint url: s3.amazonaws.com Enter
Please specify the appropriate provider from the list below:
1) AWS_S3               8) ORACLE_OBJECT_STORAGE
2) AWS_SNOWBALL         9) ORACLE_OBJECT_STORAGE_CLASSIC
3) VIRTUSTREAM          10) MINIO_AZURE
4) ACTIVESCALE          11) SCALITY_S3
5) IBM_COS              12) CEPH_HAMMER
6) DELL_ECS             13) None of the above
7) ALIBABA_OSS
#? 1 Enter
Provider selected: AWS_S3

Enter zone-2 values:
Enter the zone name: zone2 Enter
Enter bucket name: zone2bucket Enter
Enter the bucket access key: <zone2bucket access key> Enter
Enter the bucket secret access key: <zone2bucket secret access key> Enter
Enter the bucket region: eu-west-1 Enter
Enter the bucket endpoint url: s3-eu-west-1.amazonaws.com Enter
Please specify the appropriate provider from the list below:
1) AWS_S3               8) ORACLE_OBJECT_STORAGE
2) AWS_SNOWBALL         9) ORACLE_OBJECT_STORAGE_CLASSIC
3) VIRTUSTREAM          10) MINIO_AZURE
4) ACTIVESCALE          11) SCALITY_S3
5) IBM_COS              12) CEPH_HAMMER
6) DELL_ECS             13) None of the above
7) ALIBABA_OSS
#? 1 Enter
Provider selected: AWS_S3

Do you want to add another virtual bucket (yes/no): no
 ------------------------------------------------------------------------------------------------

 * Bucket details *

   <vbucket name="vbucket">

      <bucket name="zone1bucket">
        <zonename>zone1</zonename>
        <accesskey>*</accesskey>
        <secretaccesskey>*</secretaccesskey>
        <region>us-east-1</region>
        <endpoint-url>s3.amazonaws.com</endpoint-url>
        <provider>AWS_S3</provider>
      </bucket>

      <bucket name="zone2bucket">
        <zonename>zone2</zonename>
        <accesskey>*</accesskey>
        <secretaccesskey>*</secretaccesskey>
        <region>eu-west-1</region>
        <endpoint-url>s3-eu-west-1.amazonaws.com</endpoint-url>
        <provider>AWS_S3</provider>
      </bucket>


   </vbucket>

 ------------------------------------------------------------------------------------------------

Confirm the s3proxy plugin configuration details (yes/no): yes

S3Proxy plugin configuration done successfully, restart 'fusion-server' to load the plugin configuration

 --------------------------------------------------------------------------------------------
 Note: You can edit the configuration values anytime in: /etc/wandisco/fusion/plugins/live-s3/proxy-plugin-site.xml
       The fusion server must be restarted for the changes to take effect
 --------------------------------------------------------------------------------------------

Once completed, the script will produce the configuration file at /etc/wandisco/fusion/plugins/live-s3/proxy-plugin-site.xml. You can modify this file later if required. If you change it, restart the WANdisco Fusion server, because configuration properties are read only at WANdisco Fusion server startup.
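
If you want to review a virtual bucket definition without reading the XML by hand, a short script can summarize it. The following Python sketch parses a `<vbucket>` fragment shaped like the one printed by the configuration script. The layout of the rest of the configuration file is not shown in this guide, so the sketch works on the fragment only, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the example session, with credentials left out.
SAMPLE = """<vbucket name="vbucket">
  <bucket name="zone1bucket">
    <zonename>zone1</zonename>
    <region>us-east-1</region>
    <endpoint-url>s3.amazonaws.com</endpoint-url>
    <provider>AWS_S3</provider>
  </bucket>
  <bucket name="zone2bucket">
    <zonename>zone2</zonename>
    <region>eu-west-1</region>
    <endpoint-url>s3-eu-west-1.amazonaws.com</endpoint-url>
    <provider>AWS_S3</provider>
  </bucket>
</vbucket>"""


def summarize_vbucket(xml_text):
    """Return (vbucket name, {zone: (bucket, region, endpoint)})."""
    root = ET.fromstring(xml_text)
    zones = {}
    for bucket in root.findall("bucket"):
        zones[bucket.findtext("zonename")] = (
            bucket.get("name"),
            bucket.findtext("region"),
            bucket.findtext("endpoint-url"),
        )
    return root.get("name"), zones


if __name__ == "__main__":
    name, zones = summarize_vbucket(SAMPLE)
    print(name)
    for zone, details in sorted(zones.items()):
        print(zone, details)
```

A summary like this makes it easier to confirm that the configuration is the same on every Fusion server, as required above.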

4.2.6. Configure the proxy

Configuring the proxy needs to be done where the proxy is installed.

Change current directory to /etc/wandisco/live-s3-proxy:

# cd /etc/wandisco/live-s3-proxy Enter

Execute the configuration script configure-proxy-server. Provide details for the operation of the S3 proxy:

Server listen host

The network interface on which the proxy should listen for client connections. This can be a specific IP address, network name, or the 0.0.0.0 value if the proxy should listen on all available interfaces.

Server listen port

The IP port used by the proxy to accept client connections.

Enable SSL

The proxy supports HTTP or HTTPS access. SSL should be enabled to offer clients the option of communicating with the proxy via HTTPS. Specifying yes to this setting will require further information on the keystore path and password.

Virtual host name

Proxy server’s DNS compatible host names.

Fusion server host name

The host name of the WANdisco Fusion server associated with this proxy instance.

Fusion server request port

The request port offered by the WANdisco Fusion server.

Virtual bucket name

Use the same virtual bucket name as when configuring the plugin.

Replication path

Replication rule created using the WANdisco Fusion UI.

Default endpoint url

The endpoint URL of the underlying storage through which non-replicated buckets can be accessed.

An example:

# ./configure-proxy-server Enter
Enter the S3Proxy server listen host [0.0.0.0]: s3proxydemo.wandisco.com Enter
Enter the S3Proxy server listen port [8081]: 8081 Enter
Do you want to enable ssl (yes/no)?
  [If yes, you need to provide the keystore path and password]: no Enter
Enter Proxy server’s DNS compatible host names (ie., virtual host) [localhost,127.0.0.1]: s3proxydemo.wandisco.com Enter
Enter the fusion server host and port [host:port]: s3proxydemo.wandisco.com:8023 Enter
Is Fusion SSL enabled (yes/no)?
 [If yes, you need to provide the keystore path and encrypted password]: yes Enter
Enter the Truststore file path: /opt/fusionssl/wandisco.ks Enter
Enter the Truststore keytype: JKS Enter
Please enter the password to be encrypted
>
Enter the virtual bucket name: vbucket Enter
Enter the replication path for this virtual bucket (should start with '/'): /repl Enter
Do you want to add another virtual bucket (yes/no): no Enter
Enter the default endpoint url of underlying storage: s3.amazonaws.com Enter
 ------------------------------------------------------------------------------------------------
 * S3Proxy server details *
Proxy server listen host: s3proxydemo.wandisco.com
Proxy server listen port: 8081
Proxy server SSL: false
Proxy server DNS compatible host names: localhost,127.0.0.1,s3proxydemo.wandisco.com
 * Fusion server details *
Fusion server host and port: s3proxydemo.wandisco.com:8023
 * Bucket details *
  <virtualbucket name="vbucket" repl-path="/repl" />
Default endpoint url of underlying storage: s3.amazonaws.com
 ------------------------------------------------------------------------------------------------
 ------------------------------------------------------------------------------------------------
Which user should Live S3Proxy run as? [root]: root Enter
Which group should Live S3Proxy run as? [root]: root Enter
Enter the minimum memory(-Xms) for Live S3Proxy (in MB) [512]: Enter
Enter the maximum memory(-Xmx) for Live S3Proxy (in MB) [1024]: Enter
 -------------------------------------------------
 * Live S3Proxy environment details *
 Run as User: root
 Run as Group: root
 Minimum memory: 512m
 Maximum memory: 1024m
  -------------------------------------------------
Do you confirm the s3proxy server configuration details (yes/no): yes Enter
S3Proxy server configuration done successfully, start 's3proxy-server' to load the configuration
--------------------------------------------------------------------------------------------------------
Note: You can edit the configuration values anytime in: /etc/wandisco/live-s3-proxy/core-site.xml, /etc/wandisco/live-s3-proxy/proxy-server-site.xml
      The s3proxy-server must be restarted for the changes to take effect
  -------------------------------------------------------------------------------------------------------

4.2.7. Enabling SSL

To enable SSL for the proxy server follow the steps below:

  1. Generate SSL certificates using /etc/wandisco/live-s3-proxy/generate-keystore.sh

  2. Reconfigure the proxy server using /etc/wandisco/live-s3-proxy/configure-proxy-server

  3. Connect to the S3 proxy over SSL. To connect without the --no-verify-ssl option, use one of the following approaches.

Approach 1: Passing the trusted CA root certificate using the --ca-bundle command line argument

aws s3 ls s3:// --endpoint-url https://s3proxydemo.wandisco.com:8081 --ca-bundle /etc/wandisco/live-s3-proxy/ssl/ca.crt
2018-05-21 13:38:04 vbucket

Approach 2: Set the environment variable AWS_CA_BUNDLE with the absolute path to the trusted CA root certificate.

export AWS_CA_BUNDLE=/etc/wandisco/live-s3-proxy/ssl/ca.crt
echo $AWS_CA_BUNDLE
/etc/wandisco/live-s3-proxy/ssl/ca.crt
aws s3 ls s3:// --endpoint-url https://s3proxydemo.wandisco.com:8081
2018-05-21 13:41:04 vbucket

Approach 3: In the .aws/config file, set the variable ca_bundle with the absolute path of the trusted CA root certificate.

ca_bundle = /etc/wandisco/live-s3-proxy/ssl/ca.crt

cat .aws/config
[default]
output = json
region = us-east-1
ca_bundle = /etc/wandisco/live-s3-proxy/ssl/ca.crt
s3 =
    signature_version = s3
    addressing_style = path
aws s3 ls s3:// --endpoint-url https://s3proxydemo.wandisco.com:8081
2018-05-21 13:41:04 vbucket
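
When the AWS CLI is driven from a script, the first two approaches can be expressed as follows. This is a minimal Python sketch with hypothetical helper names; it only assembles the command line and environment shown above and does not contact the proxy.

```python
import os


def aws_ls_command(endpoint_url, ca_bundle=None):
    """Build the 'aws s3 ls' argument list used in the examples above."""
    argv = ["aws", "s3", "ls", "s3://", "--endpoint-url", endpoint_url]
    if ca_bundle:
        # Approach 1: pass the CA root certificate on the command line.
        argv += ["--ca-bundle", ca_bundle]
    return argv


def env_with_ca_bundle(ca_bundle, base_env=None):
    """Approach 2: return a copy of the environment with AWS_CA_BUNDLE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_CA_BUNDLE"] = ca_bundle
    return env


if __name__ == "__main__":
    ca = "/etc/wandisco/live-s3-proxy/ssl/ca.crt"
    print(" ".join(aws_ls_command("https://s3proxydemo.wandisco.com:8081", ca)))
```

Either result can be passed to `subprocess.run`, for example `subprocess.run(argv, env=env)`.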

4.2.8. Changing the time zone

Logs use UTC timezone by default but this can be manually altered through log4j configuration if required. To alter the timezone the xxx.layout.ConversionPattern property needs to be overwritten.

log4j.appender.xxxxxlog.layout.ConversionPattern=%d{ISO8601}{UTC} %p %c - %t:[%m]%n

{UTC} can be replaced with, for example, {GMT} or {GMT+1:30}. When offsetting from a timezone, use + or -; hours must be between 0 and 23, and minutes must be between 00 and 59.

This property is located in /etc/wandisco/live-s3-proxy/log4j.properties. After updating the file, the s3proxy-server needs to be restarted for the changes to take effect.
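
Before editing the file, you may want to check that a replacement token respects the ranges given above. The Python sketch below is a hedged illustration: it assumes offsets are written relative to GMT in the form GMT+H:MM, it deliberately ignores other zone names that the logging library might accept, and the function names are hypothetical.

```python
import re

# Assumed offset format: GMT, a sign, one or two hour digits, optional :MM.
_OFFSET = re.compile(r"^GMT([+-])(\d{1,2})(?::(\d{2}))?$")


def is_valid_zone_token(zone):
    """Accept plain UTC/GMT or a GMT offset with hours 0-23 and minutes 00-59."""
    if zone in ("UTC", "GMT"):
        return True
    match = _OFFSET.match(zone)
    if not match:
        return False
    hours = int(match.group(2))
    minutes = int(match.group(3) or 0)
    return 0 <= hours <= 23 and 0 <= minutes <= 59


def set_log_timezone(pattern, zone):
    """Replace the {UTC} token in a ConversionPattern with another zone."""
    if not is_valid_zone_token(zone):
        raise ValueError(f"unsupported time zone token: {zone}")
    return pattern.replace("{UTC}", "{" + zone + "}")


if __name__ == "__main__":
    print(set_log_timezone("%d{ISO8601}{UTC} %p %c - %t:[%m]%n", "GMT+1:30"))
```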

5. Operation

5.1. Configuration

Once configured, restart the WANdisco Fusion server to use the configuration applied:

# service fusion-server restart Enter

Then start each instance of the S3 proxy:

# service s3proxy-server start Enter

You can validate operation against the virtual buckets defined for the environment using standard S3 client applications, such as the AWS CLI tools, or tools like s3cmd.

5.1.1. Configuring Applications to use the Fusion Plugin for Live S3

Client applications can be configured to use the virtual buckets provided by the Fusion Plugin for Live S3 in multiple ways:

As an HTTP(S) proxy

Applications that would normally communicate directly with the underlying S3 bucket can be directed to use the proxy through standard HTTP(S) proxy configuration. The benefit of this approach is that it requires no change to application code to direct them to the Fusion Plugin for Live S3.

Ensure that the plugin has been configured to accept requests for the original hostname used by client applications. Specify the DNS compatible host names to match when configuring the proxy. Applications can continue to use the original bucket name if it matches the underlying bucket referred to by the proxy.

As a new S3 endpoint

Applications can direct S3 requests directly to the proxy, which provides virtual buckets. Configure your application to refer to the proxy as the S3 endpoint, and use the virtual bucket name.

5.2. Administration

Once configured, applications interact with the virtual buckets that have been configured. Standard S3 API operations are replicated with strong consistency among the underlying buckets, and content associated with objects created is replicated between these buckets.

Use the Fusion Plugin for Live S3 to provide a LiveData environment, where applications can interact with any of the replicated buckets, and each bucket will provide access to the same content regardless of where change is initiated.

5.2.1. Regular Application Operation

Operate your applications without change, whether they are custom applications, standard command line or other tools for working with S3.

5.2.2. Consistency Check

Because applications that do not interact with the Live S3 environment via the Fusion Plugin for Live S3 can modify bucket content without coordination or replication, the product provides a consistency check feature. Use consistency check to report on any differences among the replicated buckets.

Consistency check is provided as a REST API. A consistency check is a potentially long-running task, initiated with a specific REST operation. There are 2 consistency check options:

  1. Provide the path associated with the virtual bucket when initiating the check to specify which virtual bucket to review:

    # curl -i -X POST "http://localhost:8082/plugin/s3proxy/cc?path=/repl-path&vbucket=vbucket" Enter
    HTTP/1.1 202 Accepted
    Content-Location: http://localhost:8082/fusion/task/84b417f3-ec60-11e7-aa4b-0242ac120002
    Content-Length: 0
    Server: Jetty(6.1.26)

  2. To check a particular directory under the virtual bucket associated with a replication path, add the ccpath parameter to the end of the operation:

    # curl -i -X POST "http://localhost:8082/plugin/s3proxy/cc?path=/repl-path&vbucket=vbucket&ccpath=dir1/" Enter
    HTTP/1.1 202 Accepted
    Content-Location: http://localhost:8082/fusion/task/84b417f3-ec60-11e7-aa4b-0242ac120002
    Content-Length: 0
    Server: Jetty(6.1.26)
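
The two options differ only in the ccpath query parameter, so a client can build both requests from one helper. The Python sketch below is illustrative and not part of the product: the names cc_url and start_consistency_check are hypothetical, and the base URL http://localhost:8082 is assumed from the examples above.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL, taken from the curl examples in this section.
FUSION_BASE = "http://localhost:8082"


def cc_url(repl_path: str, vbucket: str, ccpath: Optional[str] = None,
           base: str = FUSION_BASE) -> str:
    """Build the consistency check URL; ccpath limits the check to one directory."""
    params = {"path": repl_path, "vbucket": vbucket}
    if ccpath is not None:
        params["ccpath"] = ccpath
    # safe="/" leaves slashes unescaped, matching the curl examples.
    return f"{base}/plugin/s3proxy/cc?{urlencode(params, safe='/')}"


def start_consistency_check(repl_path: str, vbucket: str,
                            ccpath: Optional[str] = None,
                            base: str = FUSION_BASE) -> Optional[str]:
    """POST the request and return the task URL from the Content-Location header."""
    req = Request(cc_url(repl_path, vbucket, ccpath, base), method="POST")
    with urlopen(req) as resp:  # the examples show a 202 Accepted response
        return resp.headers.get("Content-Location")
```

The returned task URL is the location at which the status of the check can be read.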

Check the status of the consistency check at the location referenced in the response, using the task ID generated when the check was started:

# curl -i -X GET "http://localhost:8082/fusion/task/84b417f3-ec60-11e7-aa4b-0242ac120002"
HTTP/1.1 200 OK
Content-Length: 938
Content-Type: application/xml
Server: Jetty(6.1.26)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><task><taskId>84b417f3-ec60-11e7-aa4b-0242ac120002</taskId><timeCreated>1514528494614</timeCreated><creatorNodeId>400b7699-56d2-44c5-b07c-f42a760b966f</creatorNodeId><timeUpdated>1514528530417</timeUpdated><isDone>true</isDone><aborted>false</aborted><properties><entry><key>CC_REPORT_PATH</key><value>/vbucket/.fusion/4fc19ba7-e744-11e7-85c7-0242ac120003/metadata/84b417f3-ec60-11e7-aa4b-0242ac120002/cc-report</value></entry><entry><key>TOTAL_INCONSISTENCIES_FOUND</key><value>0</value></entry><entry><key>TASK_TYPE</key><value>S3PROXY_CONSISTENCY_CHECK</value></entry><entry><key>LOCAL_COMPLETE</key><value>1514528530417</value></entry><entry><key>LOCAL_START</key><value>1514528494614</value></entry><entry><key>CONSISTENCY_CHECK_STATUS</key><value>CONSISTENT</value></entry></properties><previousTask xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/></task>
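
Because the task status is returned as XML, it can be read programmatically. The following is a minimal sketch, assuming the response has the shape shown above; parse_task is a hypothetical helper name and not part of the Fusion REST API.

```python
import xml.etree.ElementTree as ET


def parse_task(xml_text: str) -> dict:
    """Return the task ID, completion flags and properties of a <task> document."""
    root = ET.fromstring(xml_text)
    # Each <entry> holds one <key>/<value> pair.
    props = {
        entry.findtext("key"): entry.findtext("value")
        for entry in root.iterfind("properties/entry")
    }
    return {
        "taskId": root.findtext("taskId"),
        "isDone": root.findtext("isDone") == "true",
        "aborted": root.findtext("aborted") == "true",
        "properties": props,
    }
```

For the response above, the CONSISTENCY_CHECK_STATUS property is CONSISTENT and TOTAL_INCONSISTENCIES_FOUND is 0.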

To access the consistency check report, pass the task ID generated by the consistency check to the report endpoint:

# curl -i -X GET "http://localhost:8082/plugin/s3proxy/cc/report/84b417f3-ec60-11e7-aa4b-0242ac120002?path=/repl-path&withConsistencyReport=true"
HTTP/1.1 200 OK
Content-Length: 301
Content-Type: application/xml
Server: Jetty(6.1.26)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><consistencyReport><path>/vb1</path><state>CONSISTENT</state><taskId>84b417f3-ec60-11e7-aa4b-0242ac120002</taskId><checksumMethod>MD5</checksumMethod><noInconsistencies>0</noInconsistencies><lastCheck>1514548334438</lastCheck></consistencyReport>

5.2.3. Repair

Resolve inconsistencies among buckets with the Repair feature. Initiate a repair with a single REST API invocation, and access the status of the potentially long-running repair task at the location provided in response.

Specify the ccTaskId and the name of the zone that will be used as the source of truth for repair. Repair ensures that each replicated bucket has the same content as the source of truth. This may introduce object deletion and creation in all other replicated zones, so use this feature with care.

A repair accepts three options, each set to true or false:

Recursive

  True: the repair also repairs the contents of any subfolders in the target zone.
  False: the repair does not repair the contents of any subfolders in the target zone.

Replace

  True: the repair overwrites all duplicate files and directories in the target zone.
  False: the repair does not overwrite duplicate files and directories in the target zone.

Preserve

  True: the repair does not remove any data that exists in the target zone.
  False: the repair removes all the data that exists in the target zone.

To start a repair, supply the task ID generated by the consistency check, e.g.:

# curl -i -X PUT "http://localhost:8082/plugin/s3proxy/repair/84b417f3-ec60-11e7-aa4b-0242ac120002?srcZone=zone1&preserve=true&recursive=true&replace=false"
HTTP/1.1 202 Accepted
Content-Location: http://localhost:8082/plugin/fusion/task/0088e14d-ec62-11e7-aa4b-0242ac120002
Content-Length: 0
Server: Jetty(6.1.26)
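
The repair request combines the consistency check task ID, the source zone and the three options in one URL. The sketch below shows one way to assemble it. The function names are hypothetical, the base URL is assumed from the examples, and the default option values mirror the example request rather than any documented product default.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FUSION_BASE = "http://localhost:8082"  # assumed, as in the curl examples


def repair_url(cc_task_id: str, src_zone: str, *, preserve: bool = True,
               recursive: bool = True, replace: bool = False,
               base: str = FUSION_BASE) -> str:
    """Build the repair URL for a completed consistency check task."""
    params = {
        "srcZone": src_zone,
        # The REST API examples use lowercase true/false.
        "preserve": str(preserve).lower(),
        "recursive": str(recursive).lower(),
        "replace": str(replace).lower(),
    }
    return f"{base}/plugin/s3proxy/repair/{cc_task_id}?{urlencode(params)}"


def start_repair(cc_task_id: str, src_zone: str, **options) -> Optional[str]:
    """PUT the repair request and return the task URL from Content-Location."""
    req = Request(repair_url(cc_task_id, src_zone, **options), method="PUT")
    with urlopen(req) as resp:
        return resp.headers.get("Content-Location")
```

Keeping preserve set to true is the more cautious choice when you are unsure whether the target zones hold data that must be retained.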

Access the status of the repair task at the location referenced, e.g.:

# curl -i -X GET "http://localhost:8082/fusion/task/0088e14d-ec62-11e7-aa4b-0242ac120002"
HTTP/1.1 200 OK
Content-Length: 679
Content-Type: application/xml
Server: Jetty(6.1.26)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><task><taskId>0088e14d-ec62-11e7-aa4b-0242ac120002</taskId><timeCreated>1514548109259</timeCreated><creatorNodeId>400b7699-56d2-44c5-b07c-f42a760b966f</creatorNodeId><timeUpdated>1514528530417</timeUpdated><isDone>true</isDone><aborted>false</aborted><properties><entry><key>TASK_TYPE</key><value>REPAIR_TASK</value></entry><entry><key>REPAIR_STATUS</key><value>COMPLETED</value></entry><entry><key>LOCAL_COMPLETE</key><value>1514548109903</value></entry><entry><key>LOCAL_START</key><value>1514548109259</value></entry></properties><previousTask xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/></task>
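
Consistency check and repair are both potentially long-running, so a client usually polls the task location until isDone is true. The following is a minimal polling sketch under stated assumptions: wait_for_task and fetch_xml are hypothetical names, and the interval and timeout values are arbitrary choices, not product recommendations.

```python
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_xml(url: str) -> str:
    """GET the task document and return its body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def wait_for_task(task_url: str, fetch=fetch_xml,
                  interval: float = 5.0, timeout: float = 600.0) -> ET.Element:
    """Poll task_url until <isDone> is true, then return the parsed <task> element."""
    deadline = time.monotonic() + timeout
    while True:
        task = ET.fromstring(fetch(task_url))
        if task.findtext("isDone") == "true":
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task not done after {timeout}s: {task_url}")
        time.sleep(interval)
```

The fetch argument is injectable so that the loop can be exercised without a running Fusion server. Check the aborted element and the status property of the returned task before acting on the result.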

5.2.4. Dynamic configuration of Virtual buckets

Virtual buckets (vbucket) can be added, listed, modified or deleted from the proxy-plugin-site.xml using the REST APIs below.

Add virtual buckets

This REST API adds virtual buckets to the proxy-plugin-site.xml. You must also add the details of the new virtual bucket to proxy-server-site.xml manually.

You must restart the S3Proxy server for changes to take effect.

An example XML file with the details of the virtual bucket to be added to the proxy-plugin-site.xml:

<s3proxy>

   <vbucket name="vbucket1">

      <bucket name="bucket-a">
        <zonename>zone1</zonename>
        <accesskey></accesskey>
        <secretaccesskey>*</secretaccesskey>
        <region>us-east-1</region>
        <endpoint-url>s3.amazonaws.com</endpoint-url>
        <provider>AWS_S3</provider>
      </bucket>

      <bucket name="bucket-b">
        <zonename>zone2</zonename>
        <accesskey></accesskey>
        <secretaccesskey>*</secretaccesskey>
        <region>eu-west-1</region>
        <endpoint-url>s3-eu-west-1.amazonaws.com</endpoint-url>
        <provider>AWS_S3</provider>
      </bucket>

   </vbucket>

</s3proxy>

# curl -v -i -X PUT -H "Content-Type: application/xml" -d @config.xml "http://localhost:8082/plugin/s3proxy/config/vbucket?path=/repl1"
> Content-Type: application/xml
> Content-Length: 776
>
* upload completely sent off: 776 out of 776 bytes
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Content-Length: 0
Content-Length: 0
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)
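
The configuration payload can also be generated rather than written by hand. The sketch below builds a document with the same element names as the example above. vbucket_config is a hypothetical helper, and the dictionary keys are simply the element names from the example; credentials are left empty here and should come from a secure source.

```python
import xml.etree.ElementTree as ET

# Child elements of <bucket>, in the order used by the example file.
BUCKET_FIELDS = ("zonename", "accesskey", "secretaccesskey",
                 "region", "endpoint-url", "provider")


def vbucket_config(vbucket: str, buckets: list) -> bytes:
    """Serialize one <vbucket> with its underlying <bucket> entries."""
    root = ET.Element("s3proxy")
    vb = ET.SubElement(root, "vbucket", name=vbucket)
    for bucket in buckets:
        el = ET.SubElement(vb, "bucket", name=bucket["name"])
        for field in BUCKET_FIELDS:
            # Missing values become empty elements.
            ET.SubElement(el, field).text = bucket.get(field, "")
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    payload = vbucket_config("vbucket1", [
        {"name": "bucket-a", "zonename": "zone1", "region": "us-east-1",
         "endpoint-url": "s3.amazonaws.com", "provider": "AWS_S3"},
        {"name": "bucket-b", "zonename": "zone2", "region": "eu-west-1",
         "endpoint-url": "s3-eu-west-1.amazonaws.com", "provider": "AWS_S3"},
    ])
    print(payload.decode("utf-8"))
```

The output can be saved as config.xml and sent with the curl command shown above.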

List virtual buckets

This REST API gives a listing of virtual buckets configured in the proxy-plugin-site.xml.

# curl -v -X GET "http://localhost:8082/plugin/s3proxy/config/vbucket?vbucket=vbucket1"
>
< HTTP/1.1 200 OK
< Content-Length: 862
< Content-Type: application/xml
< Server: Jetty(6.1.26)
<

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Vbucket>
<bucket>
<virtualBucket>vbucket1</virtualBucket>
<bucketName>bucket1</bucketName>
<zoneName>zone1</zoneName>
<accessKey></accessKey>
<secretAccessKey></secretAccessKey>
<provider>AWS_S3</provider>
<endPointRegion>us-east-1</endPointRegion>
<endPointUrl>s3.amazonaws.com</endPointUrl>
</bucket>
<bucket>
<virtualBucket>vbucket1</virtualBucket>
<bucketName>bucket2</bucketName>
<zoneName>zone2</zoneName>
<accessKey></accessKey>
<secretAccessKey></secretAccessKey>
<provider>AWS_S3</provider>
<endPointRegion>eu-west-1</endPointRegion>
<endPointUrl>s3-eu-west-1.amazonaws.com</endPointUrl>
</bucket>
</Vbucket>
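
The listing can be turned into plain records for scripting, for example to confirm which bucket backs each zone. This is a minimal sketch, assuming the response shape shown above; parse_vbucket_listing is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_vbucket_listing(xml_text: str) -> list:
    """Return one dict per <bucket>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "") for child in bucket}
        for bucket in root.iterfind("bucket")
    ]


def bucket_by_zone(rows: list) -> dict:
    """Map each zone name to the underlying bucket name."""
    return {row["zoneName"]: row["bucketName"] for row in rows}
```

For the listing above, bucket_by_zone would map zone1 to bucket1 and zone2 to bucket2.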

Modify virtual buckets

This REST API modifies the virtual buckets configured in the proxy-plugin-site.xml. Supply the new virtual bucket details as an XML file; the specified virtual bucket is updated in the proxy-plugin-site.xml from that file.

You must restart the S3Proxy server for changes to take effect.

An example XML file with the details of the virtual bucket to be modified in the proxy-plugin-site.xml:

<s3proxy>

   <vbucket name="vbucket">

      <bucket name="bucketNew1">
        <zonename>zone1</zonename>
        <accesskey></accesskey>
        <secretaccesskey>*</secretaccesskey>
        <region>us-east-1</region>
        <endpoint-url>s3.amazonaws.com</endpoint-url>
        <provider>AWS_S3</provider>
      </bucket>

      <bucket name="bucketNew2">
        <zonename>zone2</zonename>
        <accesskey></accesskey>
        <secretaccesskey>*</secretaccesskey>
        <region>eu-west-1</region>
        <endpoint-url>s3-eu-west-1.amazonaws.com</endpoint-url>
        <provider>AWS_S3</provider>
      </bucket>

   </vbucket>

</s3proxy>

# curl -v -X POST -H "Content-Type: application/xml" -d @config.xml "http://localhost:8082/plugin/s3proxy/config/vbucket?path=/repl1"
> POST /plugin/s3proxy/config/vbucket/update?path=/repl1 HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Type: application/xml
> Content-Length: 776
>
* upload completely sent off: 776 out of 776 bytes
< HTTP/1.1 202 Accepted
< Content-Length: 0
< Server: Jetty(6.1.26)

Delete virtual buckets

This REST API deletes the virtual buckets configured in the proxy-plugin-site.xml.

You must restart the S3Proxy server for changes to take effect.

You must first manually remove the details of the vbucket to be deleted from the proxy-server-site.xml.

# curl -v -i -X DELETE "http://localhost:8082/plugin/s3proxy/config/vbucket?path=/repl1&vbucket=vbucket1"
* Connected to localhost (10.6.121.44) port 8082 (#0)
> DELETE /plugin/s3proxy/config/vbucket/delete?path=/repl1&vbucket=vbucket1 HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Content-Length: 0
Content-Length: 0
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)

5.2.5. Support for non-replicated buckets

Non-replicated buckets can be accessed via Live S3 using the default endpoint URL of the underlying storage. The default endpoint URL must be added during proxy configuration. Data uploaded to a non-replicated bucket is redirected to the default endpoint URL and is not coordinated.

Non-replicated buckets can be accessed via S3Proxy with the following signature and addressing styles:

  1. v2 Path style

  2. v4 Path style
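
In path-style addressing the bucket name is the first segment of the URL path rather than part of the host name. The sketch below illustrates only the URL shape. The proxy address and the bucket and key names are assumptions, path_style_url is a hypothetical helper, and request signing (v2 or v4) is left to your S3 client.

```python
from urllib.parse import quote


def path_style_url(proxy: str, bucket: str, key: str) -> str:
    """Return the path-style URL for an object: <proxy>/<bucket>/<key>."""
    # quote() keeps "/" unescaped by default, so key prefixes stay readable.
    return f"{proxy.rstrip('/')}/{quote(bucket)}/{quote(key)}"


if __name__ == "__main__":
    # Assumed proxy address and hypothetical bucket and key names.
    print(path_style_url("http://127.0.0.1:8080", "plain-bucket", "dir1/file.txt"))
```

Most S3 clients expose a setting that forces path-style addressing; consult your client's documentation for the exact option.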

Limitations
  • The non-replicated bucket has to be created under the default-endpoint-url storage.

  • Currently only one default-endpoint-url can be configured, even when there are multiple virtual buckets.

For example:

<?xml version="1.0" encoding="UTF-8"?>

<s3proxy>

  <server-config>
    <protocol>http</protocol>
    <listen-host>127.0.0.1</listen-host>
    <listen-port>8080</listen-port>
    <keystore-path />
    <keystore-pass />
    <truststore-path />
    <truststore-pass />
    <virtual-host>localhost,127.0.0.1,s3proxydemo.wandisco.com</virtual-host>
  </server-config>

  <virtualbuckets>

    <virtualbucket name="vbucket1" repl-path="/repl1" />   <!-- e.g. an AWS bucket -->
    <virtualbucket name="vbucket2" repl-path="/repl2" />   <!-- e.g. an IBM bucket -->
    <virtualbucket name="vbucket3" repl-path="/repl3" />   <!-- e.g. a Scality bucket -->

    <default-endpoint-url>s3.amazonaws.com</default-endpoint-url>   <!-- the default endpoint URL can point to only one of the storage systems -->

  </virtualbuckets>

</s3proxy>

5.2.6. Selective Replication

Selective Replication allows you to limit which data are replicated, using inclusion and exclusion rules for each configured virtual bucket. Data that matches a rule in the inclusion or exclusion XML is included in or excluded from replication accordingly. When an inclusion rule and an exclusion rule are identical, the inclusion rule takes priority.

You must restart the S3Proxy server for changes to selective replication to take effect.

Exclusion

Exclusion rules can be added, listed or deleted using REST APIs. Data is not replicated to the other zones if it matches an exclusion rule.

Template for the exclusion XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<exclusions>
  <vbucket name="vbucket">
    <exclusion>/*/.txt</exclusion>
  </vbucket>
  <vbucket name="vbucket-1">
    <exclusion></exclusion>
  </vbucket>
</exclusions>

Add Exclusion

The exclusion rule requested will be added to the exclusion-rules.xml for the specified virtual bucket.

For Example:

# curl -v -i -X POST 'http://localhost:8082/plugin/s3proxy/config/rules/exclusion?path=/repl_dir&vbucket=vbucket&exclusion=*.txt'
>
< HTTP/1.1 202 Accepted
HTTP/1.1 200 OK
< Content-Length: 0
Content-Length: 0
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)

List Exclusions

The exclusion rules for the specified virtual bucket will be listed.

# curl -v -i -X GET 'http://localhost:8082/plugin/s3proxy/config/rules/exclusion?vbucket=vbucket'
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Length: 249
Content-Length: 249
< Content-Type: application/xml
Content-Type: application/xml
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><exclusions><vbucket name="vbucket"><exclusion>//.fusion/</exclusion><exclusion>//.fusion/*</exclusion><exclusion>.txt</exclusion></vbucket></exclusions>
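
A rule listing can be converted into a mapping from virtual bucket name to its rules, which is convenient when comparing configurations between nodes. This is a minimal sketch, assuming the response shape shown above; parse_rules is a hypothetical name, and the same function reads inclusion listings when kind is "inclusion".

```python
import xml.etree.ElementTree as ET


def parse_rules(xml_text: str, kind: str = "exclusion") -> dict:
    """Map each vbucket name to the list of its <exclusion> or <inclusion> rules."""
    root = ET.fromstring(xml_text)
    return {
        vb.get("name"): [rule.text or "" for rule in vb.iterfind(kind)]
        for vb in root.iterfind("vbucket")
    }
```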

Delete Exclusion

The exclusion rules for the specified virtual bucket will be deleted from the exclusion-rules.xml.

# curl -v -i -X DELETE 'http://localhost:8082/plugin/s3proxy/config/rules/exclusion?path=/repl_dir&vbucket=vbucket&exclusion=/*/.pdf'
>
< HTTP/1.1 202 Accepted
HTTP/1.1 200 OK
< Content-Length: 0
Content-Length: 0
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)

Inclusion

Inclusion rules can be added, listed or deleted using REST APIs. Data that matches an inclusion rule is replicated to the other zones, subject to the exclusion rules. The default inclusion rule is ".*".

When an exclusion rule and an inclusion rule are the same, the inclusion rule takes priority.

Template for the inclusion XML:

inclusion-tmpl.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<inclusions>
  <vbucket name="vbucket">
    <inclusion>dir*</inclusion>
  </vbucket>
  <vbucket name="vbucket-1">
    <inclusion>.*</inclusion>
  </vbucket>
</inclusions>

Add Inclusions

The inclusion rule requested will be added to the inclusion-rules.xml for the specified virtual bucket.

# curl -v -i -X POST 'http://localhost:8082/plugin/s3proxy/config/rules/inclusion?path=/repl_dir&vbucket=vbucket&inclusion=dir/*'
>
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Content-Length: 0
Content-Length: 0
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)

<
* Connection #0 to host localhost left intact
* Closing connection #0

List Inclusions

The inclusion rules for the specified virtual bucket will be listed.

# curl -v -i -X GET 'http://localhost:8082/plugin/s3proxy/config/rules/inclusion?vbucket=vbucket'
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Length: 139
Content-Length: 139
< Content-Type: application/xml
Content-Type: application/xml
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><inclusions><vbucket name="vbucket"><inclusion>.</inclusion><inclusion>dir/</inclusion></vbucket></inclusions>

Delete Inclusions

The inclusion rules for the specified virtual bucket will be deleted from the inclusion-rules.xml.

# curl -v -i -X DELETE 'http://localhost:8082/plugin/s3proxy/config/rules/inclusion?path=/repl_dir&vbucket=vbucket&inclusion=dir/*'
>
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Content-Length: 0
Content-Length: 0
< Server: Jetty(6.1.26)
Server: Jetty(6.1.26)
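
The inclusion and exclusion endpoints share one URL shape, differing only in the rule kind, so a single helper can build the add and delete requests for both. The sketch below is illustrative: rule_url is a hypothetical name, the base URL is assumed from the examples, and leaving "/" and "*" unescaped is a choice made to match the curl commands above rather than a documented requirement.

```python
from urllib.parse import urlencode

FUSION_BASE = "http://localhost:8082"  # assumed, as in the curl examples


def rule_url(kind: str, repl_path: str, vbucket: str, pattern: str,
             base: str = FUSION_BASE) -> str:
    """Build the URL used to add (POST) or delete (DELETE) a rule."""
    if kind not in ("inclusion", "exclusion"):
        raise ValueError("kind must be 'inclusion' or 'exclusion'")
    # The query parameter that carries the pattern is named after the rule kind.
    params = {"path": repl_path, "vbucket": vbucket, kind: pattern}
    return f"{base}/plugin/s3proxy/config/rules/{kind}?{urlencode(params, safe='/*')}"
```

Remember that the S3Proxy server must be restarted before rule changes take effect.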

5.3. Troubleshooting

Observe information in the log files generated for the WANdisco Fusion server and the Fusion Plugin for Live S3 to troubleshoot issues at runtime. Exceptions or log entries with a SEVERE label may represent information that can assist in determining the cause of any problem.

As a distributed system, the Fusion Plugin for Live S3 will be impacted by the operation of the underlying S3 endpoints with which it communicates. You may also find it useful to review log or other information from these endpoints.


1. The specified API carries an XML configuration payload that depends on a destination bucket, which must be in the same region as the source bucket.
2. The specified API carries an XML configuration payload that depends on a destination bucket, which must be in a different region from the source bucket.
3. PUT Bucket Notification carries an XML configuration payload containing a topic ARN, whose region must be the same as the bucket region.
4. Deletion of specific object versions is not supported because independent S3 buckets assign different version identifiers to replicated instances of an object.
5. Deletion of tagging for a specific object version is not supported because independent buckets assign different version identifiers to replicated instances of an object.
6. POST is an alternative to PUT for browser-based uploads. The Fusion Plugin for Live S3 does not support POST Object replication with virtual buckets.
7. POST Object restore allows restoration of a temporary copy of an archived object. It is not supported by the Fusion Plugin for Live S3.