
4. Technical Guide

4.1 Benefits of running with Git MultiSite

This guide runs through everything you need to know to get Git MultiSite deployed. First we'll cover all the things that you need to have in place before you install. We'll then cover a standard installation and setup. Finally we'll look at some possible problems you might experience with some troubleshooting tips.

  • LAN-speed Performance Dramatically Shortens Development Cycles and Reduces Cost
  • Zero Downtime and Zero Data Loss
  • Enables Continuous Availability for Global Software Development
  • Easy to Administer
  • No Retraining Required

4.2 How Git MultiSite Works

Git MultiSite provides a toolset for replicating Git repository data in a manner that can maximise performance and efficiency whilst minimising network and hardware resource requirements. The following examples provide you with a starting point for deciding on the best means to enable replication across your development sites.

Replication Model

Per-Repository Replication

Git MultiSite is able to replicate on a per-repository basis. Each site can host a different set of repositories, and different repositories can replicate as part of different replication groups.

Dynamic Membership Evolution

Evolution without stopping work: there is no need for a synchronized stop - Git MultiSite allows replication groups to change their membership on-the-fly.

A repository can only replicate as part of a single replication group at any one time, although it can be moved between replication groups as required. This can be done on-the-fly: sites can be added or removed without the need to pause all replication (a synchronized stop).

Git MultiSite offers a great deal of flexibility in how repository data is replicated. Before you get started it's a good idea to map out which repositories at which locations you want to replicate.

4.3 Creating resilient Replication Groups

Git MultiSite is able to maintain repository replication (and availability) even after the loss of nodes from a replication group. However, there are some configuration rules that are worth considering:

Rule 1: Understand Learners and Acceptors

The unique Active-Active replication technology used by MultiSite is an evolution of the Paxos algorithm, so we use some Paxos concepts that are useful to understand:

Learners:

Learners are the nodes that are involved in the actual replication of repository data. When changes are made on the local repository copy, these nodes raise a proposal for the changes to be made on all the other copies.

Learner nodes are required for the actual storage and replication of repository data. You need a learner node at any location where users are working, or where you wish to store hot-backups of repositories.

Types of nodes that are Learners: Active, Passive

Acceptors:

Applying changes to each repository in exactly the same order is a crucial requirement for maintaining synchronization. Acceptors are the nodes that take part in the vote on the order in which proposals are played out.

Acceptor nodes are required to keep replication going. You need enough Acceptors to ensure that agreement over proposal ordering can always be reached, even accounting for possible node loss. In configurations with an even number of Acceptors, voting could become tied. For this reason it is possible to make one voter node a tie-breaker, which has slightly more voting power so that it can outvote another single voter node.

Types of nodes that are Acceptors: Voter Only
Nodes that are both an Acceptor and Learner: Active Voter, Passive Voter

Rule 2: Replication groups should have a minimum membership of three learner nodes

Two-node replication groups are not fault tolerant; you should strive to replicate according to the following guideline:

The number of voter nodes required to survive the loss of N voter nodes is 2N+1.

So in order to survive the loss of a single voter node you need a minimum of 2×1+1 = 3 voter nodes.

In order to keep replicating after losing a second voter node you need 5 voter nodes.
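The guideline scales as follows:

Voter nodes   Voter-node losses survived
3             1
5             2
7             3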

Rule 3: Learner Population - resilience vs consistency

During the installation of each of your nodes you are asked to provide a Content Node Count. This is the number of other learner nodes in the replication group that need to receive the content for a proposal before the proposal can be submitted for agreement.

Setting this number to 1 ensures that replication won't halt if some nodes are behind and have not yet received the replicated content. This strategy reduces the chance that a temporary outage or a heavily loaded node will stop replication; however, it also increases the risk that repositories will go out of sync (requiring admin intervention) in the event of an outage.
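For illustration, a Content Node Count of 1 would correspond to the following entry in the application.properties file covered in section 4.7 (assuming the installer's value maps to the content.learners.count property):

content.learners.count=1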

Rule 4: Two nodes per site provide resilience and performance benefits

Running with two nodes per site provides several important advantages:

  • It provides every site with a local hot-backup of the repository data.
  • It enables a site to load-balance repository access between the nodes, which can improve performance during times of heavy usage.
  • Providing the nodes are voters, it increases the voter population and improves resilience for replication.

4.4 Available Node Types

Each replication group consists of a number of sites and a selection of repositories that will be replicated.

There are several different types of node:

Active Node
An Active Node has users who are actively committing to Git repositories, which results in the generation of proposals that are replicated to the other sites. However, it plays no part in reaching agreement on the ordering of transactions.
Active Voter
An Active Voter is an Active Node that also votes on the order in which transactions are played out. In a replication group with a single Active Voter, it alone decides on ordering. An even number of Active Voters is a problem, because there's the possibility that a vote will hang with no overall winner.
Passive
A node on which repositories receive updates from other sites, but which doesn't permit any changes to its replicas from clients - effectively making its repositories read-only. Passive nodes are ideal for providing hot-backup.
Passive Voter
A Passive node that also takes part in the vote for transaction ordering agreement.
Voter
A Voter-only node doesn't store repository data; its only purpose is to accept transactions and cast a vote on transaction ordering. Voter-only nodes add resilience to a replication group, as they increase the likelihood that enough nodes are available to reach agreement on transaction ordering.

Use for:
  • Dedicated Continuous Integration servers
  • Sharing code with partners or sites that won't be allowed to commit changes back
  • In addition, these nodes can help with HA, as they add another voter to a site.

Tie-breaker
A Tie-breaker node is an active node that only votes when the votes are evenly split, causing a deadlock. The tie-breaker therefore gets the casting vote.
Helper
When adding a new site to an existing replication group you select an existing site from which you will manually copy or rsync the applicable repository data. This existing site enters 'helper' mode, in which the relevant repositories are read-only until they have been synced with the new site. By relevant we mean repositories that are replicated in the replication group to which the new site is being added.
New
When a site is added to an existing replication group it enters an 'on-hold' state until repository data has been copied across. Until the process of adding the repository data is complete, New sites are read-only. Should you leave the Add a Site process before it has completed, you will need to manually remove the read-only state from the repository.

Acceptors, Proposers and Learners?

The table below shows which node roles are acceptors, proposers or learners.

Node Type            Acceptor  Proposer  Learner
Active (A)           N         Y         Y
Active Voter (AV)    Y         Y         Y
Passive (P)          N         N         Y
Passive Voter (PV)   Y         N         Y
Voter Only (V)       Y         N         N

Key

Learners are either Active or Passive nodes:
They learn proposals from other nodes and take action accordingly, updating their repositories based on the proposals (replication).
Proposers are Active nodes:
For users to be able to commit to a node, the node must be able to make proposals.
Acceptors are Voters:
They accept proposals from other nodes and vote on whether or not to process them (ordering/achieving quorum).

4.5 Disk Usage and Replicated Pushes

If a file, or set of files, is repeatedly added and removed in Git, the node that is pushed to will store the changes using deltas, resulting in only minor changes to the repository's size.

If the change is replicated by a push, it is possible that new blobs will be stored when the file(s) are re-added to the system, meaning the repository size will increase roughly by the size of the file(s) added, multiplied by the number of additions.

Garbage collection (either routine automated housekeeping or manual git gc usage) will reduce the amount of disk space used to roughly that of the node the changes were originally pushed to.

If a repository is cloned, rather than pushed, its disk usage will also reflect the lower figure.
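Disk usage can be checked, and space reclaimed, with standard Git commands on the affected node (a minimal sketch, run from inside the repository):

# report object counts and pack sizes in human-readable form
git count-objects -vH

# repack and prune loose objects, reducing disk usage
git gc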

4.6 Working with non-ASCII character sets

Commands such as git status escape non-ASCII characters in path names by default.

To see the actual characters rather than escape codes (such as \266), use the following setting on your Git client:

git config core.quotepath false
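For example, with a hypothetical untracked file named ä.txt (whose UTF-8 bytes escape to \303\244), the git status output changes as follows:

$ git status
...
    "\303\244.txt"

$ git config core.quotepath false
$ git status
...
    ä.txt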

See the man git-config page for more details.

4.7 Content Distribution Policy

WANdisco's replication protocol separates replication traffic into two streams: the coordination stream, which handles agreement between voter nodes, and the content distribution stream, through which repository changes are passed to all other nodes (that is, "learner" nodes that store repository replicas).

Git MultiSite provides a number of tunable settings that let you apply different policies to content distribution. By default, content distribution is set up to prioritize reliability. You can instead set it to prioritize speed of replication, or apply a policy that prioritizes delivery of content to voter nodes. These policies can be applied on a per-site, per-repository and per-replication-group basis, providing a fine level of control - provided that you take care to set things up properly.

Changing Content Distribution Policy

In order to set the policy, you need to make a manual edit to Git MultiSite's application properties file:

/opt/wandisco/git-multisite/replicator/properties/application.properties

Changes require a restart:

The replicator only reads these properties files at start-up, so any change of strategy requires that the replicator be restarted before the change is applied.

Editable Content Distribution Properties

content.push.policy=faster
content.min.learners.required=true
content.learners.count=1

Above is an example Content Distribution Policy. We'll break down what each of the settings does:

content.push.policy

This property sets the priority for content distribution. It can take one of three options, each of which tells MultiSite to use a different calculation for relating replication agreement to the actual delivery of replicated data.

"reliable" Policy:

Replication can continue if content is available to a sufficient number of learners (the value of content.learner.count, not including the node itself).

Reliable is the default setting for the policy. Content must be delivered to a minimum number of nodes (the value of content.learner.count) for a proposal to be accepted, which allows replication to continue.

It is reliable because it greatly reduces the risk that a proposal will be accepted when the corresponding content cannot be delivered (due to a LAN outage, etc.). This policy is less likely to provide the best performance, because replication is held up until content has been successfully distributed to a greater number of nodes than would be the case under the "faster" policy.

Setting the corresponding "content.learner.count" value

Set this to the number of other learner nodes that must receive the content before a proposal can be accepted.

Setting the corresponding "content.min.learners.required" value

For a "reliable" policy that offers the utmost reliability, set this to "true".

true enforces the requirement:

When content.min.learners.required is set to "true", Git MultiSite will not lower the content.learner.count in light of insufficient learner nodes being available.

Example:

content.learner.count=5, content.min.learners.required=true

After an outage there are now only 4 learner nodes in the replication group - replication will be halted because there aren't enough available learners to satisfy the content distribution check.

content.learner.count=5, content.min.learners.required=false

After an outage there are now only 4 learner nodes in the replication group - replication will not be halted, because Git MultiSite will automatically drop the count so that it doesn't exceed the total number of learners in the group.

    "acceptors" Policy:

    Voting can commence if content is delivered to 50% of voters (acceptors), include self if a voter

    Content push policy only deals with delivering content to voters. This policy is ideal if you have a small number of voters. You don't want replication to continue until you have confirmed that at least half the voters have the corresponding payload. This policy supports a "follow-the-sun" approach to replication where the voter status of nodes changes each day to correspond with the locations where most development is being done. This ensures that the sites with the most development activity will get the best commit performance.

    Setting the corresponding "content.learner.count" value

    For "Acceptors" policy this is ignored.

    Setting the corresponding "content.min.learners.required" value

    For "Acceptors" policy this is ignored - learners do not factor into the policy, only voters (acceptors).

    "faster" Policy:

    Replication can continue if content available to x learners (not including self) OR [delivered to half the voters, including self if its a voter] where x = content.learner.count

    Setting the policy to 'faster' lowers the requirement for successful content distribution so that replication can continue when fewer nodes (than under the reliable policy) have received the payload for the proposal. 50% of voters (acceptors) must receive the content. It's faster because if there's a slow or intermittent connection somewhere on the WAN, it wouldn't delay agreement/ slow down replication. It is less reliable because it increases the possibility that the ordering of a proposal can be agreed upon, only for the corresponding content to not get delivered. The result would be a break in replication.

    Setting the corresponding "content.learner.count" value

    Setting the corresponding "content.min.learners.required" value

    For Faster, set this to "false".

    All the acceptors (voters) must also be learners (carry replica data that can be changed).

    If all the acceptors are not learners, we switch to 'reliable' policy with a log message. A node always includes itself in the count - in contrast with the "reliable" policy where a node never includes itself in the count.

Steps for a policy change

Use this procedure to change between the above Content Distribution policies.

1. Make a backup and then edit the /opt/wandisco/git-multisite/replicator/properties/application.properties file (read more about the properties files).
2. Change the value of content.min.learners.required: make it "true" for reliability, "false" for speed (the default is true).
3. Save the file and restart the node.
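A minimal sketch of these steps, assuming the replicator runs as a system service named git-multisite (check the service name used by your installation):

# 1. back up the properties file
cp /opt/wandisco/git-multisite/replicator/properties/application.properties \
   /opt/wandisco/git-multisite/replicator/properties/application.properties.bak

# 2. switch to the speed-focused setting
sed -i 's/^content.min.learners.required=.*/content.min.learners.required=false/' \
    /opt/wandisco/git-multisite/replicator/properties/application.properties

# 3. restart the replicator so the change is read at start-up
service git-multisite restart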

Set Policy on a per-state machine basis

When editing the property, prefix 'content.push.policy' with the state machine identity. e.g.

<machine_identity_name>.content.push.policy=faster

The system assigns a policy by first looking up the state-machine-specific property, then falling back to 'content.push.policy'. If neither is set, 'reliable' is chosen. The conditional switch between 'faster' and 'reliable' (described above) remains in effect regardless of the policy chosen.
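For example (a sketch: 'repo1-sm' is a hypothetical state machine identity; use the identity reported by your own installation):

# global default for all state machines
content.push.policy=reliable
# override for a single state machine
repo1-sm.content.push.policy=faster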

Example 1 - Faster policy on a 2-node replication group

Two-node replication group: NodeA (Active Voter) and NodeB (Active).

content.push.policy=faster
content.min.learners.required=true
content.learners.count=1

With this configuration, a proposal raised on NodeB must deliver its content to NodeA before it can proceed, while a proposal raised on NodeA can proceed immediately: under the "faster" policy NodeA, the sole voter, counts itself.

Example 2 - Acceptors policy on a 4-node replication group

Four nodes split between two sites. On Site 1 we have NodeA and NodeB, both Active Voters. On Site 2 we have NodeC (AV) and NodeD (A).

content.push.policy=acceptors
content.min.learners.required=true
content.learners.count=1

Here the learner settings are ignored: a proposal can proceed once its content has been delivered to at least half of the three voters (NodeA, NodeB and NodeC), counting the proposing node itself if it is a voter.