<DConeNet>
<HttpDisable> ... </HttpDisable>
<HttpTimedoutMessage> .. </HttpTimedoutMessage>
The HttpDisable element can be set to true or false to enable (default is true) or disable HTTP/web access to the DConeNet port. The HttpTimedoutMessage element sets a customized message that is displayed in the web browser when an administrative action that requires replication times out.
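As a rough illustration of the two elements described above, a filled-in fragment might look like the following. The message text is made up, and reading false as "web access stays on" is an assumption based on the element name rather than something the text states outright.

```xml
<DConeNet>
  <!-- Assumption: false leaves http/web access to the DConeNet port enabled -->
  <HttpDisable>false</HttpDisable>
  <!-- Hypothetical message text, shown in the browser on a replication time-out -->
  <HttpTimedoutMessage>The request timed out while waiting for replication. Please retry.</HttpTimedoutMessage>
</DConeNet>
```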
Transports are specified within the MemberList/Member element for a given member.
<MemberList>
<Member name="3bfbf219-2918-11d7-80c5-00065be02953">
<Profiles>
<TransportPolicy>..
<Transport>..
...
... <Transport> <SOAPTransport>http://localhost:7005/dcone/services/dconeEndPoint</SOAPTransport> </Transport> ...
......
<Transport>
<DConeNet>
<ListenerIP>sanfranciso-replicator</ListenerIP>
<ListenerPort>6020</ListenerPort>
<TimeToConnect>750</TimeToConnect>
</DConeNet>
</Transport>
<Transport>
<InProcess>true</InProcess>
</Transport>
The various tunables are specified under the XML element Synod.
<Synod>
...
</Synod>
Some of the less commonly used ones are specified under the org/nirala/synod tree:
<org>
<nirala>
<synod>
....
<Synod>
<AcceptorTimeout>20000</AcceptorTimeout>
<AggregatorTimeout>20000</AggregatorTimeout>
<learnerTimeout>20000</learnerTimeout>
</Synod>
The deactivation policy is specified via:
<Synod>
<ActivationManager>activation-policy</ActivationManager>
...
The activation-policy can be set to:
If using delayed deactivation, a delay in milliseconds can be specified:
<Synod>
<ActivationManager>org.nirala.synod.activation.Delayed</ActivationManager>
<DeactivationDelay>2000</DeactivationDelay>
Delayed deactivation makes sense if remote nodes go down frequently and need to re-learn finished agreements from this node. It avoids frequent swapping from disk to memory in such scenarios.
<Synod>
<Optimizations>
<UseFastPropose>true</UseFastPropose>
<UseWeakReservation>true</UseWeakReservation>
<UseDistinguishedRoundNumbers>true</UseDistinguishedRoundNumbers>
<PresumeConsensus>true</PresumeConsensus>
</Optimizations>
</Synod>
<org>
<nirala>
<synod>
<heuristics>
<friendliness>1.0f</friendliness>
<!-- A single multiplier for all the backoff tunables
fixedCompopnentOfInitialValue *= friendliness;
groupSizeMultiplier *= friendliness;
upperBoundMultiple *= friendliness;
gracePeriodMultiplier *= friendliness;
estimatedRoundTripDelayMultiplier *= friendliness;
/-->
</heuristics>
</synod>
You can either specify a friendliness multiplier or modify individual tunables.
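Going by the comment in the snippet above, every back-off tunable is multiplied by friendliness, so a single value scales them all together. A minimal sketch, with 2.0f chosen purely as an illustrative value (it would double each of the listed tunables relative to 1.0f):

```xml
<org>
  <nirala>
    <synod>
      <heuristics>
        <!-- Illustrative value only: scales all back-off tunables by 2 -->
        <friendliness>2.0f</friendliness>
      </heuristics>
    </synod>
  </nirala>
</org>
```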
In addition to the declarative quorum policies supported in the prefs file, developers can plug in custom quorum implementations.
<Quorum>
<!--
Default is
org.nirala.group.quorum.UnanimousResponseQuorum -->
<DefaultClass>org.nirala.group.quorum.MajorityResponseQuorum</DefaultClass>
<DistinguishedNode>3bfbf219-2918-11d7-80c5-00065be02953</DistinguishedNode>
</Quorum>
Preferences file based policies are:
<AgreementManagerList>
<AgreementManager name="52ec6735-ce20-11d9-8e57-00065be02953">
<DisplayName>cvs-am</DisplayName>
<Quorum>
<Schedule>
<at name="12:00:23 AM">
<DistinguishedNode>fb7723de-ce1e-11d9-ae57-00065be02953</DistinguishedNode>
</at>
<at name="12:20 AM">
<DistinguishedNode>659a768d-ce1f-11d9-aeec-00065be02953</DistinguishedNode>
</at>
<at name="12:40 AM">
<DistinguishedNode>3fae40f3-ce20-11d9-8e6a-00065be02953</DistinguishedNode>
</at>
</Schedule>
</Quorum>
</AgreementManager>
</AgreementManagerList>
With this schedule, the distinguished node changes to fb7723de-ce1e-11d9-ae57-00065be02953 every day at 12:00:23 AM local time, and then to the other two nodes at 12:20 AM and 12:40 AM. The time syntax is hh:mm:ss AM.
<ThreadPool>
<AgreementThreads>10</AgreementThreads>
<AgreementBufferSize>1000</AgreementBufferSize>
</ThreadPool>
AgreementBufferSize is the size of the queue of waiters used when all the agreement threads are busy. If the queue becomes full, the application is throttled: proposal submit calls block until a slot in the queue or a thread becomes available.
<MessageQueue>
<MaxThreads>5</MaxThreads>
<BufferSize>5000</BufferSize>
<DepositTimeout>1000</DepositTimeout>
</MessageQueue>
MaxThreads specifies the maximum number of threads this stage will ever use. BufferSize is the size of the queue of waiters. DepositTimeout, in milliseconds, specifies the timeout after which a message is discarded. Remember that DCone is tolerant to message loss, so this just helps throttle an overloaded server.
For both the IO models, DCone supports independent tuning of TCP socket reader and writer stages.
Blocking IO
With an efficient thread scheduler like Linux NPTL (Native POSIX Thread Library), the blocking IO model can be faster. On a Linux kernel with NPTL, we have seen it scale to over 4000 persistent connections. It is the default model. To use it, set UseNonBlockingIO to false or omit the option.
<DConeNet>
<!-- This translates to thread per connection model for both
network reader and writers /-->
<UseNonBlockingIO>false</UseNonBlockingIO>
<ReadStage>
<!-- In a thread per conn model, this is really the max
conns we can do -->
<MaxThreads>5</MaxThreads>
</ReadStage>
<WriteStage>
<!-- -1 means no timeout /-->
<ThreadKeepAliveTimeOut>-1</ThreadKeepAliveTimeOut>
<!-- In a thread per conn model, this is really the max
conns we can do -->
<MaxThreads>5</MaxThreads>
</WriteStage>
</DConeNet>
With the blocking IO model, DCone uses a thread-per-connection strategy. MaxThreads specifies the maximum number of connections in the read/write stage. To control the lifetime of IO threads, you can set ThreadKeepAliveTimeOut.
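For instance, a blocking IO setup that lets idle writer threads expire could be sketched as below. The value 1000 is illustrative, its unit is assumed to be milliseconds (the text does not say), and only the WriteStage is shown because that is where this page places ThreadKeepAliveTimeOut.

```xml
<DConeNet>
  <UseNonBlockingIO>false</UseNonBlockingIO>
  <WriteStage>
    <!-- Illustrative value; unit assumed to be milliseconds. -1 means no timeout -->
    <ThreadKeepAliveTimeOut>1000</ThreadKeepAliveTimeOut>
    <!-- Thread per connection, so this is also the connection limit -->
    <MaxThreads>5</MaxThreads>
  </WriteStage>
</DConeNet>
```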
Non-Blocking IO
For very large fan-in/fan-out or long-latency connections, the non-blocking IO model may give better performance. It also scales more gracefully. It uses poll, /dev/poll, etc. underneath.
<DConeNet>
<!-- Default is blocking IO -->
<UseNonBlockingIO>true</UseNonBlockingIO>
<ConnectionKeepAliveTime>1800000</ConnectionKeepAliveTime>
<ReadStage>
<ReactorKeepAliveTimeOut>300</ReactorKeepAliveTimeOut>
<MaxThreads>5</MaxThreads>
</ReadStage>
<WriteStage>
<!-- -1 means no timeout /-->
<ThreadKeepAliveTimeOut>1000</ThreadKeepAliveTimeOut>
<MaxConnectionsPerThread>6</MaxConnectionsPerThread>
<MaxThreads>5</MaxThreads>
<!-- default is 100 /-->
<MaxWriteMessagesOutstanding>50000</MaxWriteMessagesOutstanding>
<UseBlockingConnect>false</UseBlockingConnect>
</WriteStage>
ConnectionKeepAliveTime specifies the inactivity timeout for an idle persistent connection. If the persistent connection has not seen any read or write activity in the specified time interval, it is closed. This may cause the connectivity agent to create a new connection if the endpoint is left without a single connection. This fact can be used to deal with buggy NAT/port forwarding devices that reset connections without sending a TCP level reset to endpoints.
ReactorKeepAliveTimeOut specifies the idle time for a read reactor. With non-blocking IO, reactive IO is used for reads. Each reactor will handle multiple connections. MaxThreads really means the maximum number of read reactors.
MaxThreads for the write stage specifies the maximum number of write reactors that can be active at the same time. MaxWriteMessagesOutstanding specifies the threshold at which a new write reactor will be created. For a long-latency WAN, setting UseBlockingConnect to false ensures that TCP connection establishment happens in the background without blocking.
Common Recovery options are:
<Recovery>
<isEnabled>true</isEnabled>
<!-- resetRepositories if true will nuke existing repositories -->
<resetRepositories>true</resetRepositories>
<trackApplicationStatus>true</trackApplicationStatus>
<useJdbc>true</useJdbc>
<proposalRepositoryType>JDBC</proposalRepositoryType>
<agreementRepositoryType>RecoveryJournal</agreementRepositoryType>
<!-- db vendor-neutral config goes here -->
<JdbcStore>
<ds-user></ds-user>
<ds-password></ds-password>
<ds-url>/mysql/synod</ds-url>
</JdbcStore>
</Recovery>
To enable or disable recovery, set isEnabled to true or false. Warning: a node cannot recover from failure if this setting is false.
resetRepositories lets you reset all the underlying persistent repositories. Warning: if set to true, all recovery context of a node will be deleted at startup time, so, in effect, the node cannot recover from failure.
trackApplicationStatus is used by applications to track completion of an agreement step from the application's viewpoint. DCone will re-deliver the agreement proposal if the application status bit indicates not done. This is typically used in conjunction with our programmatic API to set the bit from within the application.
useJdbc if set to true will cause JDBC data sources to be registered with DCone's JNDI provider.
agreementRepositoryType and proposalRepositoryType can be set to JDBC or RecoveryJournal.
If a JDBC-based recovery repository is being used, database vendor-neutral JDBC configuration can be specified within the Recovery element as:
<Recovery>
....
<!-- db vendor-neutral config goes here -->
<JdbcStore>
<ds-user></ds-user>
<ds-password></ds-password>
<ds-url></ds-url>
</JdbcStore>
</Recovery>
The usual JDBC data source user/password and URL can be specified
here.
<Recovery>
<isEnabled>true</isEnabled>
<useJdbc>false</useJdbc>
<proposalRepositoryType>RecoveryJournal</proposalRepositoryType>
<agreementRepositoryType>RecoveryJournal</agreementRepositoryType>
</Recovery>
<RecoveryJournal>
<UseSynchThread>..</UseSynchThread>
<UseNIO>..</UseNIO>
<JournalDir>..</JournalDir>
<flushMethod>..</flushMethod>
<AlignBlock>..</AlignBlock>
<BlockSize>..</BlockSize>
<BucketSize>..</BucketSize>
<MaxJournalFileSize>...</MaxJournalFileSize>
<DiskMonEnable>..</DiskMonEnable>
<DiskMonInterval>..</DiskMonInterval>
<DiskMonCriticalLevel>..</DiskMonCriticalLevel>
<DiskMonWarningLevel>..</DiskMonWarningLevel>
</RecoveryJournal>
The built-in recovery journal can be configured with the following prefs:
UseSynchThread is true by default. If there is concurrency within the DCone stage, turning this option on can improve disk throughput considerably, because it allows our persistence mechanism to combine multiple small writes into fewer, bigger writes. If the application is completely sequential, turning this option off (false) and tuning other prefs may be a better idea.
UseNIO, if set to true, lets you take advantage of other IO options such as flushMethod, AlignBlock, and mmap IO for reads.
flushMethod can be set to rws or rwd or fsync or fdatasync. The rws and rwd options can only be used with UseSynchThread set to false.
The rws and rwd options map to the POSIX O_SYNC and O_DSYNC options respectively, that is, synchronous IO with or without metadata sync.
The fsync and fdatasync options have the usual POSIX semantics. They can only be used with UseSynchThread and UseNIO set to true.
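The constraints above can be summarized in a small fragment. This sketch picks fdatasync, which per the text requires both UseSynchThread and UseNIO to be true; the choice of fdatasync over fsync is arbitrary.

```xml
<RecoveryJournal>
  <!-- fsync and fdatasync are only usable with both of these set to true -->
  <UseSynchThread>true</UseSynchThread>
  <UseNIO>true</UseNIO>
  <flushMethod>fdatasync</flushMethod>
</RecoveryJournal>
```

For rws or rwd the combination flips: UseSynchThread would have to be false. Whether UseNIO is also needed in that case is not stated here.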
When UseNIO is set, a considerable performance improvement can be obtained by setting AlignBlock to true. The default is false, as alignment results in very large journal files. The default block size is 512 bytes; it can be tuned via BlockSize, specified in bytes.
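A minimal sketch of block alignment, assuming a block size of 4096 bytes is wanted (an illustrative figure, not a recommendation from this page):

```xml
<RecoveryJournal>
  <UseNIO>true</UseNIO>
  <!-- Off by default because aligned journals grow very large -->
  <AlignBlock>true</AlignBlock>
  <!-- In bytes; the default is 512. 4096 is only an example value -->
  <BlockSize>4096</BlockSize>
</RecoveryJournal>
```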
The JournalDir specifies the path to a directory that will contain the DCone journals. Default is to use the value specified by system property, java.io.tmpdir.
The BucketSize option specifies how many objects per bucket. Default is 10000 objects. Setting it to a large value will increase memory footprint, smaller value could lead to more disk IO.
The default MaxJournalFileSize is 500 MB. It specifies the threshold at which journal files are rotated.
The DiskMonEnable option is true by default; it turns on disk monitoring for free space. The disk monitoring interval can be specified via DiskMonInterval and defaults to 15 minutes. The interval can be specified in milliseconds or using our abbreviated syntax, {interval}s|m|h, for example 600s.
When the disk usage reaches the warning level, the web dashboard shows a red alert, and if an email address was specified at startup, an email alert is also generated. The default warning level is 75% disk used. When the disk usage reaches the critical level, which defaults to 95% full, the DCone process is shut down to avoid any corruption issue when the disk is full. The levels can be adjusted using DiskMonCriticalLevel and DiskMonWarningLevel. The number specified corresponds to the percentage of disk in use.
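Put together, the disk monitoring prefs might be set as in the sketch below. The 600s interval comes from the example in the text; the 80 and 90 levels are illustrative, and writing them as bare numbers is an assumption based on the statement that the number is a percentage of disk in use.

```xml
<RecoveryJournal>
  <DiskMonEnable>true</DiskMonEnable>
  <!-- Check every 10 minutes instead of the default 15 -->
  <DiskMonInterval>600s</DiskMonInterval>
  <!-- Illustrative thresholds; the defaults are 75 and 95 -->
  <DiskMonWarningLevel>80</DiskMonWarningLevel>
  <DiskMonCriticalLevel>90</DiskMonCriticalLevel>
</RecoveryJournal>
```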
Distributed Garbage Collection Related Options
For individual repositories, the following options are provided:
<RecoveryJournal>
<AgreementRepository>
<!-- default is 20000 -->
<DeactivationCushion>0</DeactivationCushion>
<!-- default is 3 minutes -->
<DeactivationInterval>2000</DeactivationInterval>
</AgreementRepository>
<ProposalRepository>
<!-- default is 70000 -->
<GCCushion>0</GCCushion>
<!-- default is 10 minutes -->
<GCInterval>5000</GCInterval>
</ProposalRepository>
</RecoveryJournal>
The agreement repository caches information (concluded agreement instances) needed to help other replicas recover from failures. If this information is not cached, it can still be read from disk. DeactivationCushion specifies the minimum number of concluded agreement instances cached. DeactivationInterval specifies how often the concluded agreement instances are identified for removal from the cache.
The proposal repository can be garbage collected locally without needing distributed coordination. The cushion and interval settings have the same semantics as above.
Performance Tips
<Recovery>
<isEnabled>true</isEnabled>
<useJdbc>false</useJdbc>
....
<JdbcStore>
<ds-user></ds-user>
<ds-password></ds-password>
<ds-url>/mysql/synod</ds-url>
<maxConnectionPoolSize>10</maxConnectionPoolSize>
<connectionWaitTime>-1</connectionWaitTime>
<connectionKeepAliveTime>10000</connectionKeepAliveTime>
</JdbcStore>
</Recovery>
<JdbcConfig>
<!-- db Vendor specific config goes here -->
<MySQL>
<!-- Enter the URL to the Synod Recovery Database -->
<URL>jdbc:mysql://localhost:3306/synod</URL>
</MySQL>
</JdbcConfig>
Just specify the URL as required by the vendor of the JDBC driver. Make sure the JDBC driver is on the classpath of DCone.
DCone has a built-in JDBC connection pool. It is on by default with a maxConnectionPoolSize of 100. You can tune the pool by setting connectionWaitTime, the time in milliseconds to wait for a JDBC connection, and connectionKeepAliveTime, the idle time in milliseconds for a JDBC connection.
<DistributedGC>
<enable>true</enable>
<!-- default is 2 hours -->
<interval>10m</interval>
</DistributedGC>
It is enabled by default. The distributed garbage collection interval can be specified
as a number with a suffix s or m or h to denote seconds, minutes or
hours.
<FileBasedProposal>
<!-- default is false -->
<enableCRC32>true</enableCRC32>
<!-- default is false -->
<enableMD5>true</enableMD5>
</FileBasedProposal>
These prefs are of no consequence to applications which do not use file-based proposals.
Both CRC32 and MD5 checksums are disabled by default.
There is usually no need to enable these preferences, unless corruption of file-based proposals in transit is suspected.
<GlobalSequence>
<HoleFillerTimeout>1m</HoleFillerTimeout>
<DeliverInLocalSequence>true or false</DeliverInLocalSequence>
</GlobalSequence>
HoleFillerTimeout is specified as a number with a suffix s or m or h to denote seconds, minutes or hours. It controls how frequently DCone proactively
tries to learn the outcomes of missing agreements and, if needed, fills up the holes with no-op proposals. Holes can be created if, for example, a node goes down
for some time and then restarts. In the meantime the overall distributed system may have moved ahead (if not using Unanimous quorum). The restarted node then
needs to learn the missing agreements or holes.
DeliverInLocalSequence if set to true, ensures that events submitted for agreement at a local node obey the local submission sequence. If set to false, DCone still ensures a consistent global ordering but ignores the local sequence. The CVS Replicator product sets DeliverInLocalSequence to false, as there isn't any dependency between incoming local CVS requests.
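As a concrete instance of the prefs above, the fragment below uses a 30 second hole-filler timeout (an illustrative value) together with the DeliverInLocalSequence setting that the text attributes to the CVS Replicator product:

```xml
<GlobalSequence>
  <!-- Illustrative value; the suffix may be s, m or h -->
  <HoleFillerTimeout>30s</HoleFillerTimeout>
  <!-- false: global ordering is still consistent, local submission order is ignored -->
  <DeliverInLocalSequence>false</DeliverInLocalSequence>
</GlobalSequence>
```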
Copyright © 2005 WANdisco