<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Appendix A. Managing a Failure of the Majority</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="up" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="prev" href="groupreset.html" title="Resetting a Replication Group" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 12.2.7.5</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Appendix A. Managing a Failure of the Majority</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="groupreset.html">Prev</a> </td>
<th width="60%" align="center"> </th>
<td width="20%" align="right"> </td>
</tr>
</table>
<hr />
</div>
<div class="appendix" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="election-override"></a>Appendix A. Managing a Failure of the Majority</h2>
</div>
</div>
</div>
<p>
Normal operation of JE HA requires that at least a simple majority
of electable nodes be available to form a quorum, both for electing
a new Master and for committing a transaction with default
durability requirements. The number of electable nodes (the
Electable Group Size) is obtained from persistent internal metadata
that is stored in the environment and replicated across all
members. See <a class="xref" href="lifecycle.html" title="Replication Group Life Cycle">Replication Group Life Cycle</a> for details.
</p>
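<p>
As a rough illustration of the quorum arithmetic described above, the
following sketch computes a simple majority for a given Electable Group
Size. The class and method names are hypothetical and are not part of
the JE API. The sketch assumes only that a simple majority means more
than half of the electable nodes.
</p>

```java
/** Hypothetical helper for illustration only; not part of the JE API. */
final class SimpleMajoritySketch {

    /** Smallest number of nodes that is more than half of the group. */
    static int simpleMajority(int electableGroupSize) {
        if (electableGroupSize < 1) {
            throw new IllegalArgumentException("group size must be at least 1");
        }
        return electableGroupSize / 2 + 1;
    }

    /** True if the available electable nodes are enough to form a quorum. */
    static boolean hasQuorum(int availableNodes, int electableGroupSize) {
        return availableNodes >= simpleMajority(electableGroupSize);
    }

    public static void main(String[] args) {
        // Print the majority needed for a few small group sizes.
        for (int size = 1; size <= 5; size++) {
            System.out.println("group size " + size
                + " -> majority " + simpleMajority(size));
        }
    }
}
```

<p>
For a group of five electable nodes the majority is three, so a group
that loses three nodes is left with two nodes and no quorum.
</p>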
<p>
Under exceptional circumstances, a simple majority of electable nodes may
become unavailable for some period of time. With only a minority
of electable nodes available, the overall availability of the group can be
adversely affected. For example, the group may be unavailable for
writes because a Master cannot be elected. Even if a Master exists, it
may be unable to satisfy the durability requirements for a transaction
commit. The group may also be unavailable for reads, because the
absence of a Master might cause a Replica to be unable to meet
consistency requirements.
</p>
<p>
To deal with this exceptional circumstance, especially if the
situation is likely to persist for an unacceptably long period of
time, JE HA provides a mechanism by which you can modify the way in
which the number of electable nodes, and consequently the quorum
requirements for elections and commit acknowledgments, is calculated.
This escape mechanism overrides the normal computation of the
Electable Group Size. You apply the override by specifying
the size using the mutable replication configuration parameter
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a>.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
You should use this parameter sparingly, if at all. Overriding
your Electable Group Size can allow your replication group's
election participants to elect two Masters
simultaneously. This is especially likely to occur if a
majority of the nodes are unavailable because of a network
partition, in which case all nodes are running but are simply
not communicating with one another.
</p>
<p>
<span class="emphasis"><em>Be very cautious when using this configuration
option.</em></span>
</p>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="override-groupsize"></a>Overriding the Electable Group Size</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="election-override.html#set-gsize-override">Setting the Override</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="election-override.html#gsize-override-restore">Restoring the Default State</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="election-override.html#override-example">Override Example</a>
</span>
</dt>
</dl>
</div>
<p>
When you set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to a non-zero value, the
number that you provide is used as the Electable Group Size when
quorum requirements are computed. This means that the
internally stored Electable Group Size value is ignored (but
not changed) while this option is non-zero. By setting
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to the number of electable nodes known to be
available, the remaining replication group participants can
make forward progress, both in terms of electing a new
Master (if this is required) and in terms of meeting durability
and consistency requirements.
</p>
<p>
When this option is zero (0), the node behaves
normally and honors the internal Electable Group Size.
This is the default value and behavior.
</p>
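<p>
The effect of the override on quorum calculation can be sketched as
follows. This is a minimal model with hypothetical names, not the JE
implementation. It assumes that a non-zero override simply replaces the
stored Electable Group Size in the majority computation, and it rejects
negative values as a choice of its own.
</p>

```java
/** Hypothetical model of the override; not the JE implementation. */
final class GroupSizeOverrideSketch {

    /** Group size used for quorum: the override if non-zero, else the stored value. */
    static int effectiveGroupSize(int storedGroupSize, int override) {
        if (override < 0) {
            throw new IllegalArgumentException("override must not be negative");
        }
        return override != 0 ? override : storedGroupSize;
    }

    /** True if the available nodes form a simple majority of the effective group size. */
    static boolean hasQuorum(int availableNodes, int storedGroupSize, int override) {
        int size = effectiveGroupSize(storedGroupSize, override);
        return availableNodes >= size / 2 + 1;
    }

    public static void main(String[] args) {
        // Two of five nodes available: no quorum by default, quorum with override 2.
        System.out.println("default:  " + hasQuorum(2, 5, 0));
        System.out.println("override: " + hasQuorum(2, 5, 2));
    }
}
```

<p>
Note that the stored group size is passed through unchanged, which
mirrors the statement above that the stored value is ignored but not
modified while the override is in effect.
</p>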
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="set-gsize-override"></a>Setting the Override</h3>
</div>
</div>
</div>
<p>
To override the internal Electable Group Size value:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Verify that a simple majority of electable nodes are in fact
down and cannot elect their own independent Master.
</p>
</li>
<li>
<p>
Set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to the number of
electable nodes known to be available. For best
results, set this override on all available
electable nodes.
</p>
<p>
It might be sufficient to set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a>
on just one electable node in order to hold an election, because
the proposer at that one node can conclude the
election. However, if the election results in a
Master that is not configured with this override, the
Master might encounter <a class="ulink" href="../java/com/sleepycat/je/rep/InsufficientAcksException.html" target="_top">InsufficientAcksException</a>s
when committing transactions.
So, again, set the override on all available
electable nodes.
</p>
</li>
</ol>
</div>
<p>
Having set the override, the available electable members of the
replication group can now meet quorum requirements.
</p>
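<p>
One place the override can be set is the <code class="filename">je.properties</code>
file. The sketch below builds the corresponding property entry using
<code class="literal">java.util.Properties</code>. It assumes that the
property name behind ELECTABLE_GROUP_SIZE_OVERRIDE is
<code class="literal">je.rep.electableGroupSizeOverride</code>; check the
Javadoc for your release before relying on it. The class and method
names are hypothetical.
</p>

```java
import java.util.Properties;

/** Hypothetical helper that prepares a je.properties entry; not part of JE. */
final class OverridePropertySketch {

    // Assumed parameter name; confirm against
    // ReplicationMutableConfig.ELECTABLE_GROUP_SIZE_OVERRIDE in the Javadoc.
    static final String OVERRIDE_PARAM = "je.rep.electableGroupSizeOverride";

    /** Returns a copy of base with the override set to the given node count. */
    static Properties withOverride(Properties base, int availableElectableNodes) {
        if (availableElectableNodes < 0) {
            throw new IllegalArgumentException("node count must not be negative");
        }
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty(OVERRIDE_PARAM, Integer.toString(availableElectableNodes));
        return copy;
    }

    public static void main(String[] args) {
        // Two electable nodes are known to be available.
        Properties props = withOverride(new Properties(), 2);
        System.out.println(OVERRIDE_PARAM + "=" + props.getProperty(OVERRIDE_PARAM));
    }
}
```

<p>
Passing 0 to the helper produces the entry that restores the default
behavior.
</p>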
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="gsize-override-restore"></a>Restoring the Default State</h3>
</div>
</div>
</div>
<p>
Having restored the group to a functioning state by setting
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a>, you should
return the group to its normal state as soon as possible. The
normal operating state is one where the Electable Group
Size is maintained by JE HA, and the override is no longer
used.
</p>
<p>
To restore the group to its normal operational state, do
one of the following:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
Remove from the group any electable nodes that you
know will be down for an extended period of time.
Remove the nodes using the
<a class="ulink" href="../java/com/sleepycat/je/rep/util/ReplicationGroupAdmin.html#removeMember(java.lang.String)" target="_top">ReplicationGroupAdmin.removeMember()</a> API.
</p>
</li>
<li>
<p>
Bring up electable nodes as they once again come on
line, so that they can join the functioning group.
This must be done carefully, one node at a time, in
order to avoid the small possibility that a majority of the
downed nodes hold an election among themselves
and elect a second Master.
</p>
</li>
<li>
<p>
Perform some combination of removing nodes and
bringing up nodes that were previously down.
</p>
</li>
</ul>
</div>
<p>
As soon as enough electable nodes are up and running that
election quorum requirements can be met
without the override, the override can be removed, and
normal HA operations resumed.
</p>
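<p>
The caution about restarting nodes one at a time follows from the same
majority arithmetic. The sketch below uses hypothetical names and is
not part of the JE API. It assumes that the recovering nodes are not
configured with the override and therefore apply the stored Electable
Group Size, and it checks whether a batch of nodes restarted together
would be large enough to hold an election on its own.
</p>

```java
/** Hypothetical checks for the restore procedure; not part of the JE API. */
final class SecondMasterRiskSketch {

    /**
     * True if nodes restarted together, and not yet in contact with the
     * functioning group, form a majority of the stored group size.
     */
    static boolean couldElectIndependently(int nodesRestartedTogether,
                                           int storedGroupSize) {
        return nodesRestartedTogether >= storedGroupSize / 2 + 1;
    }

    /** True once the functioning group meets quorum without the override. */
    static boolean overrideNoLongerNeeded(int nodesInFunctioningGroup,
                                          int storedGroupSize) {
        return nodesInFunctioningGroup >= storedGroupSize / 2 + 1;
    }

    public static void main(String[] args) {
        // Five-node group with three nodes down.
        System.out.println("restart 3 at once: "
            + couldElectIndependently(3, 5));
        System.out.println("restart 1 at a time: "
            + couldElectIndependently(1, 5));
    }
}
```

<p>
In a five-node group, three nodes restarted together could reach a
majority by themselves, whereas a single restarted node cannot.
</p>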
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="override-example"></a>Override Example</h3>
</div>
</div>
</div>
<p>
Consider a group consisting of 5 electable nodes:
<code class="literal">n1</code>-<code class="literal">n5</code>. Suppose a
simple majority of the nodes
(<code class="literal">n3</code>-<code class="literal">n5</code>) have become
unavailable.
</p>
<p>
If one of the nodes in
<code class="literal">n3</code>-<code class="literal">n5</code> was the
Master, then nodes <code class="literal">n1</code> and
<code class="literal">n2</code> will try to hold an election, and
fail due to the lack of a quorum. We now carry out the steps described above:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Verify that <code class="literal">n3</code>-<code class="literal">n5</code> are down.
</p>
</li>
<li>
<p>
Set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to 2. Do this
at both <code class="literal">n1</code> and <code class="literal">n2</code>.
You can do this dynamically using JConsole, or by
setting the property in the <code class="filename">je.properties</code> file and
restarting the node.
</p>
</li>
<li>
<p>
<code class="literal">n1</code> and <code class="literal">n2</code>
will choose a new Master, say, <code class="literal">n1</code>.
<code class="literal">n1</code> can now process write
operations, and <code class="literal">n2</code> can
acknowledge transaction commits.
</p>
</li>
<li>
<p>
Suppose that <code class="literal">n3</code> is now repaired.
You can bring it back online and it will
automatically locate the new Master and join the
group. As is normal, it will catch up to
<code class="literal">n1</code> and <code class="literal">n2</code> in
the replication stream, and then begin
acknowledging commits as requested by
<code class="literal">n1</code>.
</p>
</li>
<li>
<p>
We now have three electable nodes that are operational. Because
we have a true simple majority of electable nodes available, we
can now reset <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to 0
(do this on <code class="literal">n1</code> and <code class="literal">n2</code>),
which causes the replication group to resume normal
operations. Note that <code class="literal">n1</code> remains
the Master.
</p>
</li>
</ol>
</div>
<p>
If <code class="literal">n2</code> was the Master at the time of the
failure, then the situation is similar, except that an
election is not held. In this case, <code class="literal">n2</code>
remains the Master throughout the entire process described
above. However, <code class="literal">n2</code> might not be able to meet quorum
requirements for transaction commits until step 2 (above) is
performed.
</p>
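<p>
The walkthrough above can be traced with a small simulation. This is a
sketch under simplifying assumptions, not JE code: the class name is
hypothetical, nodes are represented only by their names, and quorum is
modeled as a simple majority of either the stored group size (5) or the
override when it is non-zero.
</p>

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical trace of the five-node example; not JE code. */
final class OverrideWalkthroughSketch {

    static final int STORED_GROUP_SIZE = 5;

    /** True if the available nodes form a majority of the effective group size. */
    static boolean quorum(Set<String> available, int override) {
        int size = (override != 0) ? override : STORED_GROUP_SIZE;
        return available.size() >= size / 2 + 1;
    }

    public static void main(String[] args) {
        // Step 1: n3-n5 are down, so only n1 and n2 remain.
        Set<String> available = new TreeSet<>(Arrays.asList("n1", "n2"));
        System.out.println("no override:      quorum=" + quorum(available, 0));

        // Steps 2-3: the override is set to 2 on n1 and n2.
        System.out.println("override=2:       quorum=" + quorum(available, 2));

        // Steps 4-5: n3 rejoins and the override is reset to 0.
        available.add("n3");
        System.out.println("n3 back, reset 0: quorum=" + quorum(available, 0));
    }
}
```

<p>
The three checks report no quorum, then quorum under the override, then
quorum again once <code class="literal">n3</code> has rejoined and the
override has been reset.
</p>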
</div>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="groupreset.html">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> </td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Resetting a Replication Group </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> </td>
</tr>
</table>
</div>
</body>
</html>