je/docs/ReplicationGuide/election-override.html
2021-06-06 13:46:45 -04:00

332 lines
16 KiB
HTML
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Appendix A. Managing a Failure of the Majority</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="up" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="prev" href="groupreset.html" title="Resetting a Replication Group" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 12.2.7.5</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Appendix A. Managing a Failure of the Majority</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="groupreset.html">Prev</a> </td>
<th width="60%" align="center"> </th>
<td width="20%" align="right"> </td>
</tr>
</table>
<hr />
</div>
<div class="appendix" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="election-override"></a>Appendix A. Managing a Failure of the Majority</h2>
</div>
</div>
</div>
<p>
Normal operation of JE HA requires that at least a simple majority
of electable nodes be available to form a quorum for election of a
new Master, or when committing a transaction with default
durability requirements. The number of electable nodes (the
Electable Group Size) is obtained from persistent internal metadata
that is stored in the environment and replicated across all
members. See <a class="xref" href="lifecycle.html" title="Replication Group Life Cycle">Replication Group Life Cycle</a> for details.
</p>
<p>
Under exceptional circumstances, a simple majority of electable nodes may
become unavailable for some period of time. With only a minority
of electable nodes available, the overall availability of the group can be
adversely affected. For example, the group may be unavailable for
writes because a master cannot be elected. Also, the Master may be
unable to satisfy the durability requirements for a transaction
commit. The group may also be unavailable for reads, because the
absence of a Master might cause a Replica to be unable to meet
consistency requirements.
</p>
<p>
To deal with this exceptional circumstance
— especially if the situation is likely to persist for an
unacceptably long period of time — JE HA provides a
mechanism by which you can modify the way in which the number of
electable nodes, and consequently the quorum requirements for
elections and commit acknowledgments, is calculated. The escape
mechanism provides a way to override the normal computation of the
Electable Group Size. The override is accomplished by specifying
the size using the mutable replication configuration parameter
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a>.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
You should use this parameter sparingly, if at all. Overriding
your Electable Group Size can have the consequence of allowing
your replication group's election participants to elect two Masters
simultaneously. This is especially likely to occur if a
majority of the nodes are unavailable due to a network
partition event, and so all nodes are running but are simply
not communicating with one another.
</p>
<p>
<span class="emphasis"><em>Be very cautious when using this configuration
option.</em></span>
</p>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="override-groupsize"></a>Overriding the Electable Group Size</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="election-override.html#set-gsize-override">Setting the Override</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="election-override.html#gsize-override-restore">Restoring the Default State</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="election-override.html#override-example">Override Example</a>
</span>
</dt>
</dl>
</div>
<p>
When you set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to a non-zero value, the
number that you provide identifies the number of electable nodes that
are required to meet quorum requirements. This means that the
internally stored Electable Group Size value is ignored (but
not changed) when this option is non-zero. By setting
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to the number of electable nodes known to be
available, the remaining replication group participants can
make forward progress, both in terms of electing a new
Master (if this is required) and in terms of meeting durability
and consistency requirements.
</p>
<p>
When this option is zero (0), then the node will behave
normally, and the internal Electable Group Size is honored by
the node. This is the default value and behavior.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="set-gsize-override"></a>Setting the Override</h3>
</div>
</div>
</div>
<p>
To override the internal Electable Group Size value:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Verify that the simple majority of electable nodes are in fact
down and cannot elect their own independent Master.
</p>
</li>
<li>
<p>
Set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to the number of
electable nodes known to be available. For best
results, set this override on all available
electable nodes.
</p>
<p>
It might be sufficient to set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a>
on just one electable node in order to hold an election, because
the proposer at that one node can conclude the
election. However, if the election results in
Master that is not configured with this override, it
might result in <a class="ulink" href="../java/com/sleepycat/je/rep/InsufficientAcksException.html" target="_top">InsufficientAcksException</a>s at the Master.
So, again, set the override on all available
electable nodes.
</p>
</li>
</ol>
</div>
<p>
Having set the override, the available electable members of the
replication group can now meet quorum requirements.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="gsize-override-restore"></a>Restoring the Default State</h3>
</div>
</div>
</div>
<p>
Having restored the group to a functioning state by use of
the <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> override, it is desirable
to return the group to its normal state as soon as possible. The
normal operating state is one where the Electable Group
Size is maintained by JE HA, and the override is no longer
used.
</p>
<p>
To restore the group to its normal operational state, do
one of the following:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
Remove from the group any electable nodes that you
know will be down for an extended period of time.
Remove the nodes using the
<a class="ulink" href="../java/com/sleepycat/je/rep/util/ReplicationGroupAdmin.html#removeMember(java.lang.String)" target="_top">ReplicationGroupAdmin.removeMember()</a> API.
</p>
</li>
<li>
<p>
Bring up electable nodes as they once again come on
line, so that they can join the functioning group.
This must be done carefully one node at a time in
order to avoid the small possibility that a majority of the
downed nodes hold an election amongst themselves
and elect a second Master.
</p>
</li>
<li>
<p>
Perform some combination of node removal and
bringing up nodes which were previously down.
</p>
</li>
</ul>
</div>
<p>
As soon as there is a sufficient number of electable nodes
up and running that election quorum requirements can be met in the
absence of the override, the override can be removed, and
normal HA operations resumed.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="override-example"></a>Override Example</h3>
</div>
</div>
</div>
<p>
Consider a group consisting of 5 electable nodes:
<code class="literal">n1</code>-<code class="literal">n5</code>. Suppose a
simple majority of the nodes
(<code class="literal">n3</code>-<code class="literal">n5</code>) have become
unavailable.
</p>
<p>
If one of the nodes in
<code class="literal">n3</code>-<code class="literal">n5</code> was the
Master, then nodes <code class="literal">n1</code> and
<code class="literal">n2</code> will try to hold an election, and
fail due to the lack of a quorum. We now carry out the steps described, above:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Verify that <code class="literal">n3</code>-<code class="literal">n5</code> are down.
</p>
</li>
<li>
<p>
Set <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to 2. Do this
at both <code class="literal">n1</code> and <code class="literal">n2</code>.
You can do this dynamically using JConsole, or by
setting the property in the <code class="filename">je.properties</code> file and
restarting the node.
</p>
</li>
<li>
<p>
<code class="literal">n1</code> and <code class="literal">n2</code>
will choose a new Master, say, <code class="literal">n1</code>.
<code class="literal">n1</code> can now process write
operations, and <code class="literal">n2</code> can
acknowledge transaction commits.
</p>
</li>
<li>
<p>
Suppose that <code class="literal">n3</code> is now repaired.
You can bring it back online and it will
automatically locate the new Master and join the
group. As is normal, it will catch up to
<code class="literal">n1</code> and <code class="literal">n2</code> in
the replication stream, and then begin
acknowledging commits as requested by
<code class="literal">n1</code>.
</p>
</li>
<li>
<p>
We now have three electable nodes that are operational. Because
we have a true simple majority of electable nodes available, we
can now reset <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationMutableConfig.html#ELECTABLE_GROUP_SIZE_OVERRIDE" target="_top">ELECTABLE_GROUP_SIZE_OVERRIDE</a> to 0
(do this on <code class="literal">n1</code> and <code class="literal">n2</code>),
which causes the replication group to resume normal
operations. Note that <code class="literal">n1</code> remains
the Master.
</p>
</li>
</ol>
</div>
<p>
If <code class="literal">n2</code> was the Master at the time of the
failure, then the situation is similar, except that an
election is not held. In this case, <code class="literal">n2</code> will continue to
remain the Master throughout the entire process described
above. However, <code class="literal">n2</code> might not be able to meet quorum
requirements for transaction commits until step 2 (above) is
performed.
</p>
</div>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="groupreset.html">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> </td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Resetting a Replication Group </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> </td>
</tr>
</table>
</div>
</body>
</html>