je/docs/ReplicationGuide/lifecycle.html
2021-06-06 13:46:45 -04:00

622 lines
30 KiB
HTML
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Replication Group Life Cycle</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="up" href="introduction.html" title="Chapter 1. Introduction" />
<link rel="prev" href="datamanagement.html" title="Managing Data Guarantees" />
<link rel="next" href="progoverview.html" title="Chapter 2. Replication API First Steps" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 12.2.7.5</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Replication Group Life Cycle</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="datamanagement.html">Prev</a> </td>
<th width="60%" align="center">Chapter 1. Introduction</th>
<td width="20%" align="right"> <a accesskey="n" href="progoverview.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="lifecycle"></a>Replication Group Life Cycle</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-terms">Terminology</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#nodestates">Node States</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-new">New Replication Group Startup</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-established">Subsequent Startups</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-nodestartup">Replica Startup</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-masterfailover">Master Failover</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#twonode">Two Node Groups</a>
</span>
</dt>
</dl>
</div>
<p>
This section describes how your replication group behaves
over the course of the application's lifetime. Startup is
described, both for new nodes as well as for existing nodes
that are restarting. This section also describes Master
failover.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="lifecycle-terms"></a>Terminology</h3>
</div>
</div>
</div>
<p>
Before continuing, it is necessary to define some terms
used in this document as they relate to
node membership in a replication group.
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
Add/Remove
</p>
<p>
When we say that a node has been persistently
<span class="emphasis"><em>added</em></span> to a replication group,
this means that it has become a persistent member of
the group. Regardless of whether the node is running
or otherwise reachable by the group, once it has been
added to the group it remains a member of the group.
If the added node is an electable node, the group size
used during elections, or transaction commit
acknowledgements, is increased by one. Note that
secondary nodes are not persistent members of the
replication group, so they are not considered to be
persistently added or removed.
</p>
<p>
A node that has been persistently added to a
replication group remains a member of that group
until it is explicitly <span class="emphasis"><em>removed</em></span>
from the group. Once a node has been removed from
the group, it is no longer a member of the group. If
the node that was removed was an electable node, the
group size used during elections, or transaction
commit acknowledgements, is decreased by one.
</p>
</li>
<li>
<p>
Join/Leave
</p>
<p>
We say that a member has <span class="emphasis"><em>joined</em></span> the
replication group when it starts up and begins
operating in the group as an active node.
Electable and secondary nodes join a replication
group by successfully opening a
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicatedEnvironment.html" target="_top">ReplicatedEnvironment</a> handle. Monitor nodes are
not considered to join a replication group because
they do not actively participate in replication or
elections.
</p>
<p>
A member, then, <span class="emphasis"><em>leaves</em></span> a
replication group by shutting down, or losing the
network contact that allows it to operate as an
active member of the group. When operating
normally, member nodes leave a replication group by
closing their last <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicatedEnvironment.html" target="_top">ReplicatedEnvironment</a> handle.
</p>
<p>
Joining or leaving a group does not change the
electable group size, and so the number of nodes
required to hold an election, as well as the
number of nodes required to acknowledge
transaction commits, does not change.
</p>
</li>
</ul>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="nodestates"></a>Node States</h3>
</div>
</div>
</div>
<p>
Member nodes can be in the following states:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
Master
</p>
<p>
When in the Master state, a member node can service read and
write requests. At any given time, there can be only one node in the
Master state in the replication group.
</p>
</li>
<li>
<p>
Replica
</p>
<p>
Member nodes in the Replica state can only service
read requests. All of the electable nodes other
than the Master, and all of the secondary nodes,
should be in the Replica state.
</p>
</li>
<li>
<p>
Unknown
</p>
<p>
The member node is not aware of a Master and is actively
trying to discover or elect a Master. A node in this
state is constantly striving to transition to the
more productive Master or Replica state.
</p>
<p>
A node in the Unknown state can still process read
transactions if the node can satisfy its transaction
consistency requirements.
</p>
</li>
<li>
<p>
Detached
</p>
<p>
The member node has been shutdown (that is, it has
left the group, but it has not been removed from the
group — see the previous section). It is still
a member of the replication group, but is not active
in elections or replicating data. Note that
secondary nodes do not remain members when they are
in the detached state; when they lose contact with
the Master, they are no longer considered members of
the group.
</p>
</li>
</ul>
</div>
<p>
Note that from time to time this documentation uses the
term <span class="emphasis"><em>active node</em></span>. An active node is a
member node that is in the Master, Replica or Unknown
state. More to the point, an active node is a node that is
available to participate in elections — if it is an
electable node — and in data replication. Monitor
nodes are not considered active and do not report their
state.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="lifecycle-new"></a>New Replication Group Startup</h3>
</div>
</div>
</div>
<p>
The first time you start up a replication group using an
electable node, the group
exists (for at least a small time) as a group of size one. At
this time, the single node belonging to the group becomes the
Master. So long as there is only one electable node in the
replication group, that one node behaves as if it is a
non-replicated application. There are some differences in the
format of the log file that the application maintains, but it
otherwise behaves identically to a non-replicated transactional
application.
</p>
<p>
Subsequently, upon startup a new node must be given the contact
information for at least one currently active node in the
replication group in order for it to be added to the
group. The new node contacts this active node
who will identify the Master for the new node.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
As is the case with elections, an electable node cannot
be added to the replication group unless a simple
majority of electable nodes are active at the time that
it starts up. If too many nodes are down or otherwise
unavailable, you cannot add a new electable node to the
group.
</p>
</div>
<p>
The new node then contacts the Master, and provides all
necessary identification information about itself to the
Master. This includes host and port information, the
node's unique name, and the replication group name. For
electable nodes, the Master stores this identifying
information about the node persistently, meaning the
effective number of electable members of the replication
group has just grown by one. For secondary nodes, the
information about the node is only maintained while the
secondary node is active; the number of electable
members does not change.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
Note that the new electable node is now a permanent member
of the replication group until you manually remove
it. This is true even if you shutdown the node for a long
time. See <a class="xref" href="utilities.html#node-addremove" title="Adding and Removing Nodes from the Group">Adding and Removing Nodes from the Group</a> for details.
</p>
</div>
<p>
Once the new node is an established member of the group, the
Master provides the Replica with the logical logs needed to
replicate the environment. The sequence of logical log
records sent from the Master to the Replica constitutes the
<span class="emphasis"><em>Replication Stream</em></span>. At this time, the
node is said to have <span class="emphasis"><em>joined</em></span> the group.
Once a replication stream is established, it is maintained until either the
Replica or the Master goes down.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="lifecycle-established"></a>Subsequent Startups</h3>
</div>
</div>
</div>
<p>
Each node stores information about
other persistent replication group members in its replicated
environment so that this information is available to it
upon restart.
</p>
<p>
When a node that is already an established
member of a replication group is restarted, the node uses its
knowledge of other members of the replication group to
locate the Master. It does this by by querying the
members of the group to locate the current Master. If it
finds a Master, the node joins the group and proceeds to operate in the
group as a Replica.
</p>
<p>
If a Master is not available and the restarting node is an
electable node, the node initiates an election so as to
establish a Master. If a simple majority of electable
nodes are available for the election, a Master is
elected. If the restarting node is elected Master, it then
waits for Replicas to connect to it so that it can supply
them a replication stream. If the restarting node is a
secondary node, then it continues to try to find the
Master, waiting for the electable nodes to elect a Master
as needed.
</p>
<p>
Under ordinary circumstances, if a Master cannot be
determined for some reason, the restarting node will fail to
open. However, you can permit the node to instead open
in the UNKOWN state. While in this state, the node is
persistently attempting to find a Master, but it is also
available for read-only requests.
</p>
<p>
To configure a node in this way, use the
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationConfig.html#setConfigParam(java.lang.String,java.lang.String)" target="_top">ReplicationConfig.setConfigParam()</a> method to set the
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicationConfig.html#ENV_UNKNOWN_STATE_TIMEOUT" target="_top">ReplicationConfig.ENV_UNKNOWN_STATE_TIMEOUT</a> parameter.
This parameter requires you to define a Master election
timeout period. If this election timeout expires while
the node is attempting to restart, then the node opens in
the UNKNOWN state instead of failing its open operation
entirely.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="lifecycle-nodestartup"></a>Replica Startup</h3>
</div>
</div>
</div>
<p>
Regardless of how it happens, when a node joins
a replication group, it contacts the
Master and then goes through the following three steps:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Handshake
</p>
<p>
The Replica sends the Master its configuration
information, along with the unique name
associated with the Replica's environment. This
name is a pseudo-randomly generated Universal
Unique Identifier (UUID).
</p>
<p>
This handshake establishes the node as a valid
member of the group. It is used both by new nodes
joining the group for the first time, and by
existing nodes that are simply restarting.
</p>
<p>
In addition, during this handshake process, the
Master and Replica nodes will compare their
clocks. If the clocks are too far off from one
another, the handshake will fail and the Replica
node will fail to start up. See
<a class="xref" href="timesync.html" title="Time Synchronization">Time Synchronization</a>
for more information.
</p>
</li>
<li>
<p>
Replication Stream Sync-Up
</p>
<p>
The Replica sends the Master its current position
in the replication stream sequence. The Master
and Replica then negotiate a point in the
replication stream that the Master can use as a
starting point to resume the flow of logical
records to the Replica.
</p>
<p>
Note that normally this sync-up process will be
transparent to your application. However, in rare
cases the sync-up may require that committed
transactions be undone.
</p>
<p>
Also, if the Replica has been offline for a long
time, it is possible that the Master can no
longer supply the Replica with the required contiguous
interval of the replication stream. (This can
happen due to log cleaning on the Master.) In
this case, the log files must be copied to the
restarting node from some other up-to-date node
in the replication group. See
<a class="xref" href="logfile-restore.html" title="Restoring Log Files">Restoring Log Files</a>
for details.
</p>
</li>
<li>
<p>
Steady state replication stream flow
</p>
<p>
Once the Replica has successfully started up and
joined the group, the
Master maintains a flow of log records to the
Replica. Beyond that, the Master will request
acknowledgements from electable Replicas whenever the
Master needs to meet transaction commit
durability requirements.
</p>
</li>
</ol>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="lifecycle-masterfailover"></a>Master Failover</h3>
</div>
</div>
</div>
<p>
A Master failing or shutting down causes all of the replication streams
between the Master and its various Replicas to terminate.
In reaction, the Replicas transition to the Unknown state
and the electable nodes initiate an election.
</p>
<p>
An election can be held if at least a simple majority of
the replication group's electable nodes are active. The
electable node
that wins the election transitions to the Master state,
and all other active nodes transition to the Replica
state.
</p>
<p>
Upon transitioning to the Replica state, nodes connect to
the new Master and proceed through the handshake,
sync-up, replication replay process described in the
previous section.
</p>
<p>
If no Master can be elected (because a majority of electable nodes
are not available to participate in the election), then
the nodes remain in the Unknown state until such a time
as a Master can be elected. In this state, the nodes
might be able to service read-only requests, but the
replication group is incapable of servicing write
requests. Read requests can be serviced so long as the
transaction's consistency requirements can be met (see
<a class="xref" href="consistency.html" title="Managing Consistency">Managing Consistency</a>).
</p>
<p>
Note that the JE Replication application needs to make
provisions for the following state transitions after
failover:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
A node that transitions from the Replica state to
the Master state as a result of a failover needs
to start accepting update requests. There are
several ways to determine whether a node can
handle update requests. See
<a class="xref" href="replicawrites.html" title="Managing Write Requests at a Replica">Managing Write Requests at a Replica</a>
for more information.
</p>
</li>
<li>
<p>
If a node remains in the Replica state after a
failover, the failover should be transparent to
the application. However, an application may need
to take corrective action in the rare situation
where the sync-up process has to roll back
committed transactions.
</p>
<p>
See <a class="xref" href="txnrollback.html" title="Managing Transaction Rollbacks">Managing Transaction Rollbacks</a>
for an example of how handle a transaction commit roll back.
</p>
</li>
</ul>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="twonode"></a>Two Node Groups</h3>
</div>
</div>
</div>
<p>
Replication groups comprised of just two electable nodes
represents a unique corner case for JE replication. In
order to elect a master, usually a simple majority of
electable nodes must be available to participate in an
election. However, for replication groups of size two, if
even one electable node is unavailable for the election then
by default it is impossible to hold an election.
</p>
<p>
However, for some classes of application, it is desirable
for the application to proceed with operations using just
one electable node. That is, the application trades off the
durability guarantees offered by using two electable nodes
for the higher availability permissible by allowing the
application to run with just one of the nodes.
</p>
<p>
JE allows you to do this by designating one of the nodes
in a two electable node group as a <span class="emphasis"><em>primary
node</em></span>. When the non-primary node of the pair is
not available, the number of nodes required for a simple
majority is reduced from two to one by the primary
node. Consequently, the primary node is able to elect itself
as the Master. It can then commit transactions that require
a simple majority to acknowledge commits. When the
non-primary node becomes available again, the number of
nodes required for a simple majority at the primary once
again reverts to two.
</p>
<p>
At any given time, there must be either zero or one
electable nodes designated as the primary node, but it is up
to your application to make sure both nodes are not
erroneously designated as the primary. Your application must
be very careful not to mistakenly designate two nodes as the
primary. If this happened, and the two nodes could not
communicate with one another (due to a network malfunction
of some kind, for example), they could both then consider
themselves to be Masters and start accepting write
requests. This violates a fundamental requirement that at
any given instant in time, there should be exactly one node
that is permitted to perform writes on the replicated
environment.
</p>
<p>
Note that the non-primary electable node always needs two
electable nodes for a simple majority, so it can never
become the Master in the absence of the primary node. If the
primary node fails, you can make provisions to swap the
primary and non-primary designations so that the surviving
node is now the primary. This swap must be performed
carefully so as to ensure that both nodes are not
concurrently designated the primary. The most important
thing is that the failed node comes up as the non-primary
after it has been repaired.
</p>
<p>
For more information on using two-node groups, see
<a class="xref" href="two-node.html" title="Configuring Two-Node Groups">Configuring Two-Node Groups</a>.
</p>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="datamanagement.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="introduction.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="progoverview.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Managing Data Guarantees </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Chapter 2. Replication API First Steps</td>
</tr>
</table>
</div>
</body>
</html>