je/docs/ReplicationGuide/introduction.html
2021-06-06 13:46:45 -04:00

588 lines
28 KiB
HTML
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Chapter 1. Introduction</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="up" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="prev" href="preface.html" title="Preface" />
<link rel="next" href="datamanagement.html" title="Managing Data Guarantees" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 12.2.7.5</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Chapter 1. Introduction</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="preface.html">Prev</a> </td>
<th width="60%" align="center"> </th>
<td width="20%" align="right"> <a accesskey="n" href="datamanagement.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="chapter" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="introduction"></a>Chapter 1. Introduction</h2>
</div>
</div>
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt>
<span class="sect1">
<a href="introduction.html#overview">Overview</a>
</span>
</dt>
<dd>
<dl>
<dt>
<span class="sect2">
<a href="introduction.html#intro_repgroup">Replication Group Members</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="introduction.html#repenvirons">Replicated Environments</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="introduction.html#masterselect">Selecting a Master</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="introduction.html#replicationstreams">Replication Streams</a>
</span>
</dt>
</dl>
</dd>
<dt>
<span class="sect1">
<a href="datamanagement.html">Managing Data Guarantees</a>
</span>
</dt>
<dd>
<dl>
<dt>
<span class="sect2">
<a href="datamanagement.html#durability-intro">Durability</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="datamanagement.html#consistency-intro">Managing Data Consistency</a>
</span>
</dt>
</dl>
</dd>
<dt>
<span class="sect1">
<a href="lifecycle.html">Replication Group Life Cycle</a>
</span>
</dt>
<dd>
<dl>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-terms">Terminology</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#nodestates">Node States</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-new">New Replication Group Startup</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-established">Subsequent Startups</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-nodestartup">Replica Startup</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#lifecycle-masterfailover">Master Failover</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="lifecycle.html#twonode">Two Node Groups</a>
</span>
</dt>
</dl>
</dd>
</dl>
</div>
<p>
This book provides a thorough introduction to
replication as used with Berkeley DB, Java Edition (JE). It begins by offering a
general overview to replication and the benefits it provides. It also
describes the APIs that you use to implement replication, and it
describes architecturally the things that you need to do to your
application code in order to use the replication APIs.
</p>
<p>
You should understand the concepts from the <em class="citetitle">Berkeley DB, Java Edition Getting Started with Transaction Processing</em>
guide before reading this book.
</p>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="overview"></a>Overview</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="introduction.html#intro_repgroup">Replication Group Members</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="introduction.html#repenvirons">Replicated Environments</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="introduction.html#masterselect">Selecting a Master</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="introduction.html#replicationstreams">Replication Streams</a>
</span>
</dt>
</dl>
</div>
<p>
Welcome to the JE High Availability (HA) product. JE HA
is a replicated, single-master, embedded database engine based
on Berkeley DB, Java Edition. JE HA offers important improvements in
application availability, as well as offering improved read
scalability and performance. JE HA does this by extending
the data guarantees offered by a traditional transactional
system to processes running on multiple physical hosts.
</p>
<p>
The JE replication APIs allow you to distribute your
database contents (performed on a read-write Master) to one or
more read-only <span class="emphasis"><em>Replicas</em></span>.
For this reason, JE's replication implementation is said to be a
<span class="emphasis"><em>single master, multiple replica</em></span> replication strategy.
</p>
<p>
Replication offers your application a number of benefits that
can be a tremendous help. Primarily, replication's benefits
revolve around performance, but there is also a benefit in
terms of data durability guarantees.
</p>
<p>
Briefly, some of the reasons why you might choose to implement
replication in your JE application are:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
Improved application availability.
</p>
<p>
By spreading your data across multiple
machines, you can ensure that your
application's data continues to be
available even in the event of a
hardware failure on any given machine in
the replication group.
</p>
</li>
<li>
<p>
Improve read performance.
</p>
<p>
By using replication you can spread data reads across
multiple machines on your network. Doing so allows you
to vastly improve your application's read performance.
This strategy might be particularly interesting for
applications that have readers on remote network nodes;
you can push your data to the network's edges thereby
improving application data read responsiveness.
</p>
</li>
<li>
<p>
Improve transactional commit performance
</p>
<p>
In order to commit a transaction and achieve a
transactional durability guarantee, the commit must be
made <span class="emphasis"><em>durable</em></span>. That is, the commit
must be written to disk (usually, but not always,
synchronously) before the application's thread of
control can continue operations.
</p>
<p>
Replication allows you to batch disk I/O so that it is
performed as efficiently as possible while still
maintaining a degree of durability by <span class="emphasis"><em>committing
to the network</em></span>. In other words, you relax
your transactional durability guarantees on the machine
where you perform the database write,
but by virtue of replicating the data across the
network you gain some additional durability guarantees
beyond what is provided locally.
</p>
</li>
<li>
<p>
Improve data durability guarantee.
</p>
<p>
In a traditional transactional application, you commit your
transactions such that data modifications are saved to
disk. Beyond this, the durability of your data is
dependent upon the backup strategy that you choose to
implement for your site.
</p>
<p>
Replication allows you to increase this durability
guarantee by ensuring that data modifications are
written to multiple machines. This means that multiple
disks, disk controllers, power supplies, and CPUs are
used to ensure that your data modification makes it to
stable storage. In other words, replication allows you
to minimize the problem of a single point of failure
by using more hardware to guarantee your data writes.
</p>
<p>
If you are using replication for this reason, then you
probably will want to configure your application such
that it waits to hear about a successful commit from
one or more replicas before continuing with the next
operation. This will obviously impact your
application's write performance to some degree
— with the performance penalty being largely dependent
upon the speed and stability of the network connecting
your replication group.
</p>
</li>
</ul>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="intro_repgroup"></a>Replication Group Members</h3>
</div>
</div>
</div>
<p>
Processes that take part in a JE HA application are
generically called <span class="emphasis"><em>nodes</em></span>. Most nodes
serve as a read-only Replica. One node in the HA
application can perform database writes. This is the Master node.
</p>
<p>
The sum totality of all the nodes taking part in the
replicated application is called the <span class="emphasis"><em>replication
group</em></span>. While it is only a logical entity
(there is no object that you instantiate and destroy which
represents the replication group), the replication group is
the first-order element of management for a replicated HA
application. It is very important to remember that the
replication group is persistent in that it exists
regardless of whether its member nodes are currently
running. In fact, nodes that have been added to a replication
group (with the exception of Secondary nodes) will remain in
the group until they are manually removed from the group by
you or your application's administrator.
</p>
<p>
Replication groups consist of electable nodes and,
optionally, Monitor and Secondary nodes.
</p>
<p>
<span class="emphasis"><em>Electable</em></span> nodes are replication group
members that can be elected to become the group's Master node
through a <span class="emphasis"><em>replication election</em></span>. Electable
nodes are also the group members that vote in these elections.
If an electable node is not a Master, then it serves in the
replication group as a read-only Replica. Electable nodes have
access to a JE environment, and are persistent members of
the replication group. Electable nodes that are Replicas also
participate in transaction durability decisions by providing the
master with acknowledgments of transaction commits.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
Beyond Master and Replica, a node can also be in
several other states. See
<a class="xref" href="lifecycle.html" title="Replication Group Life Cycle">Replication Group Life Cycle</a>
for more information.
</p>
</div>
<p>
Most of the nodes in a replication group are electable
nodes, but it is possible to have nodes of the other types
as well.
</p>
<p>
<span class="emphasis"><em>Secondary nodes</em></span> also have access to a
JE environment, but can only serve as read-only replicas,
not masters, and do not participate in elections. Secondary
nodes can be used to provide read-only data access from
locations with higher latency network connections to the rest
of the replication group without introducing communication
delays into elections. Secondary nodes are not persistent
members of the replication group; they are only considered
members when they are connected to the current master.
Secondary nodes do not participate in transaction durability
decisions.
</p>
<p>
<span class="emphasis"><em>Monitor nodes</em></span> do not have access to a
JE environment and do not participate in elections. For
this reason, they cannot serve as either a Master or a
Replica. Instead, they merely monitor the composition of the
replication group as changes are made by adding and removing
electable nodes, joining and leaving of electable and
secondary nodes, and as elections are held to select a new
Master. Monitor nodes are therefore used by applications
external to the JE replicated application to route data
requests to the various members of the replication
group. Monitor nodes are persistent members of the
replication group. Monitor nodes do not participate in
transaction durability decisions.
</p>
<p>
Note that all nodes in a replication group have a unique
group-wide name. Further, all replication groups are also
assigned a unique name. This is necessary because it is
possible for a single process to have access to multiple
replication groups. Further, any given collection of
hardware can be running multiple replication groups (a
production and a test group, for example.) By uniquely
identifying the replication group with a unique name, it is
possible for JE HA to internally check that nodes have
not been misconfigured and so make sure that messages are
being routed to the correct location.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="repenvirons"></a>Replicated Environments</h3>
</div>
</div>
</div>
<p>
All electable and secondary nodes must have access to a
database environment. Further, no node can share a database
environment with another node.
</p>
<p>
More to the point, in order to create an electable or
secondary node in a replication group, you use a
specialized form of the environment handle:
<a class="ulink" href="../java/com/sleepycat/je/rep/ReplicatedEnvironment.html" target="_top">ReplicatedEnvironment</a>.
</p>
<p>
There is no JE-specified limit to the number of
environments which can join a replication group.
The only limitation here is one of resources —
network bandwidth, for example.
</p>
<p>
We discuss <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicatedEnvironment.html" target="_top">ReplicatedEnvironment</a> handle usage in
<a class="xref" href="progoverview.html#repenv" title="Using Replicated Environments">Using Replicated Environments</a>.
For an introduction to database environments, see the
<em class="citetitle">Getting Started with Berkeley DB, Java Edition</em> guide.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="masterselect"></a>Selecting a Master</h3>
</div>
</div>
</div>
<p>
Every replication group is allowed one and only one
Master. Masters are selected by
holding an <span class="emphasis"><em>election</em></span>. All such
elections are performed by the underlying Berkeley DB, Java Edition
replication code.
</p>
<p>
When a node joins a replication group, it attempts to
locate the Master. If it is the first electable node added
to the replication group, then it automatically becomes
the Master. If it is an electable node, but is not the
first to startup in the replication group and it cannot
locate the Master, it calls for an election. Further, if
at any time the Master becomes unavailable to the
replication group, the electable replicas will call for an
election.
</p>
<p>
When holding an election, election participants vote on
who should be the Master. Among the electable nodes
participating in the election, the node with the most
up-to-date set of logs will win the election. In order
to win an election, a node must win a simple majority
of the votes.
</p>
<p>
Usually JE requires a majority of electable nodes to be
available to hold an election. If a simple majority is
not available, then the replication group will no
longer be able to accept write requests as there will
be no Master.
</p>
<p>
Note that an electable node is part of the replication
group even if it is currently not running or is
otherwise unreachable by the rest of the replication
group. Membership of electable nodes in the replication
group is persistent; once an electable node joins the
group, it remains in the group regardless of its current
state. The only way an electable node leaves a
replication group is if you manually remove it from the
group (see
<a class="xref" href="utilities.html#node-addremove" title="Adding and Removing Nodes from the Group">Adding and Removing Nodes from the Group</a>
for details). This is a very important point to remember
when considering elections. <span class="emphasis"><em>An election cannot be held
if the majority of electable nodes in the group are not
running or are otherwise unreachable.</em></span>
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
There are two circumstances under which a majority
of electable nodes need not be available in order
to hold an election. The first is for the special
circumstance of the two-node group. See <a class="xref" href="two-node.html" title="Configuring Two-Node Groups">Configuring Two-Node Groups</a> for
details.
</p>
<p>
The second circumstance is if you explicitly relax
the requirement for a majority of electable nodes to
be available in order to hold an election. This is a
dangerous thing to do, and your replication group
should rarely (if ever) be configured this way. See
<a class="xref" href="election-override.html" title="Appendix A. Managing a Failure of the Majority">Managing a Failure of the Majority</a>
for more information.
</p>
</div>
<p>
Once a node has been elected Master, it remains in that
role until the replication group has a reason to hold
another election. Currently, the only reason why the group
will try to elect a new Master is if the current Master
becomes unavailable to the group. This can happen
because you shutdown the current Master, the current Master
crashes due to bugs in your application code, or a network
outage causes the current Master to be unreachable by a
majority of the electable nodes in your replication group.
</p>
<p>
In the event of a tie in the number of votes, JE's
underlying implementation of the election code will
pick the Master. Moreover, the election code will
always make a consistent choice when settling a tie.
That is, all things being even, the same node will
always be picked to win a tied election.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="replicationstreams"></a>Replication Streams</h3>
</div>
</div>
</div>
<p>
Write transactions can only be performed at the Master.
The results of these transactions are replicated to
Replicas using a logical replication stream.
</p>
<p>
Logical replication streams are performed over a TCP/IP
connection. The stream contains a description of the
logical changes (for example, insert, update or delete)
operations that were performed on the database as a result
of the transaction commit. Each such replicated change is
assigned a group-wide unique identifier called a Virtual
Log Sequence Number (VLSN). The VLSN can be used to locate
the replicated change in the log files associated with any
member of the group. Through the use of the VLSN, each
operation described by the replication stream can be
replayed at each Replica using an efficient internal replay
mechanism.
</p>
<p>
A consequence of this logical replaying of a transaction is
that physical characteristics of the log files contained at
the Replicas can be different across the replication group.
The data contents of the environments found across the replication
group, however, should be identical.
</p>
<p>
Note that there is a process by which a non-replicated
environment can be converted such that it has the log
structure and metadata required for replication. See
<a class="xref" href="enablerep.html" title="Converting Existing Environments for Replication">Converting Existing Environments for Replication</a>
for more information.
</p>
</div>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="preface.html">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="datamanagement.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Preface </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Managing Data Guarantees</td>
</tr>
</table>
</div>
</body>
</html>