je/docs/ReplicationGuide/datamanagement.html
2021-06-06 13:46:45 -04:00

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Managing Data Guarantees</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
<link rel="up" href="introduction.html" title="Chapter 1. Introduction" />
<link rel="prev" href="introduction.html" title="Chapter 1. Introduction" />
<link rel="next" href="lifecycle.html" title="Replication Group Life Cycle" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 12.2.7.5</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Managing Data Guarantees</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="introduction.html">Prev</a> </td>
<th width="60%" align="center">Chapter 1. Introduction</th>
<td width="20%" align="right"> <a accesskey="n" href="lifecycle.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="datamanagement"></a>Managing Data Guarantees</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="datamanagement.html#durability-intro">Durability</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="datamanagement.html#consistency-intro">Managing Data Consistency</a>
</span>
</dt>
</dl>
</div>
<p>
All replicated applications are first transactional
applications. This means that you have the standard data
guarantee issues to consider, all of which concern
how durable and consistent you want your data to be. These
considerations also affect your application's performance.
They are even more important for replicated applications
because replication adds further dimensions to them.
</p>
<p>
Notably, in a replicated application you must decide how
durable your data is by deciding how careful the Master will
be to ensure that a write has reached disk on its
Replica nodes before the transaction completes.
</p>
<p>
Consistency also gains an additional dimension in a replicated
application, because you must now decide how consistent the
various nodes in the replication group will be relative to
the Master at any given time. If no writes are being
performed on the Master, all Replicas will eventually catch
up to the Master and so be completely consistent with it.
But for most HA applications, writes are occurring on the
Master, and so it is possible for some number of your
Replicas to lag behind the Master. What you have to decide,
then, is how sensitive your application is to this kind of
temporary inconsistency.
</p>
<p>
Note that your consistency requirements can be gated by your
durability requirements. Durability, in turn, can be gated by
any concerns you might have about write throughput. At the same
time, your consistency requirement can have an effect on the
read performance of your Replicas. It is
therefore a mistake to think about any one of these
requirements in the absence of the others.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="durability-intro"></a>Durability</h3>
</div>
</div>
</div>
<p>
One of the reasons you might be writing a replicated
application is to achieve a higher durability guarantee
than you can get with a traditional transactional
application. In a traditional application, your data's
durability is a function of how you perform your
transactional commits, and how frequently you perform
your backups. For this class of application, the
strongest durability guarantee you can have is to use
synchronous commits (the commit does not
complete until the data is written to disk), coupled with
very frequent backups of your environment.
</p>
<p>
The problem with a stand-alone application in which you
are seeking a very high durability guarantee is that your
write throughput will suffer. Synchronous commits
require disk writes, and disk I/O is one of the most
expensive operations you can ask a database to perform.
</p>
<p>
In order to increase write throughput in your
transactional application, you may decide to use
asynchronous commits that do not require the disk I/O to
complete before the transaction commit completes.
The problem with this is that your application can
potentially crash before a transaction has been
completely written to disk. This represents a loss of
data, which is to say the data is not durable.
</p>
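<p>
The trade-off between synchronous and asynchronous commits can be
sketched with a toy model. The class below is hypothetical and does
not use the JE API (in JE the local choice is expressed through
<code class="classname">Durability.SyncPolicy</code>). It only counts
how many acknowledged commits have not yet reached disk, which is
what a crash at that moment would lose.
</p>
<pre class="programlisting">
```java
// Toy model only: it does not use or imitate the JE API.
final class ToyCommitLog {
    private int committed; // commits acknowledged to the caller
    private int onDisk;    // commits that have reached stable storage

    // A synchronous commit flushes before returning. An asynchronous
    // commit returns at once and leaves the flush for later.
    void commit(boolean synchronous) {
        committed++;
        if (synchronous) {
            flush();
        }
    }

    // Forces everything committed so far to disk.
    void flush() {
        onDisk = committed;
    }

    // Number of acknowledged commits that a crash right now would lose.
    int lostOnCrash() {
        return committed - onDisk;
    }

    public static void main(String[] args) {
        ToyCommitLog log = new ToyCommitLog();
        log.commit(true);  // durable as soon as commit returns
        log.commit(false); // fast, but exposed until the next flush
        log.commit(false);
        System.out.println("commits lost on crash: " + log.lostOnCrash());
    }
}
```
</pre>
<p>
In this model a synchronous commit never leaves anything exposed,
while each asynchronous commit widens the window of possible loss
until the next flush.
</p>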
<p>
Replication can help with your data durability in a
couple of ways. Most importantly, replication allows you to
<span class="emphasis"><em>commit to the network</em></span>. This means
that when your Master commits a transaction, the results
of that commit are sent to one or more nodes available
over the network. Consequently, multiple disks, disk
controllers, power supplies, and CPUs are used to ensure
the data modification makes it to stable storage.
</p>
<p>
Usually JE makes the commit operation on the Master
wait until it receives acknowledgements from some number
of electable nodes before returning from the
operation. However, if you want to increase write
throughput, you can configure your Master to proceed
without acknowledgements, and so return immediately from
the commit operation (once the commit operation has met
the local durability requirement). The price that you pay
for this is a reduced durability guarantee. How much the
guarantee is reduced depends on the number of electable
nodes in your replication group (the more you have, the
higher your durability guarantee) and on the quality and
stability of your network.
</p>
<p>
Alternatively, you can obtain an
extremely high durability guarantee by configuring the Master
to wait for all electable nodes to acknowledge a commit
operation before returning from the operation. The price
you pay for this very high guarantee is greatly reduced
write throughput.
</p>
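<p>
The range of acknowledgement choices described above can be
illustrated with a small model. The enum below is hypothetical. Its
names mirror the values of JE's
<code class="classname">Durability.ReplicaAckPolicy</code>, but the
code does not call JE, and the majority arithmetic (the Master
counting toward the majority) is an assumption made for illustration
that should be checked against the JE documentation.
</p>
<pre class="programlisting">
```java
// Illustrative model only. The names mirror, but are not, the values
// of JE's Durability.ReplicaAckPolicy.
final class AckModel {
    enum Policy { NONE, SIMPLE_MAJORITY, ALL }

    // Replica acknowledgements the Master waits for, given the number
    // of electable nodes in the group (the Master is one of them).
    static int replicaAcksNeeded(Policy policy, int electableNodes) {
        if (1 > electableNodes) {
            throw new IllegalArgumentException("group needs at least one node");
        }
        switch (policy) {
            case NONE:
                return 0; // return as soon as the local commit is done
            case SIMPLE_MAJORITY:
                // Assumption: a majority is n/2 + 1 nodes and the
                // Master itself supplies one of them.
                return electableNodes / 2;
            case ALL:
                return electableNodes - 1; // every other electable node
            default:
                throw new AssertionError(policy);
        }
    }

    public static void main(String[] args) {
        int electableNodes = 5; // hypothetical group size
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " waits for "
                + replicaAcksNeeded(policy, electableNodes)
                + " replica acknowledgement(s)");
        }
    }
}
```
</pre>
<p>
The more acknowledgements a commit waits for, the more copies of the
data exist when it returns, and the longer each commit takes.
</p>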
<p>
For information on configuring and managing durability
guarantees for your replicated application, see
<a class="xref" href="txn-management.html#durability" title="Managing Durability">Managing Durability</a>.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="consistency-intro"></a>Managing Data Consistency</h3>
</div>
</div>
</div>
<p>
Data consistency means that the data you thought you
wrote to your environment is in fact written to your
environment. It also means that you will never find
partial records written to your environment.
</p>
<p>
In a replicated application, consistency also means that
data that is available on the Master is also available
on the Replicas.
</p>
<p>
A simple transactional application offers consistency
guarantees that are enforced when you commit a
transaction. Your replicated application also offers this
consistency guarantee (because it is also a transactional
application). For this reason, the environment on the
Master is always absolutely consistent. But beyond that, you need to manage
consistency for data across all the nodes in your
replication group.
</p>
<p>
When you commit a transaction on the Master, your
Replica nodes may or may not have the data changes
performed by that transaction at the end of the commit.
Whether they do depends on how high a durability
guarantee you implemented for your Master (see the
previous section). If, for example, you configured your
Master to require acknowledgements from all electable
nodes before returning from the commit, then the data
will be consistently available across all of those nodes
in the replication group, although not necessarily on
secondary nodes. However, if you configured the Master
such that no acknowledgements are necessary, then there
is no guarantee that your data is consistent across the
replication group when the commit returns.
</p>
<p>
To ensure that read transactions on the Replicas see a
sufficiently consistent view of the environment, you can
set a consistency policy for each transaction. This
policy describes how current the Replica must be before a
transaction can be initiated on it. If the Replica is not
current enough, the start of the transaction is delayed
until the Replica has caught up.
</p>
<p>
There are two possible consistency policies. First, there
is a time-based policy that describes how far the Replica
is allowed to lag behind the Master in time.
Second, you can use a commit-based consistency
policy that is based on the commit of a specified
transaction. This policy is used to ensure the Replica is
at least current enough to have the changes made by a
specific transaction, and by all transactions committed
prior to the specified transaction. The start of a
transaction on a Replica can be delayed until the Replica
can meet the consistency policy defined for that transaction.
</p>
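<p>
The two checks can be sketched as simple predicates. JE expresses
these policies through classes such as
<code class="classname">TimeConsistencyPolicy</code> and
<code class="classname">CommitPointConsistencyPolicy</code>. The
class below does not use them. Its method names, and the use of
increasing numbers to stand for commit positions, are assumptions
made for illustration.
</p>
<pre class="programlisting">
```java
// Illustrative model only: it does not use the JE API.
final class ConsistencyModel {
    // Time-based check: the Replica may start the transaction if it
    // lags the Master by no more than the permitted amount.
    static boolean timeBasedOk(long replicaLagMillis, long permittedLagMillis) {
        return permittedLagMillis >= replicaLagMillis;
    }

    // Commit-based check: the Replica may start the transaction once
    // it has replayed the specified commit. Positions here are
    // hypothetical sequence numbers that grow with every commit, so
    // reaching a position implies all earlier commits were replayed.
    static boolean commitBasedOk(long replicaPosition, long requiredCommitPosition) {
        return replicaPosition >= requiredCommitPosition;
    }

    public static void main(String[] args) {
        // Replica is 1.5 seconds behind, policy tolerates 1 second.
        System.out.println("time-based ok: " + timeBasedOk(1500, 1000));
        // Replica has replayed up to position 42, commit 40 is required.
        System.out.println("commit-based ok: " + commitBasedOk(42, 40));
    }
}
```
</pre>
<p>
When a check like these fails, the transaction does not start
immediately. It waits until the Replica has caught up far enough.
</p>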
<p>
This means that a stringent consistency policy can affect
your Replica's read throughput. Transactions, even
read-only transactions, cannot begin until the Replica is
consistent <span class="emphasis"><em>enough</em></span>. If a
Replica has lagged far behind the Master and is having
trouble catching up due to network latency or other
issues, read requests may stall and perhaps even time
out. This increases the latency of your Replica's read
requests and can reduce its overall availability for
reads. For this reason,
give careful consideration to how well you want your
Replica to perform on reads, versus how consistent you
want the Replica to be with other nodes in the
replication group.
</p>
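<p>
The cost of a stringent policy can be estimated with a
back-of-the-envelope model. The class below is hypothetical and does
not use the JE API. It assumes the Replica closes its lag at a
constant net rate, which real replication traffic will not do, and it
returns -1 where a real application would see the transaction start
fail on a timeout.
</p>
<pre class="programlisting">
```java
// Back-of-the-envelope model only: it does not use the JE API.
final class StartDelayModel {
    // Milliseconds a transaction start would wait under a time-based
    // policy, or -1 if the Replica cannot get within the permitted
    // lag before the timeout expires.
    //   lagMillis          how far behind the Master the Replica is now
    //   permittedLagMillis the lag the policy tolerates
    //   catchUpPerMilli    lag closed per millisecond waited (assumed constant)
    //   timeoutMillis      longest the application is willing to wait
    static long waitMillis(long lagMillis, long permittedLagMillis,
                           double catchUpPerMilli, long timeoutMillis) {
        long excess = lagMillis - permittedLagMillis;
        if (0 >= excess) {
            return 0; // already consistent enough
        }
        if (0.0 >= catchUpPerMilli) {
            return -1; // the Replica is not gaining on the Master
        }
        long needed = (long) Math.ceil(excess / catchUpPerMilli);
        return needed > timeoutMillis ? -1 : needed;
    }

    public static void main(String[] args) {
        // 3 s behind, 1 s tolerated, closing 0.5 ms of lag per ms waited.
        System.out.println("wait with 5 s timeout: "
            + waitMillis(3000, 1000, 0.5, 5000));
        System.out.println("wait with 2 s timeout: "
            + waitMillis(3000, 1000, 0.5, 2000));
    }
}
```
</pre>
<p>
Loosening the permitted lag or lengthening the timeout both reduce
the chance of a failed read, at the cost of staler data or longer
stalls respectively.
</p>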
<p>
For more information on managing consistency in your
replicated application, see
<a class="xref" href="consistency.html" title="Managing Consistency">Managing Consistency</a>.
</p>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="introduction.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="introduction.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="lifecycle.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 1. Introduction </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Replication Group Life Cycle</td>
</tr>
</table>
</div>
</body>
</html>