<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Master Leases</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="rep.html" title="Chapter 12.  Berkeley DB Replication" />
<link rel="prev" href="rep_trans.html" title="Transactional guarantees" />
<link rel="next" href="rep_ryw.html" title="Read your writes consistency" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 11.2.5.3</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Master Leases</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="rep_trans.html">Prev</a> </td>
<th width="60%" align="center">Chapter 12. 
Berkeley DB Replication
</th>
<td width="20%" align="right"> <a accesskey="n" href="rep_ryw.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="rep_lease"></a>Master Leases</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="rep_lease.html#masterlease_change_groupsize">Changing Group Size</a>
</span>
</dt>
</dl>
</div>
<p>
Some applications have strict requirements about the consistency of
data read on a master site. Berkeley DB offers a mechanism called
master leases to provide such consistency. Without master leases, it
is sometimes possible, due to unfortunate scheduling, for Berkeley DB
to return old data to an application when newer data is available, as
illustrated below:
</p>
<div class="orderedlist">
<ol type="1">
<li><span class="bold"><strong>Application on master site</strong></span>: Read data item
<span class="emphasis"><em>foo</em></span> via Berkeley DB <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a> or <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a> call.
</li>
<li><span class="bold"><strong>Application on master site</strong></span>: sleep, get descheduled, etc.
</li>
<li><span class="bold"><strong>System</strong></span>: Master changes role, becomes a client.
</li>
<li><span class="bold"><strong>System</strong></span>: New site is elected master.
</li>
<li><span class="bold"><strong>System</strong></span>: New master modifies data item
<span class="emphasis"><em>foo</em></span>.
</li>
<li><span class="bold"><strong>Application</strong></span>: Berkeley DB returns old data for
<span class="emphasis"><em>foo</em></span> to application.
</li>
</ol>
</div>
<p>
By using master leases, Berkeley DB can provide guarantees about the
consistency of data read on a master site. The master site can be
considered a recognized authority for the data and consequently can
provide authoritative reads. Clients grant master leases to a master
site. By doing so, clients acknowledge the right of that site to
retain the role of master for a period of time. During that period,
clients cannot elect a new master, become master, or grant their
lease to another site.
</p>
<p>
By holding a collection of granted leases, a master site can guarantee
to the application that the data returned is the current, authoritative
value. As a master performs operations, it continually requests
updated grants from the clients. When a read operation is required,
the master guarantees that it holds a valid collection of lease grants
from clients before returning data to the application. By holding
leases, Berkeley DB provides several guarantees to the application:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Authoritative reads: A guarantee that the data being read by the
application is the current value.
</p>
</li>
<li>
<p>
Durability from rollback: A guarantee that the data being
written or read by the application is permanent across a
majority of client sites and will never be rolled back.
</p>
<p>
The rollback guarantee also depends on the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> flag. The
guarantee is effective as long as there is no total replication group
failure while clients have granted leases but are holding the updates
in their cache. The application must weigh the performance impact of
synchronous transactions against the risk of total replication group
failure. If clients grant a lease while holding updated data in cache,
and total failure occurs, then the data is no longer present on the
clients and rollback can occur if the master also crashes.
</p>
<p>
The guarantee that data will not be rolled back applies only to data
successfully committed on a master. Data read on a client, or read
while ignoring leases, can be rolled back.
</p>
</li>
<li>
<p>
Freshness: A guarantee that the data being read by the
application on the <span class="emphasis"><em>master</em></span> is up-to-date
and has not been modified or removed during the read.
</p>
<p>
Read authority exists only on the master. Read operations on a
client always ignore leases and, consequently, can return
stale data.
</p>
</li>
<li>
<p>
Master viability: A guarantee that a current master with valid
leases cannot encounter a duplicate master situation.
</p>
<p>
Leases remove the possibility of a duplicate master situation that
forces the current master to downgrade to a client. However, it is
still possible that old masters with expired leases can discover a
later master and return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a> to the application.
</p>
</li>
</ol>
</div>
<p>
An application using leases must meet several requirements; a brief
configuration sketch follows the list:
</p>
<div class="orderedlist">
<ol type="1">
<li>
Replication Manager applications must configure a majority (or
larger) acknowledgement policy via the <a href="../api_reference/C/repmgrset_ack_policy.html" class="olink">DB_ENV-&gt;repmgr_set_ack_policy()</a> method.
Base API applications must implement and enforce such a policy on
their own.
</li>
<li>
Base API applications must return an error from the send callback
function when the majority acknowledgement policy is not met for
permanent records marked with <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a>. Note that the
Replication Manager automatically fulfills this requirement.
</li>
<li>
Base API applications must set the number of sites in the group
using the <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV-&gt;rep_set_nsites()</a> method before starting replication and cannot
change it during operation.
</li>
<li>
Using leases in a replication group is all or none. Behavior is
undefined when some sites configure leases and others do not. Use
the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV-&gt;rep_set_config()</a> method to turn on leases.
</li>
<li>
The configured lease timeout value must be the same on all sites
in a replication group, set via the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV-&gt;rep_set_timeout()</a> method.
</li>
<li>
The configured clock_scale_factor value must be the same on all
sites in a replication group. This value defaults to no skew, but
can be set via the <a href="../api_reference/C/repclockskew.html" class="olink">DB_ENV-&gt;rep_set_clockskew()</a> method.
</li>
<li>
Applications that care about read guarantees must perform all read
operations on the master. Reading on a client does not guarantee
freshness.
</li>
<li>
The application must use elections to choose a master site. It
must never simply declare a master without having won an election
(as is allowed without master leases).
</li>
</ol>
</div>
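<p>
The following sketch shows one way a Replication Manager application
might configure these requirements before starting replication. The
lease timeout and clock skew values shown are illustrative assumptions
only, not recommendations.
</p>
<pre class="programlisting">/*
 * A minimal master-lease configuration sketch, assuming a Replication
 * Manager application.  These calls are made on every site, before
 * replication is started; the lease timeout and clock skew values
 * must match across the group.
 */
#include &lt;db.h&gt;

int
configure_leases(DB_ENV *dbenv)
{
    int ret;

    /* Require at least a quorum of sites to acknowledge commits. */
    if ((ret = dbenv-&gt;repmgr_set_ack_policy(dbenv,
        DB_REPMGR_ACKS_QUORUM)) != 0)
        return (ret);

    /* Turn on master leases. */
    if ((ret = dbenv-&gt;rep_set_config(dbenv, DB_REP_CONF_LEASE, 1)) != 0)
        return (ret);

    /* Lease timeout, in microseconds (here, 10 seconds). */
    if ((ret = dbenv-&gt;rep_set_timeout(dbenv,
        DB_REP_LEASE_TIMEOUT, 10000000)) != 0)
        return (ret);

    /* Allow a 2% clock skew between the fastest and slowest sites. */
    if ((ret = dbenv-&gt;rep_set_clockskew(dbenv, 102, 100)) != 0)
        return (ret);

    return (0);
}</pre>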
<p>
Master leases are based on timeouts. Berkeley DB assumes that time
always runs forward. Users who change the system clock on either
client or master sites when leases are in use void all guarantees and
can get undefined behavior. See the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV-&gt;rep_set_timeout()</a> method for more
information.
</p>
<p>
Applications using master leases should be prepared to handle
<code class="literal">DB_REP_LEASE_EXPIRED</code> errors from read operations
on a master and from the <a href="../api_reference/C/txncommit.html" class="olink">DB_TXN-&gt;commit()</a> method.
</p>
<p>
Read operations on a master that should not be subject to leases can
use the <a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a> flag to the <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a> method. Read operations
on a client always ignore leases.
</p>
<p>
Master lease checks cannot succeed until a majority of sites have
completed client synchronization. Read operations on a master performed
before this condition is met can use the <a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a> flag to
avoid errors.
</p>
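<p>
As an illustration, the following sketch performs a lease-checked read
on the master and, if the lease check fails, retries the read with
<a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a>. Whether giving up the freshness guarantee in this
way is acceptable is an application decision; the database handle and
key are assumed to have been set up elsewhere.
</p>
<pre class="programlisting">/*
 * A sketch of a lease-aware read on the master.  Falling back to
 * DB_IGNORE_LEASE when the lease check fails is an application policy
 * choice, not a requirement.
 */
#include &lt;string.h&gt;
#include &lt;db.h&gt;

int
read_current_value(DB *dbp, DBT *key, DBT *data)
{
    int ret;

    memset(data, 0, sizeof(DBT));

    /* A plain read on the master verifies the lease grants. */
    ret = dbp-&gt;get(dbp, NULL, key, data, 0);
    if (ret == DB_REP_LEASE_EXPIRED) {
        /*
         * The master could not verify a majority of lease grants,
         * for example because too few clients have finished
         * synchronizing.  Retry without the freshness guarantee.
         */
        ret = dbp-&gt;get(dbp, NULL, key, data, DB_IGNORE_LEASE);
    }
    return (ret);
}</pre>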
<p>
Clients are forbidden from participating in elections while they have
an outstanding lease granted to a master. Therefore, if the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a>
method is called, then Berkeley DB will block, waiting until its lease
grant expires before participating in any election. While it waits,
the client attempts to contact the current master. If the client finds
a current master, then it returns from the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method. When
leases are configured and the lease has never yet been granted (on
start-up), clients must wait a full lease timeout before participating
in an election.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="masterlease_change_groupsize"></a>Changing Group Size</h3>
</div>
</div>
</div>
<p>
If you are using master leases and you change the size of your
replication group, there is a remote possibility that you can
lose some data previously thought to be durable. This is true
only for users of the Base API.
</p>
<p>
The problem can arise if you are removing sites from your
replication group. (You might even be increasing the size of your
group overall, but if you remove exactly the wrong sites you can
lose data.)
</p>
<p>
Suppose you have a replication group with five sites: A, B, C,
D, and E. You are using a quorum acknowledgement policy. Then:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Master A replicates a transaction to replicas B and C.
Those sites acknowledge the write activity.
</p>
</li>
<li>
<p>
Sites D and E do not receive the transaction. However,
B and C have acknowledged the transaction, which means the
acknowledgement policy is met and so the transaction is
considered durable.
</p>
</li>
<li>
<p>
You shutdown sites B and C. Now only A has the transaction.
</p>
</li>
<li>
<p>
You decrease the size of your replication group to 3
using <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV-&gt;rep_set_nsites()</a>.
</p>
</li>
<li>
<p>
You shutdown or otherwise lose site A.
</p>
</li>
<li>
<p>
Sites D and E hold an election. Because the size of the
replication group is 3, the two of them form a majority, so
the election succeeds. However, neither site
has the transaction in question. In this way, the
transaction can be lost.
</p>
</li>
</ol>
</div>
<p>
In an alternative scenario, you do not change the size of your
replication group, or you even increase it, but in the process
you happen to remove exactly the wrong sites:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Master A replicates a transaction to replicas B and C.
Those sites acknowledge the write activity.
</p>
</li>
<li>
<p>
Sites D and E do not receive the transaction. However,
B and C have acknowledged the transaction, which means the
acknowledgement policy is met and so the transaction is
considered durable.
</p>
</li>
<li>
<p>
You shutdown sites B and C. Now only A has the transaction.
</p>
</li>
<li>
<p>
You add three new sites to your replication group: F,
G and H, increasing the size of your replication group
to 6 using <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV-&gt;rep_set_nsites()</a>.
</p>
</li>
<li>
<p>
You shutdown or otherwise lose site A before F, G and H
can be fully populated with data.
</p>
</li>
<li>
<p>
Sites D, E, F, G and H hold an election. Because the size of the
replication group is 6, the five of them form a majority, so
the election succeeds. However, none of these sites
has the transaction in question. In this way, the
transaction can be lost.
</p>
</li>
</ol>
</div>
<p>
This scenario represents a race condition that would be highly
unlikely to be seen outside of a lab environment. To reduce
the chance of this race condition to an absolute minimum, do
one or more of the following when using master leases with the
Base API:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Require all sites to acknowledge transaction commits. For a
Base API application this is enforced in the send callback,
as sketched after this list.
</p>
</li>
<li>
<p>
Never change the size of your replication group unless
all sites in the group are running and communicating
normally with one another.
</p>
</li>
<li>
<p>
Don't remove (or replace) a large percentage of the
sites in your replication group unless all sites in
the group are running and communicating normally with
one another. If you must remove a large percentage of
your sites, remove just one site at a time, pausing
between removals to give the replication group a
chance to fully distribute all writes before removing
the next site.
</p>
</li>
</ol>
</div>
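<p>
For a Base API application, the first suggestion is enforced in the
send callback registered with <a href="../api_reference/C/reptransport.html" class="olink">DB_ENV-&gt;rep_set_transport()</a>: the callback
reports failure for a permanent record unless every site acknowledges
it. The sketch below assumes hypothetical application helpers,
<code class="literal">broadcast_to_group()</code> and
<code class="literal">wait_for_acks()</code>, supplied by the
application's own communication layer.
</p>
<pre class="programlisting">/*
 * A Base API send-callback sketch that treats a permanent record as
 * failed unless every other site acknowledges it.  broadcast_to_group()
 * and wait_for_acks() are hypothetical helpers built on the
 * application's own communication layer; nsites is the group size the
 * application configured with DB_ENV-&gt;rep_set_nsites().
 */
#include &lt;db.h&gt;

extern int nsites;
extern int broadcast_to_group(const DBT *control, const DBT *rec);
extern int wait_for_acks(const DB_LSN *lsnp, int needed);

int
send_callback(DB_ENV *dbenv, const DBT *control, const DBT *rec,
    const DB_LSN *lsnp, int envid, u_int32_t flags)
{
    int acks, needed;

    /* For brevity, this sketch broadcasts regardless of envid. */
    if (broadcast_to_group(control, rec) != 0)
        return (1);     /* Nonzero tells Berkeley DB the send failed. */

    if (flags &amp; DB_REP_PERMANENT) {
        /* Wait for every other site, not just a majority. */
        needed = nsites - 1;
        acks = wait_for_acks(lsnp, needed);
        if (acks &lt; needed)
            return (1); /* The record may not be durable everywhere. */
    }
    return (0);
}</pre>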
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="rep_trans.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="rep.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="rep_ryw.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Transactional guarantees </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Read your writes consistency</td>
</tr>
</table>
</div>
</body>
</html>