<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Designing Your Application for Recovery</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Getting Started with Berkeley DB Transaction Processing" />
<link rel="up" href="filemanagement.html" title="Chapter 5. Managing DB Files" />
<link rel="prev" href="recovery.html" title="Recovery Procedures" />
<link rel="next" href="hotfailover.html" title="Using Hot Failovers" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 11.2.5.3</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Designing Your Application for Recovery</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
<th width="60%" align="center">Chapter 5. Managing DB Files</th>
<td width="20%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="architectrecovery"></a>Designing Your Application for Recovery</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="architectrecovery.html#multithreadrecovery">Recovery for Multi-Threaded Applications</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="architectrecovery.html#multiprocessrecovery">Recovery in Multi-Process Applications</a>
</span>
</dt>
</dl>
</div>
<p>
When building your DB application, you should consider how you will run recovery. If you are building a
single-threaded, single-process application, it is fairly simple to run recovery when your application first
opens its environment. In this case, you need only decide whether to run recovery every time you open
your application (recommended) or only some of the time, presumably triggered by a startup option
controlled by your application's user.
</p>
<p>
However, for multi-threaded and multi-process applications, you need to carefully consider how you will
design your application's startup code so as to run recovery only when it makes sense to do so.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="multithreadrecovery"></a>Recovery for Multi-Threaded Applications</h3>
</div>
</div>
</div>
<p>
If your application uses only one environment handle, then handling recovery for a multi-threaded
application is no more difficult than for a single-threaded application. You simply open the environment
in the application's main thread, and then pass that handle to each of the threads that will be
performing DB operations. We illustrate this with our final example in this book (see
<a class="xref" href="txnexample_c.html" title="Transaction Example">Transaction Example</a>
for more information).
</p>
<p>
Alternatively, you can have each worker thread open its own environment handle. However, in this case,
designing for recovery is a bit more complicated.
</p>
<p>
When a thread performing database operations fails
or hangs, it is usually best to simply
restart the application and run recovery upon application
startup as normal. However, not all applications can afford
to restart because a single thread has misbehaved.
</p>
<p>
If you are attempting to continue operations in the face of a misbehaving thread,
then at a minimum you must run recovery whenever a thread performing database operations fails or hangs.
</p>
<p>
Remember that recovery clears the environment of all
outstanding locks, including any that might be outstanding
from an aborted thread. If these locks are not cleared,
other threads performing database operations can back up
behind the locks obtained but never released by the failed
thread. The result will be an application that hangs
indefinitely.
</p>
<p>
To run recovery under these circumstances:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Suspend or shut down all other threads performing
database operations.
</p>
</li>
<li>
<p>
Discard any open environment handles. Note that
attempting to gracefully close these handles may be
asking for trouble; the close can fail if the
environment is already in need of recovery. For
this reason, it is best and easiest to simply discard the handles.
</p>
</li>
<li>
<p>
Open new handles, running recovery as you open
them.
See <a class="xref" href="recovery.html#normalrecovery" title="Normal Recovery">Normal Recovery</a> for more information.
</p>
</li>
<li>
<p>
Restart all your database threads.
</p>
</li>
</ol>
</div>
<p>
A traditional way to handle this activity is to spawn a watcher thread that is responsible for making
sure all is well with your threads, and for performing the above actions if not.
</p>
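<p>
The four steps above can be sketched as follows. This is only an illustration of the sequence a watcher
thread might drive, and it makes several assumptions: <code class="literal">EnvHandle</code>,
<code class="literal">open_env()</code>, and <code class="literal">WorkerPool</code> are hypothetical stand-ins
rather than part of the DB API, all workers share a single handle, and every worker responds promptly to a
stop request. A genuinely hung thread cannot be joined this way and would need a different shutdown strategy.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for an environment handle; this is NOT the Berkeley DB API.
struct EnvHandle {
    bool opened_with_recovery;
};

// Hypothetical stand-in for opening the environment, optionally running recovery.
inline std::shared_ptr<EnvHandle> open_env(bool run_recovery) {
    return std::make_shared<EnvHandle>(EnvHandle{run_recovery});
}

class WorkerPool {
public:
    explicit WorkerPool(std::size_t workers) : workers_(workers) {}
    ~WorkerPool() { stop_workers(); }

    // Open one shared handle in the calling thread, then start the workers.
    void start(bool run_recovery) {
        assert(threads_.empty());  // never open a new handle while workers still run
        env_ = open_env(run_recovery);
        stop_.store(false);
        for (std::size_t i = 0; i < workers_; ++i) {
            threads_.emplace_back([this] {
                // Placeholder for database work done through the shared handle.
                while (!stop_.load()) std::this_thread::yield();
            });
        }
    }

    // The four steps from the list above, as a watcher thread might run them.
    void recover_and_restart() {
        stop_workers();  // 1. Stop every thread performing database operations.
        env_.reset();    // 2. Drop the old handle instead of closing it gracefully.
        start(true);     // 3. Open a new handle with recovery; 4. restart the workers.
    }

    // Ask every worker to stop and wait until all of them have exited.
    void stop_workers() {
        stop_.store(true);
        for (auto &t : threads_) t.join();
        threads_.clear();
    }

    bool recovered() const { return env_ && env_->opened_with_recovery; }
    std::size_t running() const { return threads_.size(); }

private:
    std::size_t workers_;
    std::atomic<bool> stop_{false};
    std::shared_ptr<EnvHandle> env_;
    std::vector<std::thread> threads_;
};
```

<p>
In this sketch a watcher would call <code class="literal">recover_and_restart()</code> once it decides a worker
has failed. How that decision is made (heartbeats, timeouts, and so on) is left out on purpose.
</p>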
<p>
However, in the case where each worker thread opens and maintains its own environment handle, recovery
is complicated for two reasons:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
For some applications and workloads, it might be
worthwhile to give your database threads the
ability to gracefully finalize any ongoing
transactions. If this is the case, your
code must be capable of signaling each thread
to halt DB activities and close its
environment. If you simply run recovery against the
environment, your database threads will
detect this and fail in the midst of performing their
database operations.
</p>
</li>
<li>
<p>
Your code must be capable of ensuring only one
thread runs recovery before allowing all other
threads to open their respective environment
handles. Recovery should be single-threaded because when
recovery is run against an environment, the environment is
deleted and then recreated. This will cause all
other processes and threads to "fail" when they
attempt operations against the newly recovered
environment. If all threads run recovery
when they start up, then it is likely that some
threads will fail because the environment that they
are using has been recovered. Each such thread must then re-execute its own recovery
path. At best, this is inefficient and at worst it could cause your application to fall into an
endless recovery pattern.
</p>
</li>
</ol>
</div>
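<p>
One way to satisfy the second requirement at startup is to route every thread through a shared gate, so
that exactly one of them runs recovery and the others wait for it to finish. The sketch below uses
<code class="literal">std::call_once</code> for this. <code class="literal">RecoveryGate</code>,
<code class="literal">StartupCounts</code>, and <code class="literal">start_workers()</code> are hypothetical names,
and the two counters merely stand in for "open with recovery" and "open without recovery"; no DB calls are made.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Lets exactly one caller run recovery; every other caller blocks until it is done.
class RecoveryGate {
public:
    template <typename Recover>
    void ensure_recovered(Recover &&recover) {
        std::call_once(once_, std::forward<Recover>(recover));
    }

private:
    std::once_flag once_;
};

struct StartupCounts {
    int recoveries;   // how many threads actually ran recovery
    int plain_opens;  // how many threads then opened their own handle
};

// Starts n threads that all pass through the same gate before "opening"
// their own environment handle. The lambdas are placeholders for real work.
inline StartupCounts start_workers(int n) {
    assert(n > 0);
    RecoveryGate gate;
    std::atomic<int> recoveries{0};
    std::atomic<int> plain_opens{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            // Only the first thread to arrive runs this; the rest wait here.
            gate.ensure_recovered([&] { ++recoveries; });
            // Every thread, including the first, then opens without recovery.
            ++plain_opens;
        });
    }
    for (auto &t : threads) t.join();
    return StartupCounts{recoveries.load(), plain_opens.load()};
}
```

<p>
Note that a <code class="literal">std::once_flag</code> cannot be reset, so this covers application startup only.
Re-running recovery after a later failure would need a fresh gate once all threads have stopped.
</p>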
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="multiprocessrecovery"></a>Recovery in Multi-Process Applications</h3>
</div>
</div>
</div>
<p>
Frequently, DB applications use multiple processes to interact with the databases. For example, you may
have a long-running process, such as some kind of server, and then a series of administrative tools that
you use to inspect and administer the underlying databases. Or, in some web-based architectures, different
services are run as independent processes that are managed by the server.
</p>
<p>
In any case, recovery for a multi-process environment is complicated for two reasons:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
In the event that recovery must be run, you might
want to notify processes interacting with the environment
that recovery is about to occur and give them a
chance to gracefully terminate. Whether it is
worthwhile for you to do this is entirely dependent
upon the nature of your application. Some
long-running applications with multiple processes
performing meaningful work might want to do this.
Other applications with processes performing database
operations that are likely to be harmed by error conditions in other
processes will likely find it not worth the
effort. For this latter group, the chances of
performing a graceful shutdown may be low anyway.
</p>
</li>
<li>
<p>
Unlike single-process scenarios, it can quickly become wasteful for every process interacting
with the databases to run recovery when it starts up. This is partly because recovery
<span class="emphasis"><em>does</em></span> take some amount of time to run, but mostly you want to
avoid a situation where your server must
reopen all its environment handles just because you fire up a command line database
administrative utility that always runs recovery.
</p>
</li>
</ol>
</div>
<p>
DB offers you two methods by which you can manage recovery for multi-process DB applications.
Each has different strengths and weaknesses, and they are described in the next sections.
</p>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="mp_recover_effects"></a>Effects of Multi-Process Recovery</h4>
</div>
</div>
</div>
<p>
Before continuing, it is worth noting that the following sections describe recovery processes that
can result in one process running recovery while other processes are actively performing
database operations.
</p>
<p>
When this happens, the current database operation will
abnormally fail, indicating a DB_RUNRECOVERY condition.
This means that your application should immediately abandon any database operations that it may have
ongoing, discard any environment handles it has opened, and obtain and open new handles.
</p>
<p>
The net effect of this is that any writes performed by unresolved transactions will be lost.
For persistent applications (servers, for example), the services they provide will also be
unavailable for the amount of time that it takes to complete a recovery and for all participating
processes to reopen their environment handles.
</p>
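<p>
The reaction described above (abandon the operation, discard the handles, reopen, and start over) can be
sketched as a small retry helper. This is a minimal illustration under stated assumptions:
<code class="literal">RunRecoveryError</code> and <code class="literal">run_with_reopen()</code> are hypothetical
names standing in for however your code surfaces a DB_RUNRECOVERY condition, and the two callbacks are
placeholders for your own transaction and handle-management code.
</p>

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical error type standing in for a DB_RUNRECOVERY failure.
struct RunRecoveryError : std::runtime_error {
    RunRecoveryError() : std::runtime_error("environment requires recovery") {}
};

// Runs `operation`. If it reports that the environment was recovered underneath
// it, the attempt is abandoned, `reopen` is called to discard the old handles and
// open new ones, and the operation is retried from the beginning.
// Returns the number of times `reopen` was called. If the final permitted
// attempt also fails, the error is rethrown to the caller.
inline int run_with_reopen(const std::function<void()> &operation,
                           const std::function<void()> &reopen,
                           int max_attempts) {
    assert(max_attempts > 0);
    int reopens = 0;
    for (int attempt = 1;; ++attempt) {
        try {
            operation();
            return reopens;
        } catch (const RunRecoveryError &) {
            if (attempt == max_attempts) throw;
            reopen();
            ++reopens;
        }
    }
}
```

<p>
Because writes from unresolved transactions are lost, <code class="literal">operation</code> should contain the
whole transaction, from begin to commit, so that a retry repeats all of its work.
</p>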
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="db_register"></a>Process Registration</h4>
</div>
</div>
</div>
<p>
One way to handle multi-process recovery is for every process to "register" its environment. In
doing so, the process gains the ability to see if any other applications are using the
environment and, if so, whether they have suffered an abnormal termination. If an abnormal
termination is detected, the process runs recovery; otherwise, it does not.
</p>
<p>
Note that using process registration also ensures that
recovery is serialized across applications. That is,
only one process at a time has a chance to run
recovery. Generally this means that the first process
to start up will run recovery, and all other processes
will silently skip recovery because it is not
needed.
</p>
<p>
To cause your application to register its environment, you specify
<span>
the <code class="literal">DB_REGISTER</code> flag when you open your environment.
You may also specify <code class="literal">DB_RECOVER</code>. However, it is an error to specify
<code class="literal">DB_RECOVER_FATAL</code> when using
the <code class="literal">DB_REGISTER</code> flag.
</span>
If, during the open, DB determines that recovery must be run, it will automatically run the correct
type of recovery for you, so long as you specify normal recovery
on your environment open. If you do not specify normal recovery, and you register your environment,
then no recovery is run even if the registration process identifies a need for it. In this case,
the environment open simply fails by
<span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
If you do not specify normal recovery when you open your first registered environment
in the application, then that application will fail the environment open by
<span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
This is because the first process to register must create an internal
registration file, and recovery is forced when that file is created. To
avoid an abnormal termination of the environment open, specify recovery on
the environment open for at least the first process starting in your
application.
</p>
</div>
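<p>
The rules above reduce to a small decision table, which the following sketch spells out. It is a model of
the behavior described in this section, not a call into DB: <code class="literal">OpenResult</code> and
<code class="literal">registered_open()</code> are hypothetical names, and the three booleans are assumptions
standing in for "normal recovery was specified", "this open creates the registration file", and
"registration detected an abnormally terminated process".
</p>

```cpp
#include <cassert>

enum class OpenResult {
    OpenedWithoutRecovery,  // nothing to repair, so recovery is skipped
    OpenedAfterRecovery,    // recovery was needed and was run for you
    FailedRunRecovery,      // recovery was needed but not requested
};

// Models the outcome of opening a registered environment.
//   recovery_requested: normal recovery was specified on the open
//   first_to_register:  this open creates the internal registration file
//   peer_died:          registration detected an abnormal termination
inline OpenResult registered_open(bool recovery_requested,
                                  bool first_to_register,
                                  bool peer_died) {
    // Creating the registration file forces recovery, as does a dead peer.
    const bool recovery_needed = first_to_register || peer_died;
    if (!recovery_needed) return OpenResult::OpenedWithoutRecovery;
    return recovery_requested ? OpenResult::OpenedAfterRecovery
                              : OpenResult::FailedRunRecovery;
}
```

<p>
For example, <code class="literal">registered_open(false, true, false)</code> models the situation in the note:
the first process registers without requesting recovery, so its open fails.
</p>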
<p>
In addition, if you specify <code class="literal">DB_FAILCHK</code>
when you register your environment, then a fail check is performed on
environment open (fail checks are described in the next section). If, during the
fail check process, an abnormal termination is detected for any of the processes
involved in the application, DB releases any read locks held by the dead
process and performs transaction aborts as necessary. This is done in an attempt
to clean up the environment.
</p>
<p>
In this situation, if a general cleanup of the
environment is not possible and normal recovery is not specified on environment
open, then the open will abort,
<span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
However, if this situation occurs and recovery was specified, then the appropriate type of recovery
(normal or fatal) is run so as to bring the environment back to a healthy state.
</p>
<p>
Be aware that there are some limitations and requirements if you want your various processes to
coordinate recovery using registration:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
There can be only one environment handle per
environment per process. In the case of multi-threaded
processes, the environment handle must be shared across threads.
</p>
</li>
<li>
<p>
All processes sharing the environment must use registration. If registration is
not uniformly used across all participating processes, then you can see inconsistent results
in terms of your application's ability to recognize that recovery must be run.
</p>
</li>
</ol>
</div>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="failchk"></a>Failure Checking</h4>
</div>
</div>
</div>
<p>
For very large and robust multi-process applications, the most common way to ensure all the
processes are working as intended is to make use of a watchdog process. To assist a watchdog
process, DB offers a failure checking mechanism.
</p>
<p>
When a thread of control fails with open environment handles, the result is that there may be
resources left locked or corrupted. Other threads of control may encounter these unavailable resources
quickly or not at all, depending on data access patterns.
</p>
<p>
In any case, the DB failure checking mechanism allows a watchdog to detect if an environment is
unusable as a result of a thread of control failure. The failure check should be called periodically
(for example, once a minute) from the watchdog process. If the environment is deemed unusable, then
the watchdog process is notified that recovery should be run. It is then up to the watchdog to
actually run recovery. It is also the watchdog's responsibility to decide what to do about currently
running processes before running recovery. The watchdog could, for example, attempt to
gracefully shut down or kill all relevant processes before running recovery.
</p>
<p>
Note that failure checking need not be run from a separate process, although conceptually that is
how the mechanism is meant to be used. This same mechanism could be used in a multi-threaded
application that wants to have a watchdog thread.
</p>
<p>
To use failure checking you must:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
<span>
Provide an <code class="function">is_alive()</code> callback using the
<code class="methodname">DbEnv::set_isalive()</code>
method.
</span>
DB uses this callback to determine whether a specified process and thread
is alive when the failure checking is performed.
</p>
</li>
<li>
<p>
If necessary, provide a
<span>
<code class="literal">thread_id</code> callback
</span>
that uniquely identifies a process
and thread of control. This
<span>callback</span>
is only necessary if the standard process and thread
identification functions for your platform are not sufficient for use by failure
checking. This is rarely necessary and is usually because the thread and/or process ids
used by your system cannot fit into an unsigned integer.
</p>
<p>
You provide this callback using the
<code class="methodname">DbEnv::set_thread_id()</code>
method. See the API reference for this method for more information on when setting a thread
id callback might be necessary.
</p>
</li>
<li>
<p>
Call the
<code class="methodname">DbEnv::failchk()</code>
method. You can do this either periodically (once per minute, for example), or
whenever a thread of control exits for your application.
</p>
<p>
If this method determines that a thread of control exited holding read locks, those locks
are automatically released. If the thread of control exited with an unresolved transaction,
that transaction is aborted. If any other problems exist beyond these such that the
environment must be recovered, the method will
<span>return <code class="literal">DB_RUNRECOVERY</code>.</span>
</p>
</li>
</ol>
</div>
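<p>
The shape of a watchdog built on these steps can be sketched without touching DB itself. In the code below,
<code class="literal">process_is_alive()</code> shows the kind of POSIX-only test an
<code class="function">is_alive()</code> callback might perform (it checks the process only, not an individual
thread), and <code class="literal">Watchdog::tick()</code> represents one periodic pass. All of these names,
including <code class="literal">CheckResult</code>, are hypothetical; the <code class="literal">fail_check</code>
callback is an assumed wrapper around your real failure check, and <code class="literal">recover</code> is
whatever your application does to stop its processes and run recovery.
</p>

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <utility>

// POSIX-only liveness test: signal 0 delivers nothing but still reports
// whether the target process exists.
inline bool process_is_alive(pid_t pid) {
    // EPERM means the process exists but belongs to another user.
    return kill(pid, 0) == 0 || errno == EPERM;
}

// Hypothetical result of one failure check.
enum class CheckResult { Healthy, RunRecovery };

class Watchdog {
public:
    Watchdog(std::function<CheckResult()> fail_check, std::function<void()> recover)
        : fail_check_(std::move(fail_check)), recover_(std::move(recover)) {}

    // One watchdog pass: run the failure check and, if the environment is
    // reported unusable, run recovery. Returns true if recovery was run.
    bool tick() {
        if (fail_check_() == CheckResult::Healthy) return false;
        recover_();
        ++recoveries_;
        return true;
    }

    int recoveries() const { return recoveries_; }

private:
    std::function<CheckResult()> fail_check_;
    std::function<void()> recover_;
    int recoveries_ = 0;
};
```

<p>
A watchdog process or thread would call <code class="literal">tick()</code> on a timer (once a minute, for
example) or whenever it learns that a thread of control has exited.
</p>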
</div>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="filemanagement.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Recovery Procedures </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Using Hot Failovers</td>
</tr>
</table>
</div>
</body>
</html>