<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Database and log file archival</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="transapp.html" title="Chapter 11.  Berkeley DB Transactional Data Store Applications" />
    <link rel="prev" href="transapp_checkpoint.html" title="Checkpoints" />
    <link rel="next" href="transapp_logfile.html" title="Log file removal" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 11.2.5.2</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Database and log file archival</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="transapp_checkpoint.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 11. 
		Berkeley DB Transactional Data Store Applications
        </th>
          <td width="20%" align="right"> <a accesskey="n" href="transapp_logfile.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="transapp_archival"></a>Database and log file archival</h2>
          </div>
        </div>
      </div>
      <p>
        The third component of the administrative infrastructure, archival
        for catastrophic recovery, concerns the recoverability of the
        database in the face of catastrophic failure.  Recovery after
        catastrophic failure is intended to minimize data loss when
        physical hardware has been destroyed, for example, through the
        loss of a disk that contains databases or log files.  The
        application may still lose some data in this case, but archival
        keeps that loss as small as possible.
    </p>
      <p>
        First, you may want to periodically create snapshots (that is,
        backups) of your databases to make it possible to recover from
        catastrophic failure.  A snapshot is either a standard backup,
        which creates a consistent picture of the databases as of a
        single instant in time, or an on-line backup (also known as a
        <span class="emphasis"><em>hot</em></span> backup), which creates a consistent
        picture of the databases as of an unspecified instant during the
        period of time when the snapshot was made.  The advantage of a hot
        backup is that applications may continue to read and write the
        databases while the snapshot is being taken.  The disadvantage of a
        hot backup is that more information must be archived, and recovery
        based on a hot backup is to an unspecified time between the start
        of the backup and when the backup is completed.
    </p>
      <p>
        Second, after taking a snapshot, you should periodically archive
        the log files being created in the environment.  It is often
        helpful to think of database archival in terms of full and
        incremental filesystem backups.  A snapshot is a full backup,
        whereas the periodic archival of the current log files is an
        incremental backup.  For example, it might be reasonable to take a
        full snapshot of a database environment weekly or monthly, and
        archive additional log files daily.  Using both the snapshot and
        the log files, the environment can be recovered after a
        catastrophic crash at any time to the time of the most recent log
        archival, which may be long after the original snapshot.
    </p>
      <p>
        When incremental backups are implemented using this procedure, it
        is important to know that a database copy taken prior to a bulk
        loading event (that is, a transaction started with the
        <a href="../api_reference/C/txnbegin.html#txnbegin_DB_TXN_BULK" class="olink">DB_TXN_BULK</a> flag) can no longer be used as the target of an
        incremental backup.  This is true because bulk loading omits
        logging of some record insertions, so these insertions cannot be
        rolled forward by recovery.  It is recommended that a full backup
        be scheduled following a bulk loading event.
    </p>
      <p>
        To create a standard backup of your database that can be used to
        recover from catastrophic failure, take the following steps:
    </p>
      <div class="orderedlist">
        <ol type="1">
          <li>
            Commit or abort all ongoing transactions.
        </li>
          <li>
            Stop writing to your databases until the backup has completed.
            Read-only operations are permitted, but no write operations and
            no filesystem operations may be performed (for example, the
            <a href="../api_reference/C/envremove.html" class="olink">DB_ENV-&gt;remove()</a> and <a href="../api_reference/C/dbopen.html" class="olink">DB-&gt;open()</a> methods may not be called).
        </li>
          <li>
            Force an environment checkpoint (see the <a href="../api_reference/C/db_checkpoint.html" class="olink">db_checkpoint</a> utility for
            more information).
        </li>
          <li>
            <p>
                Run the <a href="../api_reference/C/db_archive.html" class="olink">db_archive</a> utility with option <span class="bold"><strong>-s</strong></span> to identify all the database data
                files, and copy them to a backup device such as CD-ROM, alternate
                disk, or tape.
            </p>
            <p>
                If the database files are stored in a separate directory from the
                other Berkeley DB files, it may be simpler to archive the directory
                itself instead of the individual files (see <a href="../api_reference/C/envset_data_dir.html" class="olink">DB_ENV-&gt;set_data_dir()</a> for
                additional information).  
            </p>
            <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
              <h3 class="title">Note</h3>
              <p>
                    If any of the database files did not have an open <a href="../api_reference/C/db.html" class="olink">DB</a>
                    handle during the lifetime of the current log files,
                    the <a href="../api_reference/C/db_archive.html" class="olink">db_archive</a> utility will not list them in its output. This
                    is another reason it may be simpler to use a separate
                    database file directory and archive the entire
                    directory instead of archiving only the files listed by
                    the <a href="../api_reference/C/db_archive.html" class="olink">db_archive</a> utility.
                </p>
            </div>
          </li>
          <li>
            Run the <a href="../api_reference/C/db_archive.html" class="olink">db_archive</a> utility with option <span class="bold"><strong>-l</strong></span> to identify all the log files,
            and copy the last one (that is, the one with the highest
            number) to a backup device such as CD-ROM, alternate disk, or
            tape.
        </li>
        </ol>
      </div>
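      <p>
        The last step above can also be done programmatically.  The
        following is a minimal sketch, not part of Berkeley DB: the
        hypothetical helper last_log_file() takes a NULL-terminated list
        of names, such as the list that the DB_ENV-&gt;log_archive()
        method returns when called with the DB_ARCH_LOG flag, and picks
        the highest-numbered entry.  It assumes the default log file
        naming ("log." followed by a decimal sequence number starting at
        1) and "/" as the path separator.
    </p>
      <pre class="programlisting">#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/*
 * log_number --
 *    Return the sequence number of a name such as "/env/log.0000000042",
 *    or 0 if the name does not look like a log file.  Assumes "log."
 *    naming, '/' separators and sequence numbers that start at 1.
 */
static unsigned long
log_number(const char *path)
{
    const char *base;

    assert(path != NULL);
    base = strrchr(path, '/');
    base = (base == NULL) ? path : base + 1;
    if (strncmp(base, "log.", 4) != 0)
        return (0);
    return (strtoul(base + 4, NULL, 10));
}

/*
 * last_log_file --
 *    Hypothetical helper: return the entry of a NULL-terminated list
 *    that has the highest log number, or NULL if there is none.
 */
const char *
last_log_file(char **list)
{
    const char *best;
    unsigned long best_num, num;

    best = NULL;
    best_num = 0;
    if (list == NULL)
        return (NULL);
    for (; *list != NULL; ++list)
        if ((num = log_number(*list)) &gt; best_num) {
            best = *list;
            best_num = num;
        }
    return (best);
}</pre>
      <p>
        The returned pointer refers to an entry of the list itself, so it
        must be used before the list is freed.
    </p>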
      <p>
        To create a <span class="emphasis"><em>hot</em></span> backup of your database that
        can be used to recover from catastrophic failure, take the
        following steps:
    </p>
      <div class="orderedlist">
        <ol type="1">
          <li>
            Set the <a href="../api_reference/C/envset_flags.html#set_flags_DB_HOTBACKUP_IN_PROGRESS" class="olink">DB_HOTBACKUP_IN_PROGRESS</a> flag in the environment.
            This affects the behavior of transactions started with the
            <a href="../api_reference/C/txnbegin.html#txnbegin_DB_TXN_BULK" class="olink">DB_TXN_BULK</a> flag.
        </li>
          <li>
            <p>
                Archive your databases, as described in step #4 of the
                standard backup procedure.  You do not have to halt
                ongoing transactions or force
                a checkpoint.  As this is a hot backup, and the databases
                may be modified during the copy, it is critical that
                database pages be read atomically as described by
                <a class="xref" href="transapp_reclimit.html" title="Berkeley DB recoverability">Berkeley DB recoverability</a>.
            </p>
            <p>
                Note that only UNIX-based systems are known to support
                atomic reads.  These systems include Solaris, Mac OS X,
                HP-UX and various BSD-based systems.  Linux and
                Windows-based systems do not support atomic filesystem
                reads directly, although the XFS file system supports
                atomic reads even on Linux.  On systems that do not
                support atomic filesystem reads, use the <a href="../api_reference/C/db_hotbackup.html" class="olink">db_hotbackup</a> utility
                or build a tool that calls the
                <a href="../api_reference/C/db_copy.html" class="olink">db_copy()</a> function.
            </p>
          </li>
          <li>
            <p>
                Archive <span class="bold"><strong>all</strong></span> of the log files.
                The order of these two operations matters: the database
                files must be archived <span class="bold"><strong>before</strong></span>
                the log files.  This means that if the database files and log
                files are in the same directory, you cannot simply archive the
                directory; you must make sure that the correct order of
                archival is maintained.
            </p>
            <p>
                To archive your log files, run the <a href="../api_reference/C/db_archive.html" class="olink">db_archive</a> utility using the
                <span class="bold"><strong>-l</strong></span> option to identify all the
                database log files, and copy them to your backup media.  If the
                database log files are stored in a separate directory from the
                other database files, it may be simpler to archive the
                directory itself instead of the individual files (see the
                <a href="../api_reference/C/envset_lg_dir.html" class="olink">DB_ENV-&gt;set_lg_dir()</a> method for more information).
            </p>
          </li>
          <li>
           Reset the <a href="../api_reference/C/envset_flags.html#set_flags_DB_HOTBACKUP_IN_PROGRESS" class="olink">DB_HOTBACKUP_IN_PROGRESS</a> flag.
        </li>
        </ol>
      </div>
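      <p>
        The ordering of the steps above can be sketched as a small
        driver.  This is only an illustration under stated assumptions:
        the hot_backup_ops callback table, the hot_backup() function and
        the stub callbacks are hypothetical and are not Berkeley DB
        interfaces.  In an application, set_in_progress would typically
        wrap a call to DB_ENV-&gt;set_flags() with the
        DB_HOTBACKUP_IN_PROGRESS flag, and copy_all_logs would list the
        log files only when it is called, so that log files created while
        the databases were being copied are included.
    </p>
      <pre class="programlisting">#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/*
 * Hypothetical callback table (not part of Berkeley DB).  Each callback
 * returns 0 on success and non-zero on failure.
 */
struct hot_backup_ops {
    int (*set_in_progress)(void *ctx, int onoff);  /* steps 1 and 4 */
    int (*copy_databases)(void *ctx);              /* step 2 */
    int (*copy_all_logs)(void *ctx);               /* step 3 */
};

int
hot_backup(const struct hot_backup_ops *ops, void *ctx)
{
    int ret, t_ret;

    assert(ops != NULL);

    /* Step 1: mark the backup as in progress. */
    if ((ret = ops-&gt;set_in_progress(ctx, 1)) != 0)
        return (ret);

    /* Step 2: the database files come first. */
    ret = ops-&gt;copy_databases(ctx);

    /* Step 3: the log files, only after the database copy succeeded. */
    if (ret == 0)
        ret = ops-&gt;copy_all_logs(ctx);

    /* Step 4: always reset the marker, but report the first error. */
    if ((t_ret = ops-&gt;set_in_progress(ctx, 0)) != 0 &amp;&amp; ret == 0)
        ret = t_ret;
    return (ret);
}

/* Stubs for illustration only: each call records a single letter. */
struct trace {
    char calls[8];
    size_t n;
    int fail_databases;    /* Non-zero makes the database copy fail. */
};

static void
note(struct trace *t, char c)
{
    if (t-&gt;n + 1 &lt; sizeof(t-&gt;calls)) {
        t-&gt;calls[t-&gt;n++] = c;
        t-&gt;calls[t-&gt;n] = '\0';
    }
}

static int
stub_flag(void *ctx, int onoff)
{
    note(ctx, onoff ? 'S' : 'R');
    return (0);
}

static int
stub_databases(void *ctx)
{
    struct trace *t = ctx;

    note(t, 'D');
    return (t-&gt;fail_databases ? -1 : 0);
}

static int
stub_logs(void *ctx)
{
    note(ctx, 'L');
    return (0);
}</pre>
      <p>
        With the stubs, a successful run records the calls in the order
        S, D, L, R.  If the database copy fails, the log copy is skipped
        but the flag is still reset (S, D, R).  Resetting the flag on
        failure is a design choice of this sketch, not a documented
        requirement.
    </p>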
      <p>
        To minimize the archival space needed for log files when doing a
        hot backup, run the db_archive utility to identify the log files
        that are not in use.  Log files that are not in use do not need to
        be included when creating a hot backup, and you can discard them
        or move them aside for use with previous backups (whichever is
        appropriate) before beginning the hot backup.
    </p>
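      <p>
        A minimal sketch of that filtering follows.  It assumes the
        application already holds two NULL-terminated lists of names in
        the same form (both absolute or both relative): every log file,
        and the log files that are no longer in use.  In the Berkeley DB
        C API these would typically come from DB_ENV-&gt;log_archive()
        called with and without the DB_ARCH_LOG flag, respectively.  The
        helper name logs_in_use() is hypothetical.
    </p>
      <pre class="programlisting">#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Return 1 if name appears in the NULL-terminated list, otherwise 0. */
static int
in_list(char **list, const char *name)
{
    for (; list != NULL &amp;&amp; *list != NULL; ++list)
        if (strcmp(*list, name) == 0)
            return (1);
    return (0);
}

/*
 * logs_in_use --
 *    Hypothetical helper: return a NULL-terminated array holding every
 *    entry of all that does not appear in unused.  The array borrows
 *    its strings from all; release the array itself with free().
 *    Returns NULL if memory cannot be allocated.
 */
char **
logs_in_use(char **all, char **unused)
{
    char **out;
    size_t i, n;

    /* A NULL list means "no names"; the caller checks that first. */
    assert(all != NULL);

    for (n = 0; all[n] != NULL; ++n)
        ;
    if ((out = malloc((n + 1) * sizeof(*out))) == NULL)
        return (NULL);
    for (i = 0, n = 0; all[i] != NULL; ++i)
        if (!in_list(unused, all[i]))
            out[n++] = all[i];
    out[n] = NULL;
    return (out);
}</pre>
      <p>
        The entries of the returned array are the log files that the hot
        backup still has to include.
    </p>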
      <p>
        After completing one of these two sets of steps, the database
        environment can be recovered from catastrophic failure (see 
        <a class="xref" href="transapp_recovery.html" title="Recovery procedures">Recovery procedures</a> 
        for more information).
    </p>
      <p>
        To update either a hot or cold backup so that recovery from
        catastrophic failure is possible to a new point in time, repeat
        step #3 under the hot backup instructions, that is, archive
        <span class="bold"><strong>all</strong></span> of the log files in the
        database environment.  Each time both the database and log files
        are copied to backup media, you may discard all previous database
        snapshots and saved log files.  Archiving additional log files does
        not allow you to discard either previous database snapshots or log
        files.  Generally, updating a backup must be integrated with the
        application's log file removal procedures.
    </p>
      <p>
        The time to restore from catastrophic failure is a function of the
        number of log records that have been written since the snapshot was
        originally created.  Perhaps more importantly, the more separate
        pieces of backup media you use, the more likely it is that you will
        have a problem reading from one of them.  For these reasons, it is
        often best to make snapshots on a regular basis.
    </p>
      <p>
        <span class="bold"><strong>The reliability of your archive
            media affects the safety of your data.  For archival
            safety, keep multiple copies of your database backups,
            verify that your archival media are error-free and
            readable, and store copies of your backups
            offsite!</strong></span>
    </p>
      <p>
        The functionality provided by the <a href="../api_reference/C/db_archive.html" class="olink">db_archive</a> utility is also available
        directly from the Berkeley DB library.  The following code fragment
        prints out a list of log and database files that need to be
        archived:
    </p>
      <pre class="programlisting">#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

#include &lt;db.h&gt;

void
log_archlist(DB_ENV *dbenv)
{
    int ret;
    char **begin, **list;

    /* Get the list of database files. */
    if ((ret = dbenv-&gt;log_archive(dbenv,
        &amp;list, DB_ARCH_ABS | DB_ARCH_DATA)) != 0) {
        dbenv-&gt;err(dbenv, ret, "DB_ENV-&gt;log_archive: DB_ARCH_DATA");
        exit (1);
    }
    if (list != NULL) {
        /* The list is a single allocation; keep its start for free(). */
        for (begin = list; *list != NULL; ++list)
            printf("database file: %s\n", *list);
        free (begin);
    }

    /* Get the list of log files. */
    if ((ret = dbenv-&gt;log_archive(dbenv,
        &amp;list, DB_ARCH_ABS | DB_ARCH_LOG)) != 0) {
        dbenv-&gt;err(dbenv, ret, "DB_ENV-&gt;log_archive: DB_ARCH_LOG");
        exit (1);
    }
    if (list != NULL) {
        for (begin = list; *list != NULL; ++list)
            printf("log file: %s\n", *list);
        free (begin);
    }
}</pre>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="transapp_checkpoint.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="transapp.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="transapp_logfile.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Checkpoints </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Log file removal</td>
        </tr>
      </table>
    </div>
  </body>
</html>