<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>General access method configuration</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="am_conf.html" title="Chapter 2.  Access Method Configuration" />
    <link rel="prev" href="am_conf_logrec.html" title="Logical record numbers" />
    <link rel="next" href="bt_conf.html" title="Btree access method specific configuration" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 11.2.5.3</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">General access method configuration</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 2. 
		Access Method Configuration
        </th>
          <td width="20%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="general_am_conf"></a>General access method configuration</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_pagesize">Selecting a page size</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_cachesize">Selecting a cache size</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_byteorder">Selecting a byte order</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_dup">Duplicate data items</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_malloc">Non-local memory allocation</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
    A series of configuration tasks is common to all access methods.
    These tasks are described in the following sections.
</p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_pagesize"></a>Selecting a page size</h3>
            </div>
          </div>
        </div>
        <p>
    The size of the pages used in the underlying database can be specified
    by calling the <a href="../api_reference/C/dbset_pagesize.html" class="olink">DB-&gt;set_pagesize()</a> method.  The page size must be a power of
    two, with a minimum of 512 bytes and a maximum of 64K bytes.  If the
    application does not specify a page size, one is selected based on the
    underlying filesystem I/O block size.  (A page size selected in this
    way has a lower limit of 512 bytes and an upper limit of 16K bytes.)
</p>
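        <p>
    The limits above can be checked before a value is passed to
    <code class="literal">DB-&gt;set_pagesize()</code>.  The following is a minimal sketch that
    uses only the standard C library.  The function names are hypothetical,
    and <code class="literal">default_page_size()</code> is only an assumed model of the
    default selection (round the block size up to a power of two, then
    clamp it); the library's actual logic may differ.
</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits quoted in the text above. */
#define MIN_PAGE_SIZE     512u
#define MAX_PAGE_SIZE     (64u * 1024u)
#define MAX_DEFAULT_PAGE  (16u * 1024u)

/* True if size is a power of two within [512, 64K]. */
bool page_size_is_valid(uint32_t size)
{
    bool power_of_two = size != 0 && (size & (size - 1)) == 0;
    return power_of_two && size >= MIN_PAGE_SIZE && size <= MAX_PAGE_SIZE;
}

/*
 * Assumed model of the default choice (not taken from the library):
 * the smallest power of two that is at least the filesystem block
 * size, clamped to [512, 16K].
 */
uint32_t default_page_size(uint32_t fs_block_size)
{
    uint32_t size = MIN_PAGE_SIZE;
    while (size < fs_block_size && size < MAX_DEFAULT_PAGE)
        size *= 2;
    return size;
}
```

        <p>
    Under these assumptions, a 4096-byte filesystem block maps to a
    4096-byte page, and any block size above 16K is capped at 16K.
</p>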
        <p>
    There are several issues to consider when selecting a page size: overflow
    record sizes, locking, I/O efficiency, and recoverability.
</p>
        <p>
    First, the page size implicitly sets the size of an overflow record.
    Overflow records are key or data items that are too large to fit on a
    normal database page, and are therefore stored in overflow pages.
    Overflow pages are pages that exist outside of the normal database
    structure.  For this reason, there is often a significant performance
    penalty associated with retrieving or modifying overflow records.
    Selecting a page size that is too small, and which forces the creation
    of large numbers of overflow pages, can seriously impact the
    performance of an application.
</p>
        <p>
    Second, in the Btree, Hash and Recno access methods, the finest-grained
    lock that Berkeley DB acquires is for a page.  (The Queue access method
    generally acquires record-level locks rather than page-level locks.)
    Selecting a page size that is too large, and which causes threads or
    processes to wait because other threads of control are accessing or
    modifying records on the same page, can impact the performance of your
    application.
</p>
        <p>
    Third, the page size specifies the granularity of I/O from the database
    to the operating system.  Berkeley DB will give a page-sized unit of
    bytes to the operating system to be scheduled for reading/writing
    from/to the disk.  For many operating systems, there is an internal
    <span class="bold"><strong>block size</strong></span> which is used as the
    granularity of I/O from the operating system to the disk.  Generally,
    it will be more efficient for Berkeley DB to write filesystem-sized
    blocks to the operating system and for the operating system to write
    those same blocks to the disk.
</p>
        <p>
    Selecting a database page size smaller than the filesystem block size
    may cause the operating system to coalesce or otherwise manipulate
    Berkeley DB pages and can impact the performance of your application.
    When the page size is smaller than the filesystem block size and a page
    written by Berkeley DB is not found in the operating system's cache,
    the operating system may be forced to read a block from the disk, copy
    the page into the block it read, and then write out the block to disk,
    rather than simply writing the page to disk.  Additionally, as the
    operating system is reading more data into its buffer cache than is
    strictly necessary to satisfy each Berkeley DB request for a page, the
    operating system buffer cache may be wasting memory.
</p>
        <p>
    Alternatively, selecting a page size larger than the filesystem block
    size may cause the operating system to read more data than necessary.
    On some systems, reading filesystem blocks sequentially may cause the
    operating system to begin performing read-ahead.  If requesting a
    single database page implies reading enough filesystem blocks to
    satisfy the operating system's criteria for read-ahead, the operating
    system may do more I/O than is required.
</p>
        <p>
    Fourth, when using the Berkeley DB Transactional Data Store product,
    the page size may affect the errors from which your database can
    recover.  See <a class="xref" href="transapp_reclimit.html" title="Berkeley DB recoverability">Berkeley DB recoverability</a> for more information.
</p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
The <a href="../api_reference/C/db_tuner.html" class="olink">db_tuner</a> utility suggests a page size for Btree databases that optimizes cache
efficiency and storage space requirements. This utility works only when given a pre-populated database,
so it is useful when tuning an existing application but not when first implementing one.
</p>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_cachesize"></a>Selecting a cache size</h3>
            </div>
          </div>
        </div>
        <p>The size of the cache used for the underlying database can be specified
by calling the <a href="../api_reference/C/dbset_cachesize.html" class="olink">DB-&gt;set_cachesize()</a> method.
Choosing a cache size is, unfortunately, an art.  Your cache must be at
least large enough for your working set, plus some headroom for unexpected
situations.</p>
        <p>When using the Btree access method, you must have a cache big enough for
the minimum working set for a single access.  This will include a root
page, one or more internal pages (depending on the depth of your tree),
and a leaf page.  If your cache is any smaller than that, each new page
will force out the least-recently-used page, and Berkeley DB will re-read the
root page of the tree anew on each database request.</p>
        <p>If your keys are of moderate size (a few tens of bytes) and your pages
are on the order of 4KB to 8KB, most Btree applications will be only
three levels deep.  For example, using 20-byte keys with 20 bytes of data
associated with each key, an 8KB page can hold roughly 400 keys (or 200
key/data pairs), so a fully populated three-level Btree will hold 32
million key/data pairs, and a tree with only a 50% page-fill factor will
still hold 16 million key/data pairs.  We rarely expect trees to exceed
five levels, although Berkeley DB will support trees up to 255 levels.</p>
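        <p>The arithmetic behind the 32 million figure can be sketched as
follows.  The function name and its parameters are hypothetical.  The model
assumes that every internal page, including the root, has the same number
of children and that every leaf page holds the same number of key/data
pairs.</p>

```c
#include <stdint.h>

/*
 * Key/data pairs held by a fully populated Btree, assuming every internal
 * page (including the root) has `fanout` children and every leaf page
 * holds `pairs_per_leaf` pairs.  `levels` counts both the root level and
 * the leaf level.
 */
uint64_t btree_capacity(uint64_t fanout, uint64_t pairs_per_leaf,
                        unsigned levels)
{
    uint64_t leaves = 1;
    for (unsigned i = 1; i < levels; i++)
        leaves *= fanout;       /* one factor per internal level */
    return leaves * pairs_per_leaf;
}
```

        <p>With the figures from the text, <code class="literal">btree_capacity(400, 200, 3)</code>
evaluates to 400 * 400 * 200 = 32,000,000.  In this model the 16 million
figure corresponds to halving only the leaf occupancy, as in
<code class="literal">btree_capacity(400, 100, 3)</code>; that reading is an assumption, and
halving the internal fan-out as well would give a smaller number.</p>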
        <p>The rule of thumb is that cache is good, and more cache is better.
Generally, applications benefit from increasing the cache size up to a
point, at which the performance will stop improving as the cache size
increases.  When this point is reached, one of two things has happened:
either the cache is large enough that the application almost never
has to retrieve information from disk, or your application is doing
truly random accesses, and therefore increasing the size of the cache doesn't
significantly increase the odds of finding the next requested information
in the cache.  The latter is fairly rare; almost all applications show
some form of locality of reference.</p>
        <p>That said, it is important not to increase your cache size beyond the
capabilities of your system, as that will result in reduced performance.
Under many operating systems, tying down enough virtual memory will cause
your memory and potentially your program to be swapped.  This is
especially likely on systems without unified OS buffer caches and virtual
memory spaces, as the buffer cache is allocated at boot time and so
cannot be adjusted based on application requests for large amounts of
virtual memory.</p>
        <p>For example, even if accesses are truly random within a Btree, your
access pattern will favor internal pages over leaf pages, so your cache
should be large enough to hold all internal pages.  In the steady state,
this requires at most one I/O per operation to retrieve the appropriate
leaf page.</p>
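        <p>As a rough illustration of how much cache "all internal pages"
represents, the sketch below counts the pages above the leaf level and
multiplies by the page size.  The function names are hypothetical, and the
model assumes uniform fan-out and leaf occupancy, which real trees will only
approximate.</p>

```c
#include <stdint.h>

/* Ceiling division; d must be nonzero. */
static uint64_t div_ceil(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Estimated bytes needed to cache every internal page (root included).
 * Preconditions: pairs_per_leaf >= 1 and fanout >= 2.
 */
uint64_t internal_cache_bytes(uint64_t records, uint64_t pairs_per_leaf,
                              uint64_t fanout, uint64_t page_size)
{
    uint64_t pages = div_ceil(records, pairs_per_leaf);  /* leaf pages */
    uint64_t internal = 0;
    while (pages > 1) {
        pages = div_ceil(pages, fanout);  /* pages one level up */
        internal += pages;
    }
    return internal * page_size;
}
```

        <p>For 32 million key/data pairs with 200 pairs per leaf, a fan-out of
400 and 8KB pages, this model gives 160,000 leaf pages, 400 second-level
pages and one root, so 401 * 8192 = 3,284,992 bytes (a little over 3MB) of
cache would hold every internal page.</p>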
        <p>You can use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility to monitor the effectiveness of
your cache.  The following output is excerpted from the output of that
utility's <span class="bold"><strong>-m</strong></span> option:</p>
        <pre class="programlisting">prompt: db_stat -m
131072  Cache size (128K).
4273    Requested pages found in the cache (97%).
134     Requested pages not found in the cache.
18      Pages created in the cache.
116     Pages read into the cache.
93      Pages written from the cache to the backing file.
5       Clean pages forced from the cache.
13      Dirty pages forced from the cache.
0       Dirty buffers written by trickle-sync thread.
130     Current clean buffer count.
4       Current dirty buffer count.
</pre>
        <p>The statistics for this cache say that 4,273 page requests were
satisfied from the cache and 134 were not, and that only 116 pages had to
be read from disk.  This means that the cache is working well, yielding a
97% cache hit rate.  The
<a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility will present these statistics both for the cache
as a whole and for each file within the cache separately.</p>
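        <p>The hit rate in that output is the number of requests found in the
cache divided by the total number of requests.  A minimal sketch of the
calculation, with a hypothetical function name:</p>

```c
/* Cache hit rate as a percentage: hits / (hits + misses) * 100. */
double cache_hit_rate(unsigned long hits, unsigned long misses)
{
    unsigned long total = hits + misses;
    if (total == 0)
        return 0.0;             /* no requests yet; avoid dividing by zero */
    return 100.0 * (double)hits / (double)total;
}
```

        <p>With the values shown above, <code class="literal">cache_hit_rate(4273, 134)</code>
is 4273 / 4407, or just under 97%.</p>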
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_byteorder"></a>Selecting a byte order</h3>
            </div>
          </div>
        </div>
        <p>Database files created by Berkeley DB can be created in either little- or
big-endian formats.  The byte order used for the underlying database
is specified by calling the <a href="../api_reference/C/dbset_lorder.html" class="olink">DB-&gt;set_lorder()</a> method.  If no order
is selected, the native format of the machine on which the database is
created will be used.</p>
        <p>Berkeley DB databases are architecture independent, and a database in
either format can be used on a machine with a different native format.  In
this case, each page that is read into or written from the cache must be
converted to or from the host format, so databases with non-native formats
will incur a performance penalty for the run-time conversion.</p>
        <p>
          <span class="bold">
            <strong>It is important to note that the Berkeley DB access methods do no data
conversion for application-specified data.  Key/data pairs written on a
little-endian format architecture will be returned to the application
exactly as they were written when retrieved on a big-endian format
architecture.</strong>
          </span>
        </p>
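        <p>Because key/data contents are returned byte for byte, an application
that shares a database between architectures may want to store integers in
one fixed byte order of its own choosing.  The following is a minimal sketch
of one way to do that, assuming big-endian storage; the function names are
hypothetical and are not part of Berkeley DB.</p>

```c
#include <stdint.h>

/* Write value into out[0..3] in big-endian order, whatever the host is. */
void encode_u32_be(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Rebuild the host-order value from four big-endian bytes. */
uint32_t decode_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

        <p>The application would place the four encoded bytes in the key or
data item before storing it and decode them after retrieval, so the value
reads the same on either kind of machine.</p>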
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_dup"></a>Duplicate data items</h3>
            </div>
          </div>
        </div>
        <p>
    The Btree and Hash access methods support the creation of multiple data
    items for a single key item.  By default, multiple data items are not
    permitted, and each database store operation will overwrite any
    previous data item for that key.  To configure Berkeley DB for
    duplicate data items, call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB-&gt;set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a>
    flag.  Only one copy of the key will be stored for each set of
    duplicate data items.  If the Btree access method comparison routine
    reports that two keys are equal, it is undefined which of the two
    keys will be stored and returned from future database operations.
</p>
        <p>
    By default, Berkeley DB stores duplicates in the order in which they
    were added, that is, each new duplicate data item will be stored after
    any already existing data items.  This default behavior can be
    overridden by using the <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a> method and one of the <a href="../api_reference/C/dbcput.html#dbcput_DB_AFTER" class="olink">DB_AFTER</a>,
    <a href="../api_reference/C/dbcput.html#dbcput_DB_BEFORE" class="olink">DB_BEFORE</a>, <a href="../api_reference/C/dbcput.html#dbcput_DB_KEYFIRST" class="olink">DB_KEYFIRST</a> or <a href="../api_reference/C/dbcput.html#dbcput_DB_KEYLAST" class="olink">DB_KEYLAST</a> flags.  Alternatively,
    Berkeley DB may be configured to sort duplicate data items.
</p>
        <p>
    When stepping through the database sequentially, duplicate data items
    will be returned individually, as a key/data pair, where the key item
    only changes after the last duplicate data item has been returned.  For
    this reason, duplicate data items cannot be accessed using the <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a>
    method, as it always returns the first of the duplicate data items.
    Duplicate data items should be retrieved using a Berkeley DB cursor
    interface such as the <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a> method.
</p>
        <p>
    There is a flag that permits applications to request the following data
    item only if it <span class="bold"><strong>is</strong></span> a duplicate data
    item of the current entry; see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_DUP" class="olink">DB_NEXT_DUP</a> for more information.
    There are also flags that permit applications to request the following
    or previous data item only if it <span class="bold"><strong>is not</strong></span> a duplicate
    data item of the current entry; see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_NODUP" class="olink">DB_NEXT_NODUP</a> and <a href="../api_reference/C/dbcget.html#dbcget_DB_PREV_NODUP" class="olink">DB_PREV_NODUP</a>
    for more information.
</p>
        <p>
    It is also possible to maintain duplicate records in sorted order.
    Sorting duplicates will significantly increase performance when
    searching them and performing equality joins, both of which are
    common operations when using secondary indices.  To configure Berkeley
    DB to sort duplicate data items, the application must call the
    <a href="../api_reference/C/dbset_flags.html" class="olink">DB-&gt;set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag. Note that <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a>
    automatically turns on the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag for you, so you do not 
    have to also set that flag; however, it is not an error to also set <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a>
    when configuring for sorted duplicate records.
</p>
        <p>
    When configuring sorted duplicate records, you can also specify a
    custom comparison function using the <a href="../api_reference/C/dbset_dup_compare.html" class="olink">DB-&gt;set_dup_compare()</a> method.  If
    the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag is given, but no comparison routine is specified,
    then Berkeley DB defaults to the same lexicographical sorting used for
    Btree keys, with shorter items collating before longer items.
</p>
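        <p>
    The default ordering described above (byte-wise lexicographic, with
    shorter items collating before longer items) can be sketched in plain
    C as shown below.  The <code class="literal">struct item</code> type and the function
    name are hypothetical stand-ins used only for illustration; a real
    callback registered with <code class="literal">DB-&gt;set_dup_compare()</code> operates on
    Berkeley DB's own structures, so consult the API reference for the
    exact signature.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a data item: a pointer and a length. */
struct item {
    const void *data;
    size_t size;
};

/*
 * Byte-wise lexicographic comparison.  When one item is a prefix of the
 * other, the shorter one sorts first.  Returns <0, 0 or >0.
 */
int compare_lexical(const struct item *a, const struct item *b)
{
    size_t common = a->size < b->size ? a->size : b->size;
    int cmp = common ? memcmp(a->data, b->data, common) : 0;
    if (cmp != 0)
        return cmp;
    if (a->size == b->size)
        return 0;
    return a->size < b->size ? -1 : 1;
}
```

        <p>
    Under this sketch, "ab" sorts before "abc", and "abc" sorts before
    "abd".
</p>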
        <p>
    If the duplicate data items are unsorted, applications may store
    identical duplicate data items, or, for those that just like the way it
    sounds, <span class="emphasis"><em>duplicate duplicates</em></span>.
</p>
        <p>
    <span class="bold"><strong>It is an error to attempt to store identical
        duplicate data items when duplicates are being stored in a sorted
        order.</strong></span> Any such attempt results in the
        error message  "Duplicate data items are not supported with sorted
        data" with a <code class="literal">DB_KEYEXIST</code> return code.
</p>
        <p>
    Note that you can suppress the error message "Duplicate data items are
    not supported with sorted data" by using the <a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> flag. Use
    of this flag does not change the database's basic behavior; storing
    duplicate data items in a database configured for sorted duplicates is
    still an error and so you will continue to receive the
    <code class="literal">DB_KEYEXIST</code> return code if you try to do that.
</p>
        <p>
    For further information on how searching and insertion behave in the
    presence of duplicates (sorted or not), see the <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a>, <a href="../api_reference/C/dbput.html" class="olink">DB-&gt;put()</a>,
    <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a> and <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a> documentation.
</p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_malloc"></a>Non-local memory allocation</h3>
            </div>
          </div>
        </div>
        <p>Berkeley DB allocates memory for returning key/data pairs and statistical
information, and that memory becomes the responsibility of the application.
There are also interfaces where an application allocates memory
that becomes the responsibility of Berkeley DB.</p>
        <p>On systems in which there may be multiple library versions of the
standard allocation routines (notably Windows NT), transferring memory
between the library and the application will fail if the Berkeley DB
library allocates memory from a different heap than the application
uses to free it, or vice versa.  To avoid this problem, the
<a href="../api_reference/C/envset_alloc.html" class="olink">DB_ENV-&gt;set_alloc()</a> and <a href="../api_reference/C/dbset_alloc.html" class="olink">DB-&gt;set_alloc()</a> methods can be used to
give Berkeley DB references to the application's allocation routines.</p>
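        <p>The following is a minimal sketch of application-side allocation
routines with the shapes these methods expect (malloc-, realloc- and
free-style functions).  The wrapper names and the counter are hypothetical
and exist only to make the example observable.  The registration call
appears in a comment because it requires the Berkeley DB headers; check the
API reference for when it may be called.</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Outstanding blocks allocated through the wrappers (illustration only). */
static long app_live_blocks = 0;

void *app_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        app_live_blocks++;
    return p;
}

void *app_realloc(void *ptr, size_t size)
{
    void *p = realloc(ptr, size);
    if (p != NULL && ptr == NULL)
        app_live_blocks++;      /* realloc(NULL, n) acts like malloc(n) */
    return p;
}

void app_free(void *ptr)
{
    if (ptr != NULL)
        app_live_blocks--;
    free(ptr);
}

/*
 * With Berkeley DB available, the routines would be registered on an
 * environment handle, for example:
 *     dbenv->set_alloc(dbenv, app_malloc, app_realloc, app_free);
 */
```

        <p>Because both sides then go through the same three routines, memory
allocated by the library and freed by the application (or the reverse)
comes from, and returns to, the same heap.</p>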
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="am_conf.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Logical record numbers </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Btree access method specific configuration</td>
        </tr>
      </table>
    </div>
  </body>
</html>