<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>General access method configuration</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="am_conf.html" title="Chapter 2.  Access Method Configuration" />
<link rel="prev" href="am_conf_logrec.html" title="Logical record numbers" />
<link rel="next" href="bt_conf.html" title="Btree access method specific configuration" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 11.2.5.3</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">General access method configuration</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
<th width="60%" align="center">Chapter 2. 
Access Method Configuration
</th>
<td width="20%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="general_am_conf"></a>General access method configuration</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_pagesize">Selecting a page size</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_cachesize">Selecting a cache size</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_byteorder">Selecting a byte order</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_dup">Duplicate data items</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_malloc">Non-local memory allocation</a>
</span>
</dt>
</dl>
</div>
<p>
Several configuration tasks are common to all access methods. They are
described in the following sections.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_pagesize"></a>Selecting a page size</h3>
</div>
</div>
</div>
<p>
The size of the pages used in the underlying database can be specified
by calling the <a href="../api_reference/C/dbset_pagesize.html" class="olink">DB-&gt;set_pagesize()</a> method. The page size must be a
power of two, with a minimum of 512 bytes and a maximum of 64K bytes.
If no page size is specified by the application, a page size is
selected based on the underlying filesystem I/O block size. (A page
size selected in this way has a lower limit of 512 bytes and an upper
limit of 16K bytes.)
</p>
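<p>
The constraints above can be checked before a handle is configured. The
following is a minimal sketch in plain C; the function and macro names
are illustrative and are not part of the Berkeley DB API.
</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits quoted in the text above; the macro names are illustrative only. */
#define MIN_PAGE_SIZE 512u
#define MAX_PAGE_SIZE (64u * 1024u)

/* Returns true if size is a power of two in the range 512 bytes to 64K. */
bool is_valid_page_size(uint32_t size)
{
    if (size < MIN_PAGE_SIZE || size > MAX_PAGE_SIZE)
        return false;
    /* A power of two has exactly one bit set. */
    return (size & (size - 1)) == 0;
}
```

<p>
An application might call such a check on a user-supplied value before
passing it to DB-&gt;set_pagesize().
</p>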
<p>
There are several issues to consider when selecting a page size: overflow
record sizes, locking, I/O efficiency, and recoverability.
</p>
<p>
First, the page size implicitly sets the size of an overflow record.
Overflow records are key or data items that are too large to fit on a
normal database page and are therefore stored in
overflow pages. Overflow pages are pages that exist outside of the
normal database structure. For this reason, there is often a
significant performance penalty associated with retrieving or modifying
overflow records. Selecting a page size that is too small, and which
forces the creation of large numbers of overflow pages, can seriously
impact the performance of an application.
</p>
<p>
Second, in the Btree, Hash and Recno access methods, the finest-grained
lock that Berkeley DB acquires is for a page. (The Queue access method
generally acquires record-level locks rather than page-level locks.)
Selecting a page size that is too large, and which causes threads or
processes to wait because other threads of control are accessing or
modifying records on the same page, can impact the performance of your
application.
</p>
<p>
Third, the page size specifies the granularity of I/O from the database
to the operating system. Berkeley DB will give a page-sized unit of
bytes to the operating system to be scheduled for reading from or
writing to the disk. For many operating systems, there is an internal
<span class="bold"><strong>block size</strong></span> which is used as the
granularity of I/O from the operating system to the disk. Generally,
it will be more efficient for Berkeley DB to write filesystem-sized
blocks to the operating system and for the operating system to write
those same blocks to the disk.
</p>
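<p>
On POSIX systems, one way to discover the filesystem's preferred I/O
block size is the <code class="literal">st_blksize</code> field
returned by <code class="literal">stat()</code>. The sketch below uses
it to suggest a page size limited to the 512-byte to 16K range
described earlier for the default selection. The helper name is
hypothetical, and this is an illustration of the rule, not the code
Berkeley DB itself uses.
</p>

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: suggest a page size for a database that will live
 * in directory `dir`, based on the filesystem block size and limited to
 * the range 512 bytes to 16K. Returns 0 if `dir` cannot be examined.
 */
uint32_t suggest_page_size(const char *dir)
{
    struct stat sb;
    uint32_t size = 512;

    if (stat(dir, &sb) != 0 || sb.st_blksize <= 0)
        return 0;

    /* Largest power of two that exceeds neither the block size nor 16K. */
    while (size * 2 <= 16384 &&
        (long long)size * 2 <= (long long)sb.st_blksize)
        size *= 2;
    return size;
}
```

<p>
The result is only a starting point; the overflow and locking
considerations above may argue for a different value.
</p>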
<p>
Selecting a database page size smaller than the filesystem block size
may cause the operating system to coalesce or otherwise manipulate
Berkeley DB pages and can impact the performance of your application.
When the page size is smaller than the filesystem block size and a page
written by Berkeley DB is not found in the operating system's cache,
the operating system may be forced to read a block from the disk, copy
the page into the block it read, and then write out the block to disk,
rather than simply writing the page to disk. Additionally, as the
operating system is reading more data into its buffer cache than is
strictly necessary to satisfy each Berkeley DB request for a page, the
operating system buffer cache may be wasting memory.
</p>
<p>
Alternatively, selecting a page size larger than the filesystem block
size may cause the operating system to read more data than necessary.
On some systems, reading filesystem blocks sequentially may cause the
operating system to begin performing read-ahead. If requesting a
single database page implies reading enough filesystem blocks to
satisfy the operating system's criteria for read-ahead, the operating
system may do more I/O than is required.
</p>
<p>
Fourth, when using the Berkeley DB Transactional Data Store product,
the page size may affect the errors from which your database can
recover. See <a class="xref" href="transapp_reclimit.html" title="Berkeley DB recoverability">Berkeley DB recoverability</a> for more information.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
The <a href="../api_reference/C/db_tuner.html" class="olink">db_tuner</a> utility suggests a page size for Btree databases that optimizes cache
efficiency and storage space requirements. This utility works only when given a pre-populated database.
It is therefore useful when tuning an existing application, not when first implementing one.
</p>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_cachesize"></a>Selecting a cache size</h3>
</div>
</div>
</div>
<p>The size of the cache used for the underlying database can be specified
by calling the <a href="../api_reference/C/dbset_cachesize.html" class="olink">DB-&gt;set_cachesize()</a> method.
Choosing a cache size is, unfortunately, an art. Your cache must be at
least large enough for your working set plus some overlap for unexpected
situations.</p>
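<p>
The cache size is passed to DB-&gt;set_cachesize() as a number of
gigabytes plus a number of additional bytes, followed by a count of
cache regions (this argument layout is taken from the API reference,
not from the text above). A small helper, hypothetical and shown only
as a sketch, can split a single byte count into the first two arguments:
</p>

```c
#include <stdint.h>

#define GIGABYTE (1024ULL * 1024ULL * 1024ULL)

/* Split a byte count into whole gigabytes plus a remainder in bytes. */
void split_cache_size(uint64_t total, uint32_t *gbytes, uint32_t *bytes)
{
    *gbytes = (uint32_t)(total / GIGABYTE);
    *bytes = (uint32_t)(total % GIGABYTE);
}

/*
 * Intended use, assuming an unopened DB handle named dbp:
 *
 *     uint32_t gbytes, bytes;
 *     split_cache_size(1536ULL * 1024 * 1024, &gbytes, &bytes);
 *     ret = dbp->set_cachesize(dbp, gbytes, bytes, 1);
 */
```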
<p>When using the Btree access method, you must have a cache big enough for
the minimum working set for a single access. This will include a root
page, one or more internal pages (depending on the depth of your tree),
and a leaf page. If your cache is any smaller than that, each new page
will force out the least-recently-used page, and Berkeley DB will re-read the
root page of the tree anew on each database request.</p>
<p>If your keys are of moderate size (a few tens of bytes) and your pages
are on the order of 4KB to 8KB, most Btree applications will be only
three levels. For example, using 20-byte keys with 20 bytes of data
associated with each key, an 8KB page can hold roughly 400 keys (or 200
key/data pairs), so a fully populated three-level Btree will hold 32
million key/data pairs, and a tree with only a 50% page-fill factor will
still hold 16 million key/data pairs. We rarely expect trees to exceed
five levels, although Berkeley DB will support trees up to 255 levels.</p>
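<p>
The arithmetic behind the fully populated figure can be reproduced with
a short function. This is a back-of-the-envelope sketch that, like the
estimate above, ignores page headers and per-item overhead; the function
name is illustrative.
</p>

```c
#include <stdint.h>

/*
 * Upper bound on the key/data pairs in a Btree with `levels` levels,
 * where each internal page references `fanout` child pages and each
 * leaf page holds `pairs_per_leaf` key/data pairs.
 */
uint64_t btree_capacity(uint64_t fanout, uint64_t pairs_per_leaf,
    unsigned levels)
{
    uint64_t pages = 1;    /* start with the single root page */
    unsigned i;

    if (levels == 0)
        return 0;
    /* Each level above the leaves multiplies the page count by fanout. */
    for (i = 1; i < levels; i++)
        pages *= fanout;
    return pages * pairs_per_leaf;
}
```

<p>
With 400 keys per internal page and 200 pairs per leaf page,
<code class="literal">btree_capacity(400, 200, 3)</code> is
400 x 400 x 200, the 32 million pairs quoted above.
</p>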
<p>The rule-of-thumb is that cache is good, and more cache is better.
Generally, applications benefit from increasing the cache size up to a
point, at which the performance will stop improving as the cache size
increases. When this point is reached, one of two things has happened:
either the cache is large enough that the application is almost never
having to retrieve information from disk, or your application is doing
truly random accesses, and therefore increasing the size of the cache doesn't
significantly increase the odds of finding the next requested information
in the cache. The latter is fairly rare -- almost all applications show
some form of locality of reference.</p>
<p>That said, it is important not to increase your cache size beyond the
capabilities of your system, as that will result in reduced performance.
Under many operating systems, tying down enough virtual memory will cause
your memory and potentially your program to be swapped. This is
especially likely on systems without unified OS buffer caches and virtual
memory spaces, because on such systems the buffer cache is allocated at
boot time and so cannot be adjusted based on application requests for
large amounts of virtual memory.</p>
<p>For example, even if accesses are truly random within a Btree, your
access pattern will favor internal pages over leaf pages, so your cache
should be large enough to hold all internal pages. In the steady state,
this requires at most one I/O per operation to retrieve the appropriate
leaf page.</p>
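<p>
To estimate how much cache is needed to hold all internal pages, count
the pages above the leaf level. The sketch below assumes every internal
page references the same number of children (the fanout), which is a
simplification; the function name is illustrative.
</p>

```c
#include <stdint.h>

/*
 * Number of internal pages (including the root) above `leaf_pages` leaf
 * pages when each internal page references up to `fanout` children.
 */
uint64_t internal_page_count(uint64_t leaf_pages, uint64_t fanout)
{
    uint64_t total = 0;
    uint64_t level = leaf_pages;

    if (fanout < 2)
        return 0;
    /* Walk upward one level at a time until a single root remains. */
    while (level > 1) {
        level = (level + fanout - 1) / fanout;    /* round up */
        total += level;
    }
    return total;
}
```

<p>
For the earlier example (160,000 leaf pages with a fanout of 400) this
gives 401 internal pages, or a little over 3MB of cache with 8KB pages.
</p>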
<p>You can use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility to monitor the effectiveness of
your cache. The following output is excerpted from the output of that
utility's <span class="bold"><strong>-m</strong></span> option:</p>
<pre class="programlisting">prompt: db_stat -m
131072 Cache size (128K).
4273 Requested pages found in the cache (97%).
134 Requested pages not found in the cache.
18 Pages created in the cache.
116 Pages read into the cache.
93 Pages written from the cache to the backing file.
5 Clean pages forced from the cache.
13 Dirty pages forced from the cache.
0 Dirty buffers written by trickle-sync thread.
130 Current clean buffer count.
4 Current dirty buffer count.
</pre>
<p>The statistics for this cache say that there have been 4,407 requests of
the cache (4,273 found plus 134 not found), and only 116 of those requests
required an I/O from disk. This means that the cache is working well,
yielding a 97% cache hit rate. The
<a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility will present these statistics both for the cache
as a whole and for each file within the cache separately.</p>
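<p>
The hit rate reported above is the number of requested pages found in
the cache divided by the total number of requested pages. A minimal
sketch of that calculation, with an illustrative function name:
</p>

```c
#include <stdint.h>

/* Percentage of page requests that were satisfied from the cache. */
double cache_hit_percent(uint64_t found, uint64_t not_found)
{
    uint64_t requests = found + not_found;

    if (requests == 0)
        return 0.0;    /* no requests yet, so no meaningful rate */
    return 100.0 * (double)found / (double)requests;
}
```

<p>
For the output shown, <code class="literal">cache_hit_percent(4273, 134)</code>
is just under 97, which db_stat rounds to 97%.
</p>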
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_byteorder"></a>Selecting a byte order</h3>
</div>
</div>
</div>
<p>Database files created by Berkeley DB can be created in either little- or
big-endian formats. The byte order used for the underlying database
is specified by calling the <a href="../api_reference/C/dbset_lorder.html" class="olink">DB-&gt;set_lorder()</a> method. If no order
is selected, the native format of the machine on which the database is
created will be used.</p>
<p>Berkeley DB databases are architecture independent, and a database in any
format can be used on a machine with a different native format. In this
case, each page that is read into or written from the cache must be
converted to or from the host format, so databases with non-native formats
incur a performance penalty for the run-time conversion.</p>
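<p>
DB-&gt;set_lorder() takes the byte order as a number, 1234 for
little-endian and 4321 for big-endian (values taken from the API
reference rather than from the text above). The following sketch, with
an illustrative function name, reports which of the two matches the
host, which can help when deciding whether an existing database will
incur the conversion penalty:
</p>

```c
#include <stdint.h>

/* Returns 1234 on a little-endian host and 4321 on a big-endian host. */
int host_lorder(void)
{
    const uint32_t probe = 1;
    /* Inspect the first byte in memory of a known 32-bit value. */
    const unsigned char first = *(const unsigned char *)&probe;

    return first == 1 ? 1234 : 4321;
}
```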
<p>
<span class="bold">
<strong>It is important to note that the Berkeley DB access methods do no data
conversion for application specified data. Key/data pairs written on a
little-endian format architecture will be returned to the application
exactly as they were written when retrieved on a big-endian format
architecture.</strong>
</span>
</p>
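<p>
Because application data is returned exactly as written, an application
that shares databases across architectures needs its own portable
encoding. One common approach, offered here as a suggestion rather than
something Berkeley DB requires, is to store integers in a fixed byte
order. The helper names below are illustrative.
</p>

```c
#include <stdint.h>

/* Write a 32-bit value into out[] most-significant byte first. */
void encode_u32_be(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Read back a value written by encode_u32_be() on any host. */
uint32_t decode_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

<p>
The four encoded bytes would then be stored as the key or data item in
place of the raw integer.
</p>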
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_dup"></a>Duplicate data items</h3>
</div>
</div>
</div>
<p>
The Btree and Hash access methods support the creation of multiple data
items for a single key item. By default, multiple data items are not
permitted, and each database store operation will overwrite any
previous data item for that key. To configure Berkeley DB for
duplicate data items, call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB-&gt;set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a>
flag. Only one copy of the key will be stored for each set of
duplicate data items. If the Btree access method comparison routine
reports that two keys compare as equal, it is undefined which of the two
keys will be stored and returned from future database operations.
</p>
<p>
By default, Berkeley DB stores duplicates in the order in which they
were added, that is, each new duplicate data item will be stored after
any already existing data items. This default behavior can be
overridden by using the <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a> method and one of the <a href="../api_reference/C/dbcput.html#dbcput_DB_AFTER" class="olink">DB_AFTER</a>,
<a href="../api_reference/C/dbcput.html#dbcput_DB_BEFORE" class="olink">DB_BEFORE</a>, <a href="../api_reference/C/dbcput.html#dbcput_DB_KEYFIRST" class="olink">DB_KEYFIRST</a> or <a href="../api_reference/C/dbcput.html#dbcput_DB_KEYLAST" class="olink">DB_KEYLAST</a> flags. Alternatively,
Berkeley DB may be configured to sort duplicate data items.
</p>
<p>
When stepping through the database sequentially, duplicate data items
will be returned individually, as a key/data pair, where the key item
only changes after the last duplicate data item has been returned. For
this reason, duplicate data items cannot be accessed using the <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a>
method, as it always returns the first of the duplicate data items.
Duplicate data items should be retrieved using a Berkeley DB cursor
interface such as the <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a> method.
</p>
<p>
There is a flag that permits applications to request the following data
item only if it <span class="bold"><strong>is</strong></span> a duplicate data
item of the current entry; see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_DUP" class="olink">DB_NEXT_DUP</a> for more information.
There are also flags that permit applications to request the following or
preceding data item only if it <span class="bold"><strong>is not</strong></span> a duplicate
data item of the current entry; see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_NODUP" class="olink">DB_NEXT_NODUP</a> and <a href="../api_reference/C/dbcget.html#dbcget_DB_PREV_NODUP" class="olink">DB_PREV_NODUP</a>
for more information.
</p>
<p>
It is also possible to maintain duplicate records in sorted order.
Sorting duplicates will significantly increase performance when
searching them and performing equality joins — both of which are
common operations when using secondary indices. To configure Berkeley
DB to sort duplicate data items, the application must call the
<a href="../api_reference/C/dbset_flags.html" class="olink">DB-&gt;set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag. Note that <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a>
automatically turns on the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag for you, so you do not
have to also set that flag; however, it is not an error to also set <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a>
when configuring for sorted duplicate records.
</p>
<p>
When configuring sorted duplicate records, you can also specify a
custom comparison function using the <a href="../api_reference/C/dbset_dup_compare.html" class="olink">DB-&gt;set_dup_compare()</a> method. If
the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag is given, but no comparison routine is specified,
then Berkeley DB defaults to the same lexicographical sorting used for
Btree keys, with shorter items collating before longer items.
</p>
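<p>
The default ordering described above (byte-by-byte comparison, with a
shorter item sorting before a longer item that it prefixes) can be
sketched as follows. This illustrates the ordering only; it is not
Berkeley DB's internal routine, and a callback registered with
DB-&gt;set_dup_compare() receives its arguments in a different form. The
function name is illustrative.
</p>

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two byte strings lexicographically. Returns a negative value,
 * zero, or a positive value if a sorts before, equal to, or after b.
 */
int lexical_compare(const void *a, size_t alen, const void *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int cmp = n > 0 ? memcmp(a, b, n) : 0;

    if (cmp != 0)
        return cmp;    /* a differing byte decides the order */
    if (alen == blen)
        return 0;
    /* One item is a prefix of the other: the shorter one sorts first. */
    return alen < blen ? -1 : 1;
}
```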
<p>
If the duplicate data items are unsorted, applications may store
identical duplicate data items, or, for those that just like the way it
sounds, <span class="emphasis"><em>duplicate duplicates</em></span>.
</p>
<p>
<span class="bold"><strong>It is an error to attempt to store identical
duplicate data items when duplicates are being stored in a sorted
order.</strong></span> Any such attempt results in the
error message "Duplicate data items are not supported with sorted
data" with a <code class="literal">DB_KEYEXIST</code> return code.
</p>
<p>
Note that you can suppress the error message "Duplicate data items are
not supported with sorted data" by using the <a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> flag. Use
of this flag does not change the database's basic behavior; storing
duplicate data items in a database configured for sorted duplicates is
still an error and so you will continue to receive the
<code class="literal">DB_KEYEXIST</code> return code if you try to do that.
</p>
<p>
For further information on how searching and insertion behave in the
presence of duplicates (sorted or not), see the <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a>, <a href="../api_reference/C/dbput.html" class="olink">DB-&gt;put()</a>,
<a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a> and <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a> documentation.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_malloc"></a>Non-local memory allocation</h3>
</div>
</div>
</div>
<p>In some interfaces, Berkeley DB allocates memory (for returned key/data
pairs and statistical information) that then becomes the responsibility
of the application. In other interfaces, the application allocates memory
that then becomes the responsibility of Berkeley DB.</p>
<p>On systems in which there may be multiple library versions of the
standard allocation routines (notably Windows NT), transferring memory
between the library and the application will fail because the Berkeley DB
library allocates memory from a different heap than the application
uses to free it, or vice versa. To avoid this problem, the
<a href="../api_reference/C/envset_alloc.html" class="olink">DB_ENV-&gt;set_alloc()</a> and <a href="../api_reference/C/dbset_alloc.html" class="olink">DB-&gt;set_alloc()</a> methods can be used to
give Berkeley DB references to the application's allocation routines.</p>
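<p>
A minimal sketch of the application side is a set of thin wrappers
compiled into the application, so that every allocation and release goes
through the application's own C runtime. The wrapper names are
illustrative; the call shown in the comment assumes an environment
handle named <code class="literal">dbenv</code> and the malloc, realloc,
free argument order given in the API reference.
</p>

```c
#include <stdlib.h>

/* Wrappers that resolve to the application's own allocation routines. */
void *app_malloc(size_t size)
{
    return malloc(size);
}

void *app_realloc(void *ptr, size_t size)
{
    return realloc(ptr, size);
}

void app_free(void *ptr)
{
    free(ptr);
}

/*
 * Intended use, before the environment is opened:
 *
 *     ret = dbenv->set_alloc(dbenv, app_malloc, app_realloc, app_free);
 */
```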
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="am_conf.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Logical record numbers </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Btree access method specific configuration</td>
</tr>
</table>
</div>
</body>
</html>