mirror of
https://github.com/berkeleydb/libdb.git
synced 2024-11-17 01:26:25 +00:00
176 lines
9.9 KiB
HTML
176 lines
9.9 KiB
HTML
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
|||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|||
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|||
|
<head>
|
|||
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|||
|
<title>Access method tuning</title>
|
|||
|
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
|||
|
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
|
|||
|
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
|
|||
|
<link rel="up" href="am_misc.html" title="Chapter 4. Access Method Wrapup" />
|
|||
|
<link rel="prev" href="am_misc_db_sql.html" title="Specifying a Berkeley DB schema using SQL DDL" />
|
|||
|
<link rel="next" href="am_misc_faq.html" title="Access method FAQ" />
|
|||
|
</head>
|
|||
|
<body>
|
|||
|
<div xmlns="" class="navheader">
|
|||
|
<div class="libver">
|
|||
|
<p>Library Version 11.2.5.2</p>
|
|||
|
</div>
|
|||
|
<table width="100%" summary="Navigation header">
|
|||
|
<tr>
|
|||
|
<th colspan="3" align="center">Access method tuning</th>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td width="20%" align="left"><a accesskey="p" href="am_misc_db_sql.html">Prev</a> </td>
|
|||
|
<th width="60%" align="center">Chapter 4.
|
|||
|
Access Method Wrapup
|
|||
|
</th>
|
|||
|
<td width="20%" align="right"> <a accesskey="n" href="am_misc_faq.html">Next</a></td>
|
|||
|
</tr>
|
|||
|
</table>
|
|||
|
<hr />
|
|||
|
</div>
|
|||
|
<div class="sect1" lang="en" xml:lang="en">
|
|||
|
<div class="titlepage">
|
|||
|
<div>
|
|||
|
<div>
|
|||
|
<h2 class="title" style="clear: both"><a id="am_misc_tune"></a>Access method tuning</h2>
|
|||
|
</div>
|
|||
|
</div>
|
|||
|
</div>
|
|||
|
<p>There are a few different issues to consider when tuning the performance
|
|||
|
of Berkeley DB access method applications.</p>
|
|||
|
<div class="variablelist">
|
|||
|
<dl>
|
|||
|
<dt>
|
|||
|
<span class="term">access method</span>
|
|||
|
</dt>
|
|||
|
<dd>An application's choice of a database access method can significantly
|
|||
|
affect performance. Applications using fixed-length records and integer
|
|||
|
keys are likely to get better performance from the Queue access method.
|
|||
|
Applications using variable-length records are likely to get better
|
|||
|
performance from the Btree access method, as it tends to be faster for
|
|||
|
most applications than either the Hash or Recno access methods. Because
|
|||
|
the access method APIs are largely identical between the Berkeley DB access
|
|||
|
methods, it is easy for applications to benchmark the different access
|
|||
|
methods against each other. See <a class="xref" href="am_conf_select.html" title="Selecting an access method">Selecting an access method</a> for more information.</dd>
|
|||
|
<dt>
|
|||
|
<span class="term">cache size</span>
|
|||
|
</dt>
|
|||
|
<dd>The Berkeley DB database cache defaults to a fairly small size, and most
|
|||
|
applications concerned with performance will want to set it explicitly.
|
|||
|
Using a too-small cache will result in horrible performance. The first
|
|||
|
step in tuning the cache size is to use the db_stat utility (or the
|
|||
|
statistics returned by the <a href="../api_reference/C/dbstat.html" class="olink">DB->stat()</a> function) to measure the
|
|||
|
effectiveness of the cache. The goal is to maximize the cache's hit
|
|||
|
rate. Typically, increasing the size of the cache until the hit rate
|
|||
|
reaches 100% or levels off will yield the best performance. However,
|
|||
|
if your working set is sufficiently large, you will be limited by the
|
|||
|
system's available physical memory. Depending on the virtual memory
|
|||
|
and file system buffering policies of your system, and the requirements
|
|||
|
of other applications, the maximum cache size will be some amount
|
|||
|
smaller than the size of physical memory. If you find that
|
|||
|
the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility shows that increasing the cache size improves your hit
|
|||
|
rate, but performance is not improving (or is getting worse), then it's
|
|||
|
likely you've hit other system limitations. At this point, you should
|
|||
|
review the system's swapping/paging activity and limit the size of the
|
|||
|
cache to the maximum size possible without triggering paging activity.
|
|||
|
Finally, always remember to make your measurements under conditions as
|
|||
|
close as possible to the conditions your deployed application will run
|
|||
|
under, and to test your final choices under worst-case conditions.</dd>
|
|||
|
<dt>
|
|||
|
<span class="term">shared memory</span>
|
|||
|
</dt>
|
|||
|
<dd>By default, Berkeley DB creates its database environment shared regions in
|
|||
|
filesystem backed memory. Some systems do not distinguish between
|
|||
|
regular filesystem pages and memory-mapped pages backed by the
|
|||
|
filesystem, when selecting dirty pages to be flushed back to disk. For
|
|||
|
this reason, dirtying pages in the Berkeley DB cache may cause intense
|
|||
|
filesystem activity, typically when the filesystem sync thread or
|
|||
|
process is run. In some cases, this can dramatically affect application
|
|||
|
throughput. The workaround to this problem is to create the shared
|
|||
|
regions in system shared memory (<a href="../api_reference/C/envopen.html#envopen_DB_SYSTEM_MEM" class="olink">DB_SYSTEM_MEM</a>) or application
|
|||
|
private memory (<a href="../api_reference/C/envopen.html#envopen_DB_PRIVATE" class="olink">DB_PRIVATE</a>), or, in cases where this behavior
|
|||
|
is configurable, to turn off the operating system's flushing of
|
|||
|
memory-mapped pages.</dd>
|
|||
|
<dt>
|
|||
|
<span class="term">large key/data items</span>
|
|||
|
</dt>
|
|||
|
<dd>Storing large key/data items in a database can alter the performance
|
|||
|
characteristics of Btree, Hash and Recno databases. The first parameter
|
|||
|
to consider is the database page size. When a key/data item is too
|
|||
|
large to be placed on a database page, it is stored on "overflow" pages
|
|||
|
that are maintained outside of the normal database structure (typically,
|
|||
|
items that are larger than one-quarter of the page size are deemed to
|
|||
|
be too large). Accessing these overflow pages requires at least one
|
|||
|
additional page reference over a normal access, so it is usually better
|
|||
|
to increase the page size than to create a database with a large number
|
|||
|
of overflow pages. Use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility (or the statistics
|
|||
|
returned by the <a href="../api_reference/C/dbstat.html" class="olink">DB->stat()</a> method) to review the number of overflow
|
|||
|
pages in the database.
|
|||
|
<p>The second issue is using large key/data items instead of duplicate data
|
|||
|
items. While this can offer performance gains to some applications
|
|||
|
(because it is possible to retrieve several data items in a single get
|
|||
|
call), once the key/data items are large enough to be pushed off-page,
|
|||
|
they will slow the application down. Using duplicate data items is
|
|||
|
usually the better choice in the long run.</p></dd>
|
|||
|
</dl>
|
|||
|
</div>
|
|||
|
<p>A common question when tuning Berkeley DB applications is scalability. For
|
|||
|
example, people will ask why, when adding additional threads or
|
|||
|
processes to an application, the overall database throughput decreases,
|
|||
|
even when all of the operations are read-only queries.</p>
|
|||
|
<p>First, while read-only operations are logically concurrent, they still
|
|||
|
have to acquire mutexes on internal Berkeley DB data structures. For example,
|
|||
|
when searching a linked list and looking for a database page, the linked
|
|||
|
list has to be locked against other threads of control attempting to add
|
|||
|
or remove pages from the linked list. The more threads of control you
|
|||
|
add, the more contention there will be for those shared data structure
|
|||
|
resources.</p>
|
|||
|
<p>Second, once contention starts happening, applications will also start
|
|||
|
to see threads of control convoy behind locks (especially on
|
|||
|
architectures supporting only test-and-set spin mutexes, rather than
|
|||
|
blocking mutexes). On test-and-set architectures, threads of control
|
|||
|
waiting for locks must attempt to acquire the mutex, sleep, check the
|
|||
|
mutex again, and so on. Each failed check of the mutex and subsequent
|
|||
|
sleep wastes CPU and decreases the overall throughput of the system.</p>
|
|||
|
<p>Third, every time a thread acquires a shared mutex, it has to shoot down
|
|||
|
other references to that memory in every other CPU on the system. Many
|
|||
|
modern snoopy cache architectures have slow shoot down characteristics.</p>
|
|||
|
<p>Fourth, schedulers don't care what application-specific mutexes a thread
|
|||
|
of control might hold when de-scheduling a thread. If a thread of
|
|||
|
control is descheduled while holding a shared data structure mutex,
|
|||
|
other threads of control will be blocked until the scheduler decides to
|
|||
|
run the blocking thread of control again. The more threads of control
|
|||
|
that are running, the smaller their quanta of CPU time, and the more
|
|||
|
likely they will be descheduled while holding a Berkeley DB mutex.</p>
|
|||
|
<p>The results of adding new threads of control to an application, on the
|
|||
|
application's throughput, is application and hardware specific and
|
|||
|
almost entirely dependent on the application's data access pattern and
|
|||
|
hardware. In general, using operating systems that support blocking
|
|||
|
mutexes will often make a tremendous difference, and limiting threads
|
|||
|
of control to to some small multiple of the number of CPUs is usually
|
|||
|
the right choice to make.</p>
|
|||
|
</div>
|
|||
|
<div class="navfooter">
|
|||
|
<hr />
|
|||
|
<table width="100%" summary="Navigation footer">
|
|||
|
<tr>
|
|||
|
<td width="40%" align="left"><a accesskey="p" href="am_misc_db_sql.html">Prev</a> </td>
|
|||
|
<td width="20%" align="center">
|
|||
|
<a accesskey="u" href="am_misc.html">Up</a>
|
|||
|
</td>
|
|||
|
<td width="40%" align="right"> <a accesskey="n" href="am_misc_faq.html">Next</a></td>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td width="40%" align="left" valign="top">Specifying a Berkeley DB schema using SQL DDL </td>
|
|||
|
<td width="20%" align="center">
|
|||
|
<a accesskey="h" href="index.html">Home</a>
|
|||
|
</td>
|
|||
|
<td width="40%" align="right" valign="top"> Access method FAQ</td>
|
|||
|
</tr>
|
|||
|
</table>
|
|||
|
</div>
|
|||
|
</body>
|
|||
|
</html>
|