mirror of
https://github.com/berkeleydb/libdb.git
synced 2024-11-16 17:16:25 +00:00
412 lines
17 KiB
HTML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Partitioning databases</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="am.html" title="Chapter 3. Access Method Operations" />
<link rel="prev" href="am_opensub.html" title="Opening multiple databases in a single file" />
<link rel="next" href="am_get.html" title="Retrieving records" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 11.2.5.2</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Partitioning databases</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="am_opensub.html">Prev</a> </td>
<th width="60%" align="center">Chapter 3. Access Method Operations</th>
<td width="20%" align="right"> <a accesskey="n" href="am_get.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="am_partition"></a>Partitioning databases</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="am_partition.html#am_partition_keys">Specifying partition keys</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="am_partition.html#am_partition_function">Partitioning callback</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="am_partition.html#partition_file_placement">Placing partition files</a>
</span>
</dt>
</dl>
</div>
<p>
You can improve concurrency on your database reads and writes by
splitting access to a single database into multiple databases. This
reduces contention for internal database pages and lets you spread
your databases across multiple disks, which can improve disk I/O.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
Database partitions are not supported by the C# and Java APIs at
this time.
</p>
</div>
<p>
While you can do this manually by creating and using more than one
database for your data, DB is capable of partitioning your
database for you. When you use DB's built-in database partitioning
feature, you access your data in exactly the same way
as if you were using only one database; all the work of determining
which partition holds a particular record is handled for you.
</p>
<p>
Only the Btree and Hash access methods are supported for partitioned
databases.
</p>
<p>
You indicate that you want your database to be partitioned by calling
<a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> before you open your database for the first time. You can
indicate the directory in which each partition is contained using the
<a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> method.
</p>
<p>
Once you have partitioned a database, you cannot change your
partitioning scheme.
</p>
<p>
There are two ways to indicate which key/data pairs go on which
partition. The first is to specify an array of <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s that indicate
the minimum key value for a given partition. The second is to provide
a callback that returns the number of the partition on which a specified
key is placed.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_partition_keys"></a>Specifying partition keys</h3>
</div>
</div>
</div>
<p>
For simple cases, you can partition your database by providing
an array of <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s, each element of which provides the minimum
key value to be placed on a partition. There must be one fewer
element in this array than you have partitions. The first
element of the array indicates the minimum key value for the
second partition in your database. Key values that are less
than the first key value provided in this array are placed on
the first partition (partition 0).
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
You can use partition keys only if you are using the Btree
access method.
</p>
</div>
<p>
For example, suppose you have a database of fruit, and you want
three partitions for your database. Then you need a <a href="../api_reference/C/dbt.html" class="olink">DBT</a> array
of size two. The first element in this array indicates the
minimum key value placed on partition 1. The second
element in this array indicates the minimum key value placed on
partition 2. Keys that compare less than the first <a href="../api_reference/C/dbt.html" class="olink">DBT</a> in the
array are placed on partition 0.
</p>
<p>
All comparisons are performed according to the lexicographic
comparison used by your platform.
</p>
<p>
For example, suppose you want all fruits whose names begin
with:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
'a' - 'f' to go on partition 0
</p>
</li>
<li>
<p>
'g' - 'p' to go on partition 1
</p>
</li>
<li>
<p>
'q' - 'z' to go on partition 2.
</p>
</li>
</ul>
</div>
<p>
Then you would accomplish this with the following code
fragment:
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
The <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> partition callback parameter must
be <code class="literal">NULL</code> if you are using an array of
<a href="../api_reference/C/dbt.html" class="olink">DBT</a>s to partition your database.
</p>
</div>
<a id="prog_am10"></a>
<pre class="programlisting">DB *dbp = NULL;
DB_ENV *envp = NULL;
DBT partKeys[2];
u_int32_t db_flags;
const char *file_name = "mydb.db";
int ret;

...

/* Skipping environment open to shorten this example */

/* Initialize the DB handle */
ret = db_create(&dbp, envp, 0);
if (ret != 0) {
    fprintf(stderr, "%s\n", db_strerror(ret));
    return (EXIT_FAILURE);
}

/* Set up the partition keys */
memset(&partKeys[0], 0, sizeof(DBT));
partKeys[0].data = "g";
partKeys[0].size = sizeof("g") - 1;

memset(&partKeys[1], 0, sizeof(DBT));
partKeys[1].data = "q";
partKeys[1].size = sizeof("q") - 1;

/* Three partitions split by key; the callback parameter is NULL */
ret = dbp->set_partition(dbp, 3, partKeys, NULL);
if (ret != 0) {
    dbp->err(dbp, ret, "set_partition failed");
    return (EXIT_FAILURE);
}

/* Now open the database */
db_flags = DB_CREATE;       /* Allow database creation */

ret = dbp->open(dbp,        /* Pointer to the database */
                NULL,       /* Txn pointer */
                file_name,  /* File name */
                NULL,       /* Logical db name */
                DB_BTREE,   /* Database type (using btree) */
                db_flags,   /* Open flags */
                0);         /* File mode. Using defaults */
if (ret != 0) {
    dbp->err(dbp, ret, "Database '%s' open failed",
        file_name);
    return (EXIT_FAILURE);
}</pre>
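<p>
To see where a given key lands under a key-based scheme, the lookup
can be modeled outside of DB. The following is a minimal,
self-contained sketch and not DB's implementation: the names
<code class="literal">part_key</code>, <code class="literal">compare_keys</code>,
<code class="literal">find_partition</code> and <code class="literal">fruit_partition</code>
are hypothetical, and it assumes keys are compared byte by byte, with a
shorter key sorting before a longer key that shares its prefix.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the data/size fields of a DBT. */
struct part_key {
    const void *data;
    size_t size;
};

/* Byte-wise lexicographic comparison; a shorter key sorts first on a tie. */
static int compare_keys(const struct part_key *a, const struct part_key *b)
{
    size_t n = a->size < b->size ? a->size : b->size;
    int cmp = memcmp(a->data, b->data, n);

    if (cmp != 0)
        return cmp;
    if (a->size == b->size)
        return 0;
    return a->size < b->size ? -1 : 1;
}

/*
 * Return the partition for key, given nkeys minimum keys in ascending
 * order. min_keys[i] is the smallest key stored on partition i + 1.
 */
static unsigned find_partition(const struct part_key *min_keys, size_t nkeys,
    const struct part_key *key)
{
    unsigned part = 0;
    size_t i;

    assert(key != NULL && (nkeys == 0 || min_keys != NULL));
    for (i = 0; i < nkeys; i++) {
        if (compare_keys(key, &min_keys[i]) < 0)
            break;
        part = (unsigned)(i + 1);
    }
    return part;
}

/* Convenience wrapper for NUL-terminated names using the "g"/"q" keys. */
static unsigned fruit_partition(const char *name)
{
    static const struct part_key min_keys[2] = { { "g", 1 }, { "q", 1 } };
    struct part_key key;

    key.data = name;
    key.size = strlen(name);
    return find_partition(min_keys, 2, &key);
}
```

<p>
Under these assumptions, <code class="literal">fruit_partition("fig")</code>
yields 0, <code class="literal">fruit_partition("grape")</code> yields 1, and
<code class="literal">fruit_partition("quince")</code> yields 2. A byte-wise
comparison is case sensitive, so a capitalized name such as "Grape"
sorts before "g" and lands on partition 0.
</p>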
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_partition_function"></a>Partitioning callback</h3>
</div>
</div>
</div>
<p>
In some cases, a simple lexicographical comparison of key data
will not sufficiently support a partitioning scheme. For
those situations, you should write a partitioning function.
This function accepts a pointer to the <a href="../api_reference/C/db.html" class="olink">DB</a> handle and a pointer to the <a href="../api_reference/C/dbt.html" class="olink">DBT</a>
that holds the key, and it returns the number of the partition to
which the key belongs.
</p>
<p>
Note that <a href="../api_reference/C/db.html" class="olink">DB</a> actually places the key on the partition
calculated by:
</p>
<pre class="programlisting">returned_partition modulo number_of_partitions</pre>
<p>
Also, remember that if you use a partitioning function when you
create your database, then you must use the same partitioning
function every time you open that database in the future.
</p>
<p>
The following code fragment illustrates a partition callback:
</p>
<a id="prog_am11"></a>
<pre class="programlisting">u_int32_t db_partition_fn(DB *db, DBT *key) {
    char *key_data;
    u_int32_t ret_number;

    /* Obtain your key data, unpacking it as necessary.
     * Here, we do the very simple thing just for illustrative purposes.
     */
    key_data = (char *)key->data;

    /* Here you would perform whatever comparison you require to determine
     * what partition the key belongs on. If you return either 0 or the
     * number of partitions in the database, the key is placed in the first
     * database partition. Else, it is placed on:
     *
     *      returned_number mod number_of_partitions
     */
    ret_number = 0;

    return ret_number;
}</pre>
<p>
You then cause your partition callback to be used by providing it
to the <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> method, as illustrated by the following
code fragment.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
The <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> <a href="../api_reference/C/dbt.html" class="olink">DBT</a> array parameter must
be <code class="literal">NULL</code> if you are using a partition
callback to partition your database.
</p>
</div>
<a id="prog_am12"></a>
<pre class="programlisting">DB *dbp = NULL;
DB_ENV *envp = NULL;
u_int32_t db_flags;
const char *file_name = "mydb.db";
int ret;

...

/* Skipping environment open to shorten this example */

/* Initialize the DB handle */
ret = db_create(&dbp, envp, 0);
if (ret != 0) {
    fprintf(stderr, "%s\n", db_strerror(ret));
    return (EXIT_FAILURE);
}

/* Three partitions chosen by callback; the DBT array is NULL */
ret = dbp->set_partition(dbp, 3, NULL, db_partition_fn);
if (ret != 0) {
    dbp->err(dbp, ret, "set_partition failed");
    return (EXIT_FAILURE);
}

/* Now open the database */
db_flags = DB_CREATE;       /* Allow database creation */

ret = dbp->open(dbp,        /* Pointer to the database */
                NULL,       /* Txn pointer */
                file_name,  /* File name */
                NULL,       /* Logical db name */
                DB_BTREE,   /* Database type (using btree) */
                db_flags,   /* Open flags */
                0);         /* File mode. Using defaults */
if (ret != 0) {
    dbp->err(dbp, ret, "Database '%s' open failed",
        file_name);
    return (EXIT_FAILURE);
}</pre>
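<p>
The callback body is left to the application. One common choice, shown
here only as a hedged sketch, is to hash the key bytes so that keys
spread evenly across partitions regardless of their sort order. The
helper names <code class="literal">hash_key_bytes</code> and
<code class="literal">effective_partition</code> are hypothetical and are
not part of the DB API; the hash is 32-bit FNV-1a, chosen for
illustration. The second helper models the modulo step described in
this section.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash over the key bytes. */
static uint32_t hash_key_bytes(const void *data, size_t size)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;   /* FNV offset basis */
    size_t i;

    for (i = 0; i < size; i++) {
        h ^= p[i];
        h *= 16777619u;         /* FNV prime */
    }
    return h;
}

/* Models the reduction applied to the callback's return value. */
static uint32_t effective_partition(uint32_t returned, uint32_t nparts)
{
    assert(nparts > 0);
    return returned % nparts;
}
```

<p>
Inside a real partition callback, the hash would be applied to
<code class="literal">key->data</code> and <code class="literal">key->size</code>
and its result returned directly. Because the function depends only on
the key bytes, it returns the same partition for the same key on every
open, which the partitioning scheme requires. The trade-off is that
hashing scatters adjacent keys, so it suits workloads that do not rely
on key ranges staying together.
</p>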
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="partition_file_placement"></a>Placing partition files</h3>
</div>
</div>
</div>
<p>
When you partition a database, a database file is created on
disk in the same way as if you were not partitioning the
database. That is, this file uses the name you provide to the
<a href="../api_reference/C/dbopen.html" class="olink">DB->open()</a> <code class="literal">file</code> parameter.
</p>
<p>
However, DB then also creates a series of database files on
disk, one for each partition that you want to use. These
partition files share the same name as the database file name,
but are also numbered sequentially. So if you create a database
named <code class="filename">mydb.db</code>, and you create 3 partitions
for it, then you will see the following database files on disk:
</p>
<pre class="programlisting">    mydb.db
    __dbp.mydb.db.000
    __dbp.mydb.db.001
    __dbp.mydb.db.002</pre>
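<p>
If a maintenance or backup tool needs to predict these names, the
pattern visible in the listing can be reproduced with a small helper.
This is a sketch under the assumption that the pattern is always
<code class="literal">__dbp.</code>, the database file name, and a
zero-padded three-digit partition number; that format is inferred from
the listing, not taken from a documented guarantee, and
<code class="literal">partition_file_name</code> is a hypothetical name.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the assumed name of partition file part_id for db_name into buf.
 * Returns 0 on success, or -1 if buf is too small to hold the name.
 */
static int partition_file_name(char *buf, size_t buflen,
    const char *db_name, unsigned part_id)
{
    int n;

    assert(buf != NULL && db_name != NULL);
    n = snprintf(buf, buflen, "__dbp.%s.%03u", db_name, part_id);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```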
<p>
All of the database's contents go into the numbered database
files. You can cause these files to be placed in different
directories (and, hence, different disk partitions or even
disks) by using the <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> method.
</p>
<p>
<a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> takes a NULL-terminated array of
strings, each one of which should represent an existing
filesystem directory.
</p>
<p>
If you are using an environment, the directories specified
using <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> must also be included in the
environment's list of data directories, as specified by <a href="../api_reference/C/envadd_data_dir.html" class="olink">DB_ENV->add_data_dir()</a>.
</p>
<p>
If you are not using an environment, then the directories
specified to <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> can be either complete
paths to currently existing directories, or paths relative to
the application's current working directory.
</p>
<p>
Ideally, you will provide <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> with an array
that is the same size as the number of partitions you are
creating for your database. Partition files are then placed
according to the order in which directories appear in the
array; partition 0 is placed in directory_array[0], partition 1
in directory_array[1], and so forth. However, if you provide an
array of directories that is smaller than the number of
database partitions, then the directories are used in a
round-robin fashion.
</p>
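<p>
The placement rule can be sketched as a simple index calculation. The
code below is an illustration only: it assumes that "round-robin" means
the partition number modulo the number of directories, and the helper
names <code class="literal">count_dirs</code> and
<code class="literal">partition_dir</code> are hypothetical. It takes a
NULL-terminated array, the same shape of array that
<code class="literal">DB->set_partition_dirs()</code> expects.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Count the entries in a NULL-terminated array of directory names. */
static size_t count_dirs(const char *const *dirs)
{
    size_t n = 0;

    while (dirs[n] != NULL)
        n++;
    return n;
}

/* Return the directory assumed to hold partition part_id. */
static const char *partition_dir(const char *const *dirs, unsigned part_id)
{
    size_t ndirs;

    assert(dirs != NULL);
    ndirs = count_dirs(dirs);
    assert(ndirs > 0);
    return dirs[part_id % ndirs];
}
```

<p>
With two directories and three partitions, this model places
partitions 0 and 2 in the first directory and partition 1 in the
second.
</p>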
<p>
You must call <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> before you create your
database, and before you open your database each time
thereafter. The array provided to <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> must not
change after the database has been created.
</p>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="am_opensub.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="am.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="am_get.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Opening multiple databases in a single file </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Retrieving records</td>
</tr>
</table>
</div>
</body>
</html>