mirror of
https://github.com/berkeleydb/libdb.git
synced 2024-11-16 09:06:25 +00:00
315 lines
15 KiB
HTML
315 lines
15 KiB
HTML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||
<title>What is Berkeley DB?</title>
|
||
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
|
||
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
|
||
<link rel="up" href="intro.html" title="Chapter 1. Introduction" />
|
||
<link rel="prev" href="intro_terrain.html" title="Mapping the terrain: theory and practice" />
|
||
<link rel="next" href="intro_dbisnot.html" title="What Berkeley DB is not" />
|
||
</head>
|
||
<body>
|
||
<div xmlns="" class="navheader">
|
||
<div class="libver">
|
||
<p>Library Version 11.2.5.3</p>
|
||
</div>
|
||
<table width="100%" summary="Navigation header">
|
||
<tr>
|
||
<th colspan="3" align="center">What is Berkeley DB?</th>
|
||
</tr>
|
||
<tr>
|
||
<td width="20%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
|
||
<th width="60%" align="center">Chapter 1.
|
||
Introduction
|
||
</th>
|
||
<td width="20%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
|
||
</tr>
|
||
</table>
|
||
<hr />
|
||
</div>
|
||
<div class="sect1" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title" style="clear: both"><a id="intro_dbis"></a>What is Berkeley DB?</h2>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="toc">
|
||
<dl>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="intro_dbis.html#idm1665072">Data Access Services</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="intro_dbis.html#idm1554168">Data management services</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="intro_dbis.html#idm157888">Design</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</div>
|
||
<p>
|
||
So far, we have discussed database systems in general terms. It is
|
||
time now to consider Berkeley DB in particular and see how it fits
|
||
into the framework we have introduced. The key question is, what
|
||
kinds of applications should use Berkeley DB?
|
||
</p>
|
||
<p>
|
||
Berkeley DB is an Open Source embedded database library that
|
||
provides scalable, high-performance, transaction-protected data
|
||
management services to applications. Berkeley DB provides a simple
|
||
function-call API for data access and management.
|
||
</p>
|
||
<p>
|
||
By "Open Source," we mean Berkeley DB is distributed under a
|
||
license that conforms to the
|
||
<a class="ulink" href="http://www.opensource.org/osd.html" target="_top"> Open Source Definition</a>.
|
||
This license guarantees Berkeley DB is freely available for use and
|
||
redistribution in other Open Source applications. Oracle
|
||
Corporation sells commercial licenses allowing the redistribution
|
||
of Berkeley DB in proprietary applications. In all cases the
|
||
complete source code for Berkeley DB is freely available for
|
||
download and use.
|
||
</p>
|
||
<p>
|
||
Berkeley DB is "embedded" because it links directly into the
|
||
application. It runs in the same address space as the application.
|
||
As a result, no inter-process communication, either over the
|
||
network or between processes on the same machine, is required for
|
||
database operations. Berkeley DB provides a simple function-call
|
||
API for a number of programming languages, including C, C++, Java,
|
||
Perl, Tcl, Python, and PHP. All database operations happen inside
|
||
the library. Multiple processes, or multiple threads in a single
|
||
process, can all use the database at the same time as each uses the
|
||
Berkeley DB library. Low-level services like locking, transaction
|
||
logging, shared buffer management, memory management, and so on are
|
||
all handled transparently by the library.
|
||
</p>
|
||
<p>
|
||
The Berkeley DB library is extremely portable. It runs under almost
|
||
all UNIX and Linux variants, Windows, and a number of embedded
|
||
real-time operating systems. It runs on both 32-bit and 64-bit
|
||
systems. It has been deployed on high-end Internet servers,
|
||
desktop machines, and on palmtop computers, set-top boxes, in
|
||
network switches, and elsewhere. Once Berkeley DB is linked into
|
||
the application, the end user generally does not know that there is
|
||
a database present at all.
|
||
</p>
|
||
<p>
|
||
Berkeley DB is scalable in a number of respects. The database library
|
||
itself is quite compact (under 300 kilobytes of text space on common
|
||
architectures), which means it is small enough to run in tightly
|
||
constrained embedded systems, but yet it can take advantage of
|
||
gigabytes of memory and terabytes of disk if you are using hardware that
|
||
has those resources.
|
||
</p>
|
||
<p>
|
||
Each of Berkeley DB's database files can contain up to 256
|
||
terabytes of data, assuming the underlying filesystem is capable of supporting
|
||
files of that size. Note that Berkeley DB applications often use
|
||
multiple database files. This means that the amount of data your
|
||
Berkeley DB application can manage is really limited only by the
|
||
constraints imposed by your operating system, filesystem, and physical
|
||
hardware.
|
||
</p>
|
||
<p>
|
||
Berkeley DB also supports high concurrency, allowing thousands of users
|
||
to operate on the same database files at the same time.
|
||
</p>
|
||
<p>
|
||
Berkeley DB generally outperforms relational and object-oriented
|
||
database systems in embedded applications for a couple of reasons.
|
||
First, because the library runs in the same address space, no
|
||
inter-process communication is required for database operations.
|
||
The cost of communicating between processes on a single machine, or
|
||
among machines on a network, is much higher than the cost of making
|
||
a function call. Second, because Berkeley DB uses a simple
|
||
function-call interface for all operations, there is no query
|
||
language to parse, and no execution plan to produce.
|
||
</p>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="idm1665072"></a>Data Access Services</h3>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<p>
|
||
Berkeley DB applications can choose the storage structure that
|
||
best suits the application. Berkeley DB supports hash tables,
|
||
Btrees, simple record-number-based storage, and persistent
|
||
queues. Programmers can create tables using any of these
|
||
storage structures, and can mix operations on different kinds
|
||
of tables in a single application.
|
||
</p>
|
||
<p>
|
||
Hash tables are generally good for very large databases that
|
||
need predictable search and update times for random-access
|
||
records. Hash tables allow users to ask, "Does this key
|
||
exist?" or to fetch a record with a known key. Hash tables do
|
||
not allow users to ask for records with keys that are close to
|
||
a known key.
|
||
</p>
|
||
<p>
|
||
Btrees are better for range-based searches, as when the
|
||
application needs to find all records with keys between some
|
||
starting and ending value. Btrees also do a better job of
|
||
exploiting <span class="emphasis"><em>locality of reference</em></span>. If the
|
||
application is likely to touch keys near each other at the same
|
||
time, the Btrees work well. The tree structure keeps keys that
|
||
are close together near one another in storage, so fetching
|
||
nearby values usually does not require a disk access.
|
||
</p>
|
||
<p>
|
||
Record-number-based storage is natural for applications that
|
||
need to store and fetch records, but that do not have a simple
|
||
way to generate keys of their own. In a record number table,
|
||
the record number is the key for the record. Berkeley DB will
|
||
generate these record numbers automatically.
|
||
</p>
|
||
<p>
|
||
Queues are well-suited for applications that create records,
|
||
and then must deal with those records in creation order. A good
|
||
example is on-line purchasing systems. Orders can enter the
|
||
system at any time, but should generally be filled in the order
|
||
in which they were placed.
|
||
</p>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="idm1554168"></a>Data management services</h3>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<p>
|
||
Berkeley DB offers important data management services,
|
||
including concurrency, transactions, and recovery. All of these
|
||
services work on all of the storage structures.
|
||
</p>
|
||
<p>
|
||
Many users can work on the same database concurrently. Berkeley
|
||
DB handles locking transparently, ensuring that two users
|
||
working on the same record do not interfere with one
|
||
another.
|
||
</p>
|
||
<p>
|
||
The library provides strict ACID transaction semantics, by
|
||
default. However, applications are allowed to relax the
|
||
isolation guarantees the database system makes.
|
||
</p>
|
||
<p>
|
||
Multiple operations can be grouped into a single transaction,
|
||
and can be committed or rolled back atomically. Berkeley DB
|
||
uses a technique called <span class="emphasis"><em>two-phase locking</em></span>
|
||
to be sure that concurrent transactions are isolated from one
|
||
another, and a technique called <span class="emphasis"><em>write-ahead
|
||
logging</em></span> to guarantee that committed changes
|
||
survive application, system, or hardware failures.
|
||
</p>
|
||
<p>
|
||
When an application starts up, it can ask Berkeley DB to run
|
||
recovery. Recovery restores the database to a clean state,
|
||
with all committed changes present, even after a crash. The
|
||
database is guaranteed to be consistent and all committed
|
||
changes are guaranteed to be present when recovery
|
||
completes.
|
||
</p>
|
||
<p>
|
||
An application can specify, when it starts up, which data
|
||
management services it will use. Some applications need fast,
|
||
single-user, non-transactional Btree data storage. In that
|
||
case, the application can disable the locking and transaction
|
||
systems, and will not incur the overhead of locking or logging.
|
||
If an application needs to support multiple concurrent users,
|
||
but does not need transactions, it can turn on locking without
|
||
transactions. Applications that need concurrent,
|
||
transaction-protected database access can enable all of the
|
||
subsystems.
|
||
</p>
|
||
<p>
|
||
In all these cases, the application uses the same function-call
|
||
API to fetch and update records.
|
||
</p>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="idm157888"></a>Design</h3>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<p>
|
||
Berkeley DB was designed to provide industrial-strength
|
||
database services to application developers, without requiring
|
||
them to become database experts. It is a classic C-library
|
||
style <span class="emphasis"><em>toolkit</em></span>, providing a broad base of
|
||
functionality to application writers. Berkeley DB was designed
|
||
by programmers, for programmers: its modular design surfaces
|
||
simple, orthogonal interfaces to core services, and it provides
|
||
mechanism (for example, good thread support) without imposing
|
||
policy (for example, the use of threads is not required). Just
|
||
as importantly, Berkeley DB allows developers to balance
|
||
performance against the need for crash recovery and concurrent
|
||
use. An application can use the storage structure that
|
||
provides the fastest access to its data and can request only
|
||
the degree of logging and locking that it needs.
|
||
</p>
|
||
<p>
|
||
Because of the tool-based approach and separate interfaces for
|
||
each Berkeley DB subsystem, you can support a complete
|
||
transaction environment for other system operations. Berkeley
|
||
DB even allows you to wrap transactions around the standard
|
||
UNIX file read and write operations! Further, Berkeley DB was
|
||
designed to interact correctly with the native system's
|
||
toolset, a feature no other database package offers. For
|
||
example, on UNIX systems Berkeley DB supports hot backups
|
||
(database backups while the database is in use), using standard
|
||
UNIX system utilities, for example, dump, tar, cpio, pax or
|
||
even cp. On other systems which do not support filesystems
|
||
with read isolation, Berkeley DB provides a tool for safely
|
||
copying files.
|
||
</p>
|
||
<p>
|
||
Finally, because scripting language interfaces are available
|
||
for Berkeley DB (notably Tcl and Perl), application writers can
|
||
build incredibly powerful database engines with little effort.
|
||
You can build transaction-protected database applications using
|
||
your favorite scripting languages, an increasingly important
|
||
feature in a world using CGI scripts to deliver HTML.
|
||
</p>
|
||
</div>
|
||
</div>
|
||
<div class="navfooter">
|
||
<hr />
|
||
<table width="100%" summary="Navigation footer">
|
||
<tr>
|
||
<td width="40%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="u" href="intro.html">Up</a>
|
||
</td>
|
||
<td width="40%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="40%" align="left" valign="top">Mapping the terrain: theory and practice </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="h" href="index.html">Home</a>
|
||
</td>
|
||
<td width="40%" align="right" valign="top"> What Berkeley DB is not</td>
|
||
</tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|