libdb/docs/programmer_reference/intro_dbis.html
2011-12-19 19:07:10 -05:00

315 lines
15 KiB
HTML
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>What is Berkeley DB?</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="intro.html" title="Chapter 1.  Introduction" />
<link rel="prev" href="intro_terrain.html" title="Mapping the terrain: theory and practice" />
<link rel="next" href="intro_dbisnot.html" title="What Berkeley DB is not" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 11.2.5.2</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">What is Berkeley DB?</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
<th width="60%" align="center">Chapter 1. 
Introduction
</th>
<td width="20%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="intro_dbis"></a>What is Berkeley DB?</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="intro_dbis.html#id3044327">Data Access Services</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="intro_dbis.html#id3044355">Data management services</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="intro_dbis.html#id3931121">Design</a>
</span>
</dt>
</dl>
</div>
<p>
So far, we have discussed database systems in general terms. It is
time now to consider Berkeley DB in particular and see how it fits
into the framework we have introduced. The key question is, what
kinds of applications should use Berkeley DB?
</p>
<p>
Berkeley DB is an Open Source embedded database library that
provides scalable, high-performance, transaction-protected data
management services to applications. Berkeley DB provides a simple
function-call API for data access and management.
</p>
<p>
By "Open Source," we mean Berkeley DB is distributed under a
license that conforms to the
<a class="ulink" href="http://www.opensource.org/osd.html" target="_top"> Open Source Definition</a>.
This license guarantees Berkeley DB is freely available for use and
redistribution in other Open Source applications. Oracle
Corporation sells commercial licenses allowing the redistribution
of Berkeley DB in proprietary applications. In all cases the
complete source code for Berkeley DB is freely available for
download and use.
</p>
<p>
Berkeley DB is "embedded" because it links directly into the
application. It runs in the same address space as the application.
As a result, no inter-process communication, either over the
network or between processes on the same machine, is required for
database operations. Berkeley DB provides a simple function-call
API for a number of programming languages, including C, C++, Java,
Perl, Tcl, Python, and PHP. All database operations happen inside
the library. Multiple processes, or multiple threads in a single
process, can all use the database at the same time as each uses the
Berkeley DB library. Low-level services like locking, transaction
logging, shared buffer management, memory management, and so on are
all handled transparently by the library.
</p>
<p>
The Berkeley DB library is extremely portable. It runs under almost
all UNIX and Linux variants, Windows, and a number of embedded
real-time operating systems. It runs on both 32-bit and 64-bit
systems. It has been deployed on high-end Internet servers,
desktop machines, and on palmtop computers, set-top boxes, in
network switches, and elsewhere. Once Berkeley DB is linked into
the application, the end user generally does not know that there is
a database present at all.
</p>
<p>
Berkeley DB is scalable in a number of respects. The database library
itself is quite compact (under 300 kilobytes of text space on common
architectures), which means it is small enough to run in tightly
constrained embedded systems, but yet it can take advantage of
gigabytes of memory and terabytes of disk if you are using hardware that
has those resources.
</p>
<p>
Each of Berkeley DB's database files can contain up to 256
terabytes of data, assuming the underlying filesystem is capable of supporting
files of that size. Note that Berkeley DB applications often use
multiple database files. This means that the amount of data your
Berkeley DB application can manage is really limited only by the
constraints imposed by your operating system, filesystem, and physical
hardware.
</p>
<p>
Berkeley DB also supports high concurrency, allowing thousands of users
to operate on the same database files at the same time.
</p>
<p>
Berkeley DB generally outperforms relational and object-oriented
database systems in embedded applications for a couple of reasons.
First, because the library runs in the same address space, no
inter-process communication is required for database operations.
The cost of communicating between processes on a single machine, or
among machines on a network, is much higher than the cost of making
a function call. Second, because Berkeley DB uses a simple
function-call interface for all operations, there is no query
language to parse, and no execution plan to produce.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="id3044327"></a>Data Access Services</h3>
</div>
</div>
</div>
<p>
Berkeley DB applications can choose the storage structure that
best suits the application. Berkeley DB supports hash tables,
Btrees, simple record-number-based storage, and persistent
queues. Programmers can create tables using any of these
storage structures, and can mix operations on different kinds
of tables in a single application.
</p>
<p>
Hash tables are generally good for very large databases that
need predictable search and update times for random-access
records. Hash tables allow users to ask, "Does this key
exist?" or to fetch a record with a known key. Hash tables do
not allow users to ask for records with keys that are close to
a known key.
</p>
<p>
Btrees are better for range-based searches, as when the
application needs to find all records with keys between some
starting and ending value. Btrees also do a better job of
exploiting <span class="emphasis"><em>locality of reference</em></span>. If the
application is likely to touch keys near each other at the same
time, the Btrees work well. The tree structure keeps keys that
are close together near one another in storage, so fetching
nearby values usually does not require a disk access.
</p>
<p>
Record-number-based storage is natural for applications that
need to store and fetch records, but that do not have a simple
way to generate keys of their own. In a record number table,
the record number is the key for the record. Berkeley DB will
generate these record numbers automatically.
</p>
<p>
Queues are well-suited for applications that create records,
and then must deal with those records in creation order. A good
example is on-line purchasing systems. Orders can enter the
system at any time, but should generally be filled in the order
in which they were placed.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="id3044355"></a>Data management services</h3>
</div>
</div>
</div>
<p>
Berkeley DB offers important data management services,
including concurrency, transactions, and recovery. All of these
services work on all of the storage structures.
</p>
<p>
Many users can work on the same database concurrently. Berkeley
DB handles locking transparently, ensuring that two users
working on the same record do not interfere with one
another.
</p>
<p>
The library provides strict ACID transaction semantics, by
default. However, applications are allowed to relax the
isolation guarantees the database system makes.
</p>
<p>
Multiple operations can be grouped into a single transaction,
and can be committed or rolled back atomically. Berkeley DB
uses a technique called <span class="emphasis"><em>two-phase locking</em></span>
to be sure that concurrent transactions are isolated from one
another, and a technique called <span class="emphasis"><em>write-ahead
logging</em></span> to guarantee that committed changes
survive application, system, or hardware failures.
</p>
<p>
When an application starts up, it can ask Berkeley DB to run
recovery. Recovery restores the database to a clean state,
with all committed changes present, even after a crash. The
database is guaranteed to be consistent and all committed
changes are guaranteed to be present when recovery
completes.
</p>
<p>
An application can specify, when it starts up, which data
management services it will use. Some applications need fast,
single-user, non-transactional Btree data storage. In that
case, the application can disable the locking and transaction
systems, and will not incur the overhead of locking or logging.
If an application needs to support multiple concurrent users,
but does not need transactions, it can turn on locking without
transactions. Applications that need concurrent,
transaction-protected database access can enable all of the
subsystems.
</p>
<p>
In all these cases, the application uses the same function-call
API to fetch and update records.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="id3931121"></a>Design</h3>
</div>
</div>
</div>
<p>
Berkeley DB was designed to provide industrial-strength
database services to application developers, without requiring
them to become database experts. It is a classic C-library
style <span class="emphasis"><em>toolkit</em></span>, providing a broad base of
functionality to application writers. Berkeley DB was designed
by programmers, for programmers: its modular design surfaces
simple, orthogonal interfaces to core services, and it provides
mechanism (for example, good thread support) without imposing
policy (for example, the use of threads is not required). Just
as importantly, Berkeley DB allows developers to balance
performance against the need for crash recovery and concurrent
use. An application can use the storage structure that
provides the fastest access to its data and can request only
the degree of logging and locking that it needs.
</p>
<p>
Because of the tool-based approach and separate interfaces for
each Berkeley DB subsystem, you can support a complete
transaction environment for other system operations. Berkeley
DB even allows you to wrap transactions around the standard
UNIX file read and write operations! Further, Berkeley DB was
designed to interact correctly with the native system's
toolset, a feature no other database package offers. For
example, on UNIX systems Berkeley DB supports hot backups
(database backups while the database is in use), using standard
UNIX system utilities, for example, dump, tar, cpio, pax or
even cp. On other systems which do not support filesystems
with read isolation, Berkeley DB provides a tool for safely
copying files.
</p>
<p>
Finally, because scripting language interfaces are available
for Berkeley DB (notably Tcl and Perl), application writers can
build incredibly powerful database engines with little effort.
You can build transaction-protected database applications using
your favorite scripting languages, an increasingly important
feature in a world using CGI scripts to deliver HTML.
</p>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="intro.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Mapping the terrain: theory and practice </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> What Berkeley DB is not</td>
</tr>
</table>
</div>
</body>
</html>