<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>What is Berkeley DB?</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="intro.html" title="Chapter 1.  Introduction" />
    <link rel="prev" href="intro_terrain.html" title="Mapping the terrain: theory and practice" />
    <link rel="next" href="intro_dbisnot.html" title="What Berkeley DB is not" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 11.2.5.3</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">What is Berkeley DB?</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 1.
		Introduction
        </th>
          <td width="20%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="intro_dbis"></a>What is Berkeley DB?</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="intro_dbis.html#idm1665072">Data Access Services</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="intro_dbis.html#idm1554168">Data management services</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="intro_dbis.html#idm157888">Design</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
        So far, we have discussed database systems in general terms. It is
        now time to consider Berkeley DB in particular and see how it fits
        into the framework we have introduced. The key question is, what
        kinds of applications should use Berkeley DB?
    </p>
      <p>
        Berkeley DB is an Open Source embedded database library that
        provides scalable, high-performance, transaction-protected data
        management services to applications.  Berkeley DB provides a simple
        function-call API for data access and management.
    </p>
      <p>
        By "Open Source," we mean Berkeley DB is distributed under a
        license that conforms to the
        <a class="ulink" href="http://www.opensource.org/osd.html" target="_top"> Open Source Definition</a>.
        This license guarantees Berkeley DB is freely available for use and
        redistribution in other Open Source applications.  Oracle
        Corporation sells commercial licenses allowing the redistribution
        of Berkeley DB in proprietary applications.  In all cases the
        complete source code for Berkeley DB is freely available for
        download and use.
    </p>
      <p>
        Berkeley DB is "embedded" because it links directly into the
        application.  It runs in the same address space as the application.
        As a result, no inter-process communication, either over the
        network or between processes on the same machine, is required for
        database operations.  Berkeley DB provides a simple function-call
        API for a number of programming languages, including C, C++, Java,
        Perl, Tcl, Python, and PHP. All database operations happen inside
        the library. Multiple processes, or multiple threads within a single
        process, can use the same database concurrently, each through its
        own calls into the Berkeley DB library. Low-level services such as
        locking, transaction logging, shared buffer management, and memory
        management are all handled transparently by the library.
    </p>
      <p>
        The Berkeley DB library is extremely portable. It runs under almost
        all UNIX and Linux variants, Windows, and a number of embedded
        real-time operating systems. It runs on both 32-bit and 64-bit
        systems.  It has been deployed on high-end Internet servers,
        desktop machines, palmtop computers, set-top boxes, network
        switches, and elsewhere.  Once Berkeley DB is linked into
        the application, the end user generally does not know that there is
        a database present at all.
    </p>
      <p>
        Berkeley DB is scalable in a number of respects. The database library
        itself is quite compact (under 300 kilobytes of text space on common
        architectures), which means it is small enough to run in tightly
        constrained embedded systems, yet it can take advantage of
        gigabytes of memory and terabytes of disk if you are using hardware that
        has those resources.
    </p>
      <p>
        Each of Berkeley DB's database files can contain up to 256
        terabytes of data, assuming the underlying filesystem is capable of supporting
        files of that size. Note that Berkeley DB applications often use
        multiple database files. This means that the amount of data your
        Berkeley DB application can manage is limited only by the
        constraints imposed by your operating system, filesystem, and physical
        hardware.
    </p>
      <p>
        Berkeley DB also supports high concurrency, allowing thousands of users
        to operate on the same database files at the same time.
    </p>
      <p>
        Berkeley DB generally outperforms relational and object-oriented
        database systems in embedded applications for two reasons.
        First, because the library runs in the application's address space, no
        inter-process communication is required for database operations.
        The cost of communicating between processes on a single machine, or
        among machines on a network, is much higher than the cost of making
        a function call.  Second, because Berkeley DB uses a simple
        function-call interface for all operations, there is no query
        language to parse, and no execution plan to produce.
    </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm1665072"></a>Data Access Services</h3>
            </div>
          </div>
        </div>
        <p>
            Berkeley DB applications can choose the storage structure that
            best suits the application. Berkeley DB supports hash tables,
            Btrees, simple record-number-based storage, and persistent
            queues. Programmers can create tables using any of these
            storage structures, and can mix operations on different kinds
            of tables in a single application.
        </p>
        <p>
            Hash tables are generally good for very large databases that
            need predictable search and update times for random-access
            records.  Hash tables allow users to ask, "Does this key
            exist?" or to fetch a record with a known key.  Hash tables do
            not allow users to ask for records with keys that are close to
            a known key.
        </p>
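        <p>
            As a library-independent illustration (this is not Berkeley DB's
            own Hash implementation, and every name in it is hypothetical),
            the following minimal C sketch of a chained hash table shows why
            hashing suits exact-match questions: the bucket is computed from
            the whole key, so a lookup goes straight to one short chain, but
            keys that sort next to each other usually land in unrelated
            buckets, leaving no cheap way to find "nearby" keys. In Berkeley
            DB itself, the Hash access method is selected by passing
            <code class="literal">DB_HASH</code> to
            <code class="literal">DB-&gt;open()</code>.
        </p>
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS   8    /* deliberately tiny so that collisions occur */
#define MAXENTRIES 64   /* fixed-size entry pool keeps the sketch simple */

struct entry {
    const char *key;      /* caller's pointer; the string must outlive the table */
    int value;
    struct entry *next;   /* next entry in the same bucket */
};

static struct entry pool[MAXENTRIES];
static size_t used;
static struct entry *buckets[NBUCKETS];

/* FNV-1a: keys differing in one character usually hash to unrelated buckets. */
static uint32_t hash_key(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Walk the single chain the key can live in. */
static struct entry *find(const char *key)
{
    for (struct entry *e = buckets[hash_key(key) % NBUCKETS]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Insert or overwrite; returns false if the fixed-size pool is full. */
bool ht_put(const char *key, int value)
{
    struct entry *e = find(key);
    if (e != NULL) {
        e->value = value;
        return true;
    }
    if (used == MAXENTRIES)
        return false;
    uint32_t b = hash_key(key) % NBUCKETS;
    e = &pool[used++];
    e->key = key;
    e->value = value;
    e->next = buckets[b];
    buckets[b] = e;
    return true;
}

/* "Does this key exist?" */
bool ht_exists(const char *key) { return find(key) != NULL; }

/* Fetch a record with a known key; returns false if the key is absent. */
bool ht_get(const char *key, int *value)
{
    struct entry *e = find(key);
    if (e == NULL)
        return false;
    *value = e->value;
    return true;
}
```
        <p>
            Note that there is no operation here for "the next key after
            <code class="literal">apple</code>": answering it would mean
            scanning every bucket.
        </p>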
        <p>
            Btrees are better for range-based searches, as when the
            application needs to find all records with keys between some
            starting and ending value.  Btrees also do a better job of
            exploiting <span class="emphasis"><em>locality of reference</em></span>.  If the
            application is likely to touch keys near each other at the same
            time, Btrees work well. The tree structure keeps keys that
            are close together near one another in storage, so fetching
            nearby values usually does not require a disk access.
        </p>
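        <p>
            The following minimal C sketch illustrates why sorted storage
            makes range searches cheap. A sorted array of keys stands in for
            the ordered leaf level of a Btree; the function names are
            hypothetical and this is not Berkeley DB code. One binary search
            finds the first key in the range, and the remaining matches are
            contiguous, so they are read sequentially. With the real Btree
            access method, a similar scan is typically done by positioning a
            cursor with the <code class="literal">DB_SET_RANGE</code> flag to
            <code class="literal">DBC-&gt;get()</code> (smallest key greater
            than or equal to the one given) and then stepping with
            <code class="literal">DB_NEXT</code>.
        </p>
```c
#include <stddef.h>
#include <string.h>

/* Index of the first element >= key in a sorted array (binary search). */
static size_t lower_bound(const char *const keys[], size_t n, const char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(keys[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Collect every key k with start <= k < end into out (at most max_out
 * entries) and return how many were found. Because the keys are sorted,
 * the matches are contiguous: after one search, the scan only moves
 * forward through neighbouring keys.
 */
size_t range_scan(const char *const keys[], size_t n,
                  const char *start, const char *end,
                  const char *out[], size_t max_out)
{
    size_t count = 0;
    for (size_t i = lower_bound(keys, n, start);
         i < n && strcmp(keys[i], end) < 0 && count < max_out; i++)
        out[count++] = keys[i];
    return count;
}
```
        <p>
            For the keys <code class="literal">apple</code>,
            <code class="literal">apricot</code>,
            <code class="literal">banana</code>,
            <code class="literal">blueberry</code>, and
            <code class="literal">cherry</code>, the range from
            <code class="literal">"b"</code> up to (but excluding)
            <code class="literal">"c"</code> yields the two adjacent entries
            <code class="literal">banana</code> and
            <code class="literal">blueberry</code>.
        </p>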
        <p>
            Record-number-based storage is natural for applications that
            need to store and fetch records, but that do not have a simple
            way to generate keys of their own. In a record number table,
            the record number is the key for the record. Berkeley DB will
            generate these record numbers automatically.
        </p>
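        <p>
            A minimal C sketch of the idea, using a hypothetical in-memory
            table rather than Berkeley DB's Recno access method: the caller
            supplies only the data, and the table hands back the record
            number that serves as the key. As in Berkeley DB, record numbers
            here start at 1; with the real library, a Recno database is
            opened with <code class="literal">DB_RECNO</code>, and passing the
            <code class="literal">DB_APPEND</code> flag to
            <code class="literal">DB-&gt;put()</code> asks the library to
            assign the next record number.
        </p>
```c
#include <stdint.h>
#include <string.h>

#define MAX_RECORDS 16
#define RECORD_LEN  32

/* Hypothetical record-number table; zero-initialize before use. */
struct recno_table {
    char records[MAX_RECORDS][RECORD_LEN];
    uint32_t last;   /* highest record number assigned so far (0 = empty) */
};

/* Append a record and return its newly assigned record number (0 if full). */
uint32_t recno_append(struct recno_table *t, const char *data)
{
    if (t->last == MAX_RECORDS)
        return 0;
    t->last++;
    /* Copy at most RECORD_LEN - 1 bytes and always NUL-terminate. */
    strncpy(t->records[t->last - 1], data, RECORD_LEN - 1);
    t->records[t->last - 1][RECORD_LEN - 1] = '\0';
    return t->last;
}

/* Fetch by record number: the number itself is the key. NULL if absent. */
const char *recno_get(const struct recno_table *t, uint32_t recno)
{
    if (recno == 0 || recno > t->last)
        return NULL;
    return t->records[recno - 1];
}
```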
        <p>
            Queues are well-suited for applications that create records,
            and then must process those records in creation order. A good
            example is an online purchasing system. Orders can enter the
            system at any time, but should generally be filled in the order
            in which they were placed.
        </p>
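        <p>
            The order-processing example can be sketched as a first-in,
            first-out queue of fixed-length records. The C code below is a
            hypothetical in-memory model, not Berkeley DB's Queue access
            method, but it mirrors two of that method's properties: records
            have a fixed length (set with
            <code class="literal">DB-&gt;set_re_len()</code> in the real
            library) and each appended record receives an increasing record
            number. In Berkeley DB, appending uses
            <code class="literal">DB_APPEND</code> with
            <code class="literal">DB-&gt;put()</code>, and the
            <code class="literal">DB_CONSUME</code> flag to
            <code class="literal">DB-&gt;get()</code> returns and deletes the
            record at the head of the queue.
        </p>
```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define QCAP      8    /* maximum number of pending orders */
#define ORDER_LEN 24   /* every record has this fixed length */

/* Hypothetical circular buffer that hands out increasing record numbers. */
struct order_queue {
    char slots[QCAP][ORDER_LEN];
    uint32_t head;   /* record number of the oldest unconsumed order */
    uint32_t tail;   /* record number the next enqueued order will get */
};

void oq_init(struct order_queue *q)
{
    memset(q, 0, sizeof *q);
    q->head = q->tail = 1;   /* record numbers start at 1 */
}

/* Append an order; returns its record number, or 0 if the queue is full. */
uint32_t oq_enqueue(struct order_queue *q, const char *order)
{
    if (q->tail - q->head == QCAP)
        return 0;
    char *slot = q->slots[q->tail % QCAP];
    strncpy(slot, order, ORDER_LEN - 1);
    slot[ORDER_LEN - 1] = '\0';
    return q->tail++;
}

/* Remove the oldest order, copying it into out; returns false if empty. */
bool oq_consume(struct order_queue *q, char out[ORDER_LEN])
{
    if (q->head == q->tail)
        return false;
    memcpy(out, q->slots[q->head % QCAP], ORDER_LEN);
    q->head++;
    return true;
}
```
        <p>
            Orders come out of <code class="literal">oq_consume</code> in
            exactly the order they went into
            <code class="literal">oq_enqueue</code>, which is the "filled in
            the order in which they were placed" behavior described above.
        </p>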
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm1554168"></a>Data management services</h3>
            </div>
          </div>
        </div>
        <p>
            Berkeley DB offers important data management services,
            including concurrency, transactions, and recovery. All of these
            services work on all of the storage structures.
        </p>
        <p>
            Many users can work on the same database concurrently. Berkeley
            DB handles locking transparently, ensuring that two users
            working on the same record do not interfere with one
            another.
        </p>
        <p>
            By default, the library provides strict ACID transaction
            semantics.  However, applications can relax the isolation
            guarantees the database system makes.
        </p>
        <p>
            Multiple operations can be grouped into a single transaction,
            and can be committed or rolled back atomically. Berkeley DB
            uses a technique called <span class="emphasis"><em>two-phase locking</em></span>
            to ensure that concurrent transactions are isolated from one
            another, and a technique called <span class="emphasis"><em>write-ahead
                logging</em></span> to guarantee that committed changes
            survive application, system, or hardware failures.
        </p>
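        <p>
            The rule behind two-phase locking is that a transaction first
            acquires locks (the growing phase) and, once it releases any lock,
            may not acquire another (the shrinking phase). The C sketch below
            enforces only that rule for a single transaction; it is a
            hypothetical illustration, not Berkeley DB's lock manager, and it
            does not detect conflicts between transactions or deadlocks.
            Holding every lock until commit or abort, a common stricter
            variant, satisfies the rule trivially.
        </p>
```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCKS 16

/* Hypothetical per-transaction lock list that enforces the two-phase rule. */
struct txn_locks {
    int held[MAX_LOCKS];   /* identifiers of the objects currently locked */
    size_t nheld;
    bool shrinking;        /* set once the transaction releases any lock */
};

/* Growing phase: acquiring is allowed only before the first release. */
bool tpl_acquire(struct txn_locks *t, int object)
{
    if (t->shrinking || t->nheld == MAX_LOCKS)
        return false;
    t->held[t->nheld++] = object;
    return true;
}

/* Shrinking phase: releasing a held lock forbids any later acquisition. */
bool tpl_release(struct txn_locks *t, int object)
{
    for (size_t i = 0; i < t->nheld; i++) {
        if (t->held[i] == object) {
            t->held[i] = t->held[--t->nheld];   /* order of held locks is irrelevant */
            t->shrinking = true;
            return true;
        }
    }
    return false;   /* object was not locked by this transaction */
}
```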
        <p>
            When an application starts up, it can ask Berkeley DB to run
            recovery.  Recovery restores the database to a clean state even
            after a crash: when recovery completes, the database is
            guaranteed to be consistent and all committed changes are
            guaranteed to be present.
        </p>
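        <p>
            To make the interplay of write-ahead logging and recovery
            concrete, here is a greatly simplified C sketch of a common
            undo/redo recovery scheme. It is not Berkeley DB's log format or
            recovery algorithm; the record layout and function are
            hypothetical. Each log record carries the old and new value of
            an update and is assumed to reach the log before the data
            changes. After a crash, the data may hold any mix of flushed and
            unflushed updates, and recovery repairs it from the log alone.
        </p>
```c
#include <stdbool.h>
#include <stddef.h>

#define NKEYS    4
#define MAX_TXNS 8   /* transaction ids must be in 0 .. MAX_TXNS-1 */

enum rec_type { REC_UPDATE, REC_COMMIT };

/* Hypothetical log record, written to the log BEFORE the data is changed. */
struct log_rec {
    enum rec_type type;
    int txn;            /* transaction id */
    int key;            /* index into the data array (REC_UPDATE only) */
    int before, after;  /* old and new values (REC_UPDATE only) */
};

/*
 * Repair `data` (whatever state reached disk before the crash) using the log:
 *   1. scan forward to find which transactions committed;
 *   2. scan backward, undoing updates of transactions that never committed;
 *   3. scan forward, redoing updates of committed transactions.
 */
void recover(int data[NKEYS], const struct log_rec wal[], size_t n)
{
    bool committed[MAX_TXNS] = { false };
    for (size_t i = 0; i < n; i++)
        if (wal[i].type == REC_COMMIT)
            committed[wal[i].txn] = true;

    for (size_t i = n; i-- > 0;)
        if (wal[i].type == REC_UPDATE && !committed[wal[i].txn])
            data[wal[i].key] = wal[i].before;

    for (size_t i = 0; i < n; i++)
        if (wal[i].type == REC_UPDATE && committed[wal[i].txn])
            data[wal[i].key] = wal[i].after;
}
```
        <p>
            For example, if transaction 1 set key 0 to 10 and committed while
            transaction 2 set key 1 to 20 but crashed before committing,
            recovery yields 10 for key 0 and restores key 1 to its old value,
            regardless of which of those writes had reached the data before
            the crash.
        </p>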
        <p>
            An application can specify, when it starts up, which data
            management services it will use. Some applications need fast,
            single-user, non-transactional Btree data storage. In that
            case, the application can disable the locking and transaction
            systems, and will not incur the overhead of locking or logging.
            If an application needs to support multiple concurrent users,
            but does not need transactions, it can turn on locking without
            transactions. Applications that need concurrent,
            transaction-protected database access can enable all of the
            subsystems.
        </p>
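        <p>
            The three configurations above can be summarized as a choice of
            subsystems. The C sketch below uses hypothetical bit flags rather
            than Berkeley DB's real constants; the comments name the
            <code class="literal">DB_ENV-&gt;open()</code> flags
            (<code class="literal">DB_INIT_MPOOL</code>,
            <code class="literal">DB_INIT_LOCK</code>,
            <code class="literal">DB_INIT_LOG</code>, and
            <code class="literal">DB_INIT_TXN</code>) that each one stands
            for, which a real application would take from
            <code class="literal">db.h</code>.
        </p>
```c
#include <stdbool.h>

/* Hypothetical stand-ins for Berkeley DB's environment subsystem flags. */
enum subsystem {
    SUB_MPOOL = 1 << 0,   /* shared buffer cache   (DB_INIT_MPOOL) */
    SUB_LOCK  = 1 << 1,   /* locking               (DB_INIT_LOCK)  */
    SUB_LOG   = 1 << 2,   /* write-ahead logging   (DB_INIT_LOG)   */
    SUB_TXN   = 1 << 3    /* transactions          (DB_INIT_TXN)   */
};

/* Pick the smallest set of subsystems that meets the application's needs. */
unsigned choose_subsystems(bool concurrent, bool transactional)
{
    unsigned flags = SUB_MPOOL;   /* each configuration below uses the cache */
    if (transactional)
        flags |= SUB_LOCK | SUB_LOG | SUB_TXN;   /* enable all subsystems */
    else if (concurrent)
        flags |= SUB_LOCK;        /* locking without transactions */
    /* single-user, non-transactional: no locking or logging overhead */
    return flags;
}
```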
        <p>
            In all these cases, the application uses the same function-call
            API to fetch and update records.
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm157888"></a>Design</h3>
            </div>
          </div>
        </div>
        <p>
            Berkeley DB was designed to provide industrial-strength
            database services to application developers, without requiring
            them to become database experts.  It is a classic C-library
            style <span class="emphasis"><em>toolkit</em></span>, providing a broad base of
            functionality to application writers.  Berkeley DB was designed
            by programmers, for programmers: its modular design surfaces
            simple, orthogonal interfaces to core services, and it provides
            mechanism (for example, good thread support) without imposing
            policy (for example, the use of threads is not required).  Just
            as importantly, Berkeley DB allows developers to balance
            performance against the need for crash recovery and concurrent
            use.  An application can use the storage structure that
            provides the fastest access to its data and can request only
            the degree of logging and locking that it needs.
        </p>
        <p>
            Because of the tool-based approach and the separate interfaces
            to each Berkeley DB subsystem, you can use Berkeley DB's
            transaction support to protect other system operations as well.
            Berkeley DB even allows you to wrap transactions around the
            standard UNIX file read and write operations!  Further, Berkeley
            DB was designed to interact correctly with the native system's
            toolset, a feature no other database package offers.  For
            example, on UNIX systems Berkeley DB supports hot backups
            (database backups while the database is in use) using standard
            UNIX system utilities such as dump, tar, cpio, pax, or even cp.
            On systems whose filesystems do not provide read isolation,
            Berkeley DB provides a tool for safely copying files.
        </p>
        <p>
            Finally, because scripting language interfaces are available
            for Berkeley DB (notably Tcl and Perl), application writers can
            build powerful database engines with little effort.  You can
            build transaction-protected database applications using your
            favorite scripting language, an increasingly important feature
            in a world using CGI scripts to deliver HTML.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="intro.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Mapping the terrain: theory and practice </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> What Berkeley DB is not</td>
        </tr>
      </table>
    </div>
  </body>
</html>