bdb updates.

Sears Russell 2005-03-26 05:37:48 +00:00
parent fff7257809
commit 447532ee0f

@ -48,7 +48,7 @@ monolithic, and do not generalize to other applications or classes of
problems. As a result, many systems are forced to ``work around'' the problems. As a result, many systems are forced to ``work around'' the
data models provided by a transactional storage layer. Manifestations data models provided by a transactional storage layer. Manifestations
of this problem include ``impedance mismatch'' in the database world, of this problem include ``impedance mismatch'' in the database world,
and the poor fit of existing transactional storage management system and the poor fit of existing transactional storage management systems
to hierarchical or semi-structured data types such as XML or to hierarchical or semi-structured data types such as XML or
scientific data. This work proposes a novel set of abstractions for scientific data. This work proposes a novel set of abstractions for
transactional storage systems and generalizes an existing transactional storage systems and generalizes an existing
@ -110,12 +110,13 @@ The most obvious example of this mismatch is in the support for
persistent objects in Java, called {\em Enterprise Java Beans} persistent objects in Java, called {\em Enterprise Java Beans}
(EJB). In a typical usage, an array of objects is made persistent by (EJB). In a typical usage, an array of objects is made persistent by
mapping each object to a row in a table\footnote{If the object is mapping each object to a row in a table\footnote{If the object is
stored in normalized relational format, it may span many rows and stored in normalized relational format it may span many rows and
tables~\cite{Hibernate}.} and then issuing queries to keep the tables~\cite{Hibernate}.} and then issuing queries to keep the
objects and rows consistent A typical update must confirm it has the objects and rows consistent. A typical update must confirm it has the
current version, modify the object, write out a serialized version current version, modify the object, write out a serialized version
using the SQL {\tt update} command, and commit. This is an awkward using the SQL {\tt update} command, and commit. This is an awkward
and slow mechanism; we show up 5x speedup over MySQL and slow mechanism; we show up to a 5x speedup over a MySQL implementation
that is optimized for single-threaded access.
(Section~\ref{OASYS}). (Section~\ref{OASYS}).
The DBMS actually has a navigational transaction system within it, The DBMS actually has a navigational transaction system within it,
@ -124,7 +125,7 @@ via the query language. In general, this occurs because the internal
transaction system is complex and highly optimized for transaction system is complex and highly optimized for
high-performance update-in-place transactions. high-performance update-in-place transactions.
In this paper, we introduce a flexible framework for ACID In this paper we introduce a flexible framework for ACID
transactions, \yad, that is intended to support a broader range of transactions, \yad, that is intended to support a broader range of
applications. Although we believe it could also be the basis of a applications. Although we believe it could also be the basis of a
DBMS, there are clearly excellent existing solutions, and we thus DBMS, there are clearly excellent existing solutions, and we thus
@ -155,20 +156,20 @@ way for systems to provide complete transactions.
With these trends in mind, we have implemented a modular, extensible With these trends in mind, we have implemented a modular, extensible
transaction system based on on ARIES that makes as few assumptions as transaction system based on on ARIES that makes as few assumptions as
possible about application data or workloads. Where such possible about application data and workloads. Where such
assumptions are inevitable, we have produced narrow APIs that allow assumptions are inevitable, we allow
the developer to plug in alternative implementations or the developer to plug in alternative implementations or
define custom operations. Rather than hiding the underlying complexity define custom operations whenever possible. Rather than hiding the underlying complexity
of the library from developers, we have produced narrow, simple APIs of the library from developers, we have produced narrow, simple APIs
and a set of invariants that must be maintained in order to ensure and a set of invariants that must be maintained in order to ensure
transactional consistency, which allows developers to produce transactional consistency. This allows developers to produce
high-performance extensions with only a little effort. high-performance extensions with only a little effort.
Specifically, application developers using \yad can control: 1) Specifically, application developers using \yad can control: 1)
on-disk representations, 2) data structure implementations (including on-disk representations, 2) data structure implementations (including
adding new transactional access methods), 3) the granularity of adding new transactional access methods), 3) the granularity of
concurrency, 4) the precise semantics of atomicity, isolation and concurrency, 4) the precise semantics of atomicity, isolation and
durability, 5) request scheduling policies, and 6) choose deadlock detection or avoidance. Developers durability, 5) request scheduling policies, and 6) deadlock detection and avoidance schemes. Developers
can also exploit application-specific or workload-specific assumptions can also exploit application-specific or workload-specific assumptions
to improve performance. to improve performance.
@ -191,8 +192,9 @@ These features are enabled by the several mechanisms:
\end{description} \end{description}
We have produced a high-concurrency, high performance and reusable We have produced a high-concurrency, high performance and reusable
open-source implementation of these mechanisms. Portions of our open-source implementation of our system. Portions of our
implementation's API are still changing, but the interfaces to low-level primitives, and most implementations have stabilized. implementation's API are still changing, but the interfaces
to low-level primitives, and the most important portions of the implementation have stabilized.
To validate these claims, we walk To validate these claims, we walk
through a sequence of optimizations for a transactional hash through a sequence of optimizations for a transactional hash
@ -308,7 +310,7 @@ The Postgres storage system~\cite{postgres} provides conventional
database functionality, but provides APIs that allow applications to database functionality, but provides APIs that allow applications to
add new index and object types.~\cite{newTypes} Although some of the methods are add new index and object types.~\cite{newTypes} Although some of the methods are
similar to ours, \yad also implements a lower-level similar to ours, \yad also implements a lower-level
interface that can coexist with these methods. Without these interface that can coexist with these methods. Without \yad's
low-level APIs, Postgres suffers from many of the limitations inherent low-level APIs, Postgres suffers from many of the limitations inherent
to the database systems mentioned above, as its extensions focus on to the database systems mentioned above, as its extensions focus on
improving improving
@ -374,17 +376,18 @@ structures and provides flags to enable or tweak certain pieces of
functionality such as lock management, log forces, and so on. Although functionality such as lock management, log forces, and so on. Although
\yad provides some of the high-level calls that Berkeley DB supports \yad provides some of the high-level calls that Berkeley DB supports
(and could probably be extended to provide most or all of these calls), \yad (and could probably be extended to provide most or all of these calls), \yad
also provides lower-level access to transactional primitives. For provides lower-level access to transactional primitives and provides a rich
instance, Berkeley DB does not allow data to be accessed by physical set of mechanisms that make it easy to use these primitives. For
(page) offset, and does not let applications implement new types of instance, Berkeley DB does not provide access methods to access data by
log entries for recovery. It only supports built-in page layout types, page offset, and does not provide applications with primative
and does not allow applications to directly access the functionality access methods to facilitate the development of higher level structures.
provided by these layouts. Although the usefulness of providing such It also seems to be difficult to specialize existing Berkeley DB functionality
low-level functionality to applications may not be immediately (for example page layouts) for new extensions.
obvious, the focus of this paper is to describe how these limitations
impact application performance, and ultimately complicate development
and deployment efforts.
Although the usefulness of providing such low-level functionality to
applications may not be immediately obvious, the focus of this paper
is to describe how the lack of such primitives impacts application performance,
and ultimately complicates development and deployment efforts.
LRVM is a version of malloc() that provides LRVM is a version of malloc() that provides
transactional memory, and is similar to an object-oriented database transactional memory, and is similar to an object-oriented database
@ -432,13 +435,19 @@ atomicity semantics may be relaxed under certain circumstances. \yad is unique
%the recovery log. \yad's host independent logical log format will %the recovery log. \yad's host independent logical log format will
%allow applications to implement such optimizations. %allow applications to implement such optimizations.
\rcs{compare and contrast with boxwood!!} %\rcs{compare and contrast with boxwood!!}
Boxwood provides a networked, fault tolerant transactional B-Tree and
``Chunk Manager''. We beleive that \yad could be a valuable part of
such a system. However, we believe that \yad's concept of a page file
and system independent logical log suggest an alternative
approach to fault tolerant storage design. We plan to
investigate such approaches in future work.
We believe that \yad can support all of these systems. We will We believe that \yad can support all of these systems. We will
demonstrate several of them, but leave implementation of a real DBMS, demonstrate several of them, but leave implementation of a real DBMS,
LRVM and Boxwood to future work. However, in each case it is LRVM and Boxwood to future work. However, it is
relatively easy to see how they would map onto \yad. relatively easy to see how each system would map onto \yad.
%\eab{DB Toolkit from Wisconsin?} %\eab{DB Toolkit from Wisconsin?}
@ -459,7 +468,7 @@ unanticipated optimizations and allowing low-level
behavior, such as recovery semantics, to be customized on a behavior, such as recovery semantics, to be customized on a
per-application basis. per-application basis.
The write-ahead logging algorithm we use is based upon ARIES, but The write-ahead logging algorithm we use is based upon ARIES, but has been
modified for extensibility and flexibility. Because comprehensive modified for extensibility and flexibility. Because comprehensive
discussions of write-ahead logging protocols and ARIES are available discussions of write-ahead logging protocols and ARIES are available
elsewhere~\cite{haerder, aries}, we focus on those details that are elsewhere~\cite{haerder, aries}, we focus on those details that are
@ -1731,7 +1740,7 @@ significantly better than Berkeley DB's with both filesystems.}. Even
when using the unoptimized hash table implementation, \yad when using the unoptimized hash table implementation, \yad
scales very well with higher concurrency, delivering over 6000 scales very well with higher concurrency, delivering over 6000
%(ACID) %(ACID)
transactions per second.\footnote{This test was run without lock managers.} \yad had about double the throughput of Berkeley DB (up to 50 threads). transactions per second.\footnote{This test was run without lock managers, so the transactions obeyed the A,C, and D ACID properties. Since each transaction performed exactly one hashtable write they obeyed I (isolation) in a trivial sense.} \yad had about double the throughput of Berkeley DB (up to 50 threads).
%\footnote{Although our current implementation does not provide the hooks that %\footnote{Although our current implementation does not provide the hooks that
%would be necessary to alter log scheduling policy, the logger %would be necessary to alter log scheduling policy, the logger