sec 1 2
parent
0a50a40ba1
commit
2d2e8cef0c
1 changed files with 51 additions and 66 deletions
|
@ -95,7 +95,7 @@ systems.
|
||||||
Other systems that could benefit from transactions include file
|
Other systems that could benefit from transactions include file
|
||||||
systems, version-control systems, bioinformatics, workflow
|
systems, version-control systems, bioinformatics, workflow
|
||||||
applications, search engines, recoverable virtual memory, and
|
applications, search engines, recoverable virtual memory, and
|
||||||
programming languages with persistent objects (or structures).
|
programming languages with persistent objects.
|
||||||
|
|
||||||
In essence, there is an {\em impedance mismatch} between the data
|
In essence, there is an {\em impedance mismatch} between the data
|
||||||
model provided by a DBMS and that required by these applications. This is
|
model provided by a DBMS and that required by these applications. This is
|
||||||
|
@ -109,7 +109,7 @@ The most obvious example of this mismatch is in the support for
|
||||||
persistent objects in Java, called {\em Enterprise Java Beans}
|
persistent objects in Java, called {\em Enterprise Java Beans}
|
||||||
(EJB). In a typical usage, an array of objects is made persistent by
|
(EJB). In a typical usage, an array of objects is made persistent by
|
||||||
mapping each object to a row in a table\footnote{If the object is
|
mapping each object to a row in a table\footnote{If the object is
|
||||||
stored in normalized relational format, it may span many rows and tables.~\cite{Hibernate}}
|
stored in normalized relational format, it may span many rows and tables~\cite{Hibernate}.}
|
||||||
and then issuing queries to
|
and then issuing queries to
|
||||||
keep the objects and rows consistent A typical update must confirm
|
keep the objects and rows consistent A typical update must confirm
|
||||||
it has the current version, modify the object, write out a serialized
|
it has the current version, modify the object, write out a serialized
|
||||||
|
@ -121,7 +121,7 @@ The DBMS actually has a navigational transaction system within it,
|
||||||
which would be of great use to EJB, but it is not accessible except
|
which would be of great use to EJB, but it is not accessible except
|
||||||
via the query language. In general, this occurs because the internal
|
via the query language. In general, this occurs because the internal
|
||||||
transaction system is complex and highly optimized for
|
transaction system is complex and highly optimized for
|
||||||
high-performance update-in-place transactions (mostly financial).
|
high-performance update-in-place transactions.
|
||||||
|
|
||||||
In this paper, we introduce a flexible framework for ACID
|
In this paper, we introduce a flexible framework for ACID
|
||||||
transactions, \yad, that is intended to support a broader range of
|
transactions, \yad, that is intended to support a broader range of
|
||||||
|
@ -154,21 +154,20 @@ way for systems to provide complete transactions.
|
||||||
|
|
||||||
With these trends in mind, we have implemented a modular, extensible
|
With these trends in mind, we have implemented a modular, extensible
|
||||||
transaction system based on on ARIES that makes as few assumptions as
|
transaction system based on on ARIES that makes as few assumptions as
|
||||||
possible about application data structures or workload. Where such
|
possible about application data or workloads. Where such
|
||||||
assumptions are inevitable, we have produced narrow APIs that allow
|
assumptions are inevitable, we have produced narrow APIs that allow
|
||||||
the application developer to plug in alternative implementations or
|
the developer to plug in alternative implementations or
|
||||||
define custom operations. Rather than hiding the underlying complexity
|
define custom operations. Rather than hiding the underlying complexity
|
||||||
of the library from developers, we have produced narrow, simple APIs
|
of the library from developers, we have produced narrow, simple APIs
|
||||||
and a set of invariants that must be maintained in order to ensure
|
and a set of invariants that must be maintained in order to ensure
|
||||||
transactional consistency, allowing application developers to produce
|
transactional consistency, which allows developers to produce
|
||||||
high-performance extensions with only a little effort.
|
high-performance extensions with only a little effort.
|
||||||
|
|
||||||
Specifically, application developers using \yad can control: 1)
|
Specifically, application developers using \yad can control: 1)
|
||||||
on-disk representations, 2) access-method implementations (including
|
on-disk representations, 2) data structure implementations (including
|
||||||
adding new transactional access methods), 3) the granularity of
|
adding new transactional access methods), 3) the granularity of
|
||||||
concurrency, 4) the precise semantics of atomicity, isolation and
|
concurrency, 4) the precise semantics of atomicity, isolation and
|
||||||
durability, 5) request scheduling policies, and 6) the style of
|
durability, 5) request scheduling policies, and 6) choose deadlock detection or avoidance. Developers
|
||||||
synchronization (e.g. deadlock detection or avoidance). Developers
|
|
||||||
can also exploit application-specific or workload-specific assumptions
|
can also exploit application-specific or workload-specific assumptions
|
||||||
to improve performance.
|
to improve performance.
|
||||||
|
|
||||||
|
@ -178,12 +177,12 @@ These features are enabled by the several mechanisms:
|
||||||
transactional data representations (Section~\ref{page-layouts}).
|
transactional data representations (Section~\ref{page-layouts}).
|
||||||
\item[Extensible log formats] provide high-level control over
|
\item[Extensible log formats] provide high-level control over
|
||||||
transaction data structures (Section~\ref{op-def}).
|
transaction data structures (Section~\ref{op-def}).
|
||||||
\item [High and low level control over the log] such as calls to ``log this
|
\item [High- and low-level control over the log] such as calls to ``log this
|
||||||
operation'' or ``write a compensation record'' (Section~\ref{log-manager}).
|
operation'' or ``write a compensation record'' (Section~\ref{log-manager}).
|
||||||
\item [In memory logical logging] provides a data store independent
|
\item [In memory logical logging] provides a data store independent
|
||||||
record of application requests, allowing ``in flight'' log
|
record of application requests, allowing ``in flight'' log
|
||||||
reordering, manipulation and durability primitives to be
|
reordering, manipulation and durability primitives to be
|
||||||
developed (Section~\ref{graph-traversal}).
|
developed (Section~\ref{TransClos}).
|
||||||
\item[Extensible locking API] provides registration of custom lock managers
|
\item[Extensible locking API] provides registration of custom lock managers
|
||||||
and a generic lock manager implementation (Section~\ref{lock-manager}).
|
and a generic lock manager implementation (Section~\ref{lock-manager}).
|
||||||
\item[Custom durability operations] such as two phase commit's
|
\item[Custom durability operations] such as two phase commit's
|
||||||
|
@ -191,10 +190,8 @@ These features are enabled by the several mechanisms:
|
||||||
\end{description}
|
\end{description}
|
||||||
|
|
||||||
We have produced a high-concurrency, high performance and reusable
|
We have produced a high-concurrency, high performance and reusable
|
||||||
open-source implementation of these concepts. Portions of our
|
open-source implementation of these mechanisms. Portions of our
|
||||||
implementation's API are still changing, but the interfaces to low
|
implementation's API are still changing, but the interfaces to low-level primitives, and most implementations have stabilized.
|
||||||
level primitives, and implementations of basic functionality have
|
|
||||||
stabilized.
|
|
||||||
|
|
||||||
To validate these claims, we walk
|
To validate these claims, we walk
|
||||||
through a sequence of optimizations for a transactional hash
|
through a sequence of optimizations for a transactional hash
|
||||||
|
@ -202,10 +199,9 @@ table in Section~\ref{sub:Linear-Hash-Table}, an object serialization
|
||||||
scheme in Section~\ref{OASYS}, and a graph traversal algorithm in
|
scheme in Section~\ref{OASYS}, and a graph traversal algorithm in
|
||||||
Section~\ref{TransClos}. Benchmarking figures are provided for each
|
Section~\ref{TransClos}. Benchmarking figures are provided for each
|
||||||
application. \yad also includes a cluster hash table
|
application. \yad also includes a cluster hash table
|
||||||
built upon two-phase commit which will not be described in detail
|
built upon two-phase commit, which will not be described. Similarly we did not have space to discuss \yad's
|
||||||
in this paper. Similarly we did not have space to discuss \yad's
|
|
||||||
blob implementation, which demonstrates how \yad can
|
blob implementation, which demonstrates how \yad can
|
||||||
add transactional primitives to data stored in the file system.
|
add transactional primitives to data stored in a file system.
|
||||||
|
|
||||||
%To validate these claims, we developed a number of applications such
|
%To validate these claims, we developed a number of applications such
|
||||||
%as an efficient persistent object layer, {\em @todo locality preserving
|
%as an efficient persistent object layer, {\em @todo locality preserving
|
||||||
|
@ -284,12 +280,12 @@ largely filled this gap by providing a simpler, less concurrent
|
||||||
database that can work with a variety of storage options including
|
database that can work with a variety of storage options including
|
||||||
Berkeley DB (covered below) and regular files, although these
|
Berkeley DB (covered below) and regular files, although these
|
||||||
alternatives affect the semantics of transactions, and sometimes
|
alternatives affect the semantics of transactions, and sometimes
|
||||||
disable or interfere with high level database features. MySQL
|
disable or interfere with high-level database features. MySQL
|
||||||
includes these multiple storage engines for performance reasons.
|
includes these multiple storage options for performance reasons.
|
||||||
We argue that by reusing code, and providing for a greater amount
|
We argue that by reusing code, and providing for a greater amount
|
||||||
of customization, a modular storage engine can provide better
|
of customization, a modular storage engine can provide better
|
||||||
performance, increased transparency and more flexibility then a
|
performance, transparency and flexibility than a
|
||||||
set of monolithic storage engines.\eab{need to discuss other flaws! clusters? what else?}
|
set of monolithic storage engines.
|
||||||
|
|
||||||
%% Databases are designed for circumstances where development time often
|
%% Databases are designed for circumstances where development time often
|
||||||
%% dominates cost, many users must share access to the same data, and
|
%% dominates cost, many users must share access to the same data, and
|
||||||
|
@ -313,11 +309,10 @@ add new index and object types.~\cite{newTypes} Although some of the methods ar
|
||||||
similar to ours, \yad also implements a lower-level
|
similar to ours, \yad also implements a lower-level
|
||||||
interface that can coexist with these methods. Without these
|
interface that can coexist with these methods. Without these
|
||||||
low-level APIs, Postgres suffers from many of the limitations inherent
|
low-level APIs, Postgres suffers from many of the limitations inherent
|
||||||
to the database systems mentioned above. This is because Postgres was
|
to the database systems mentioned above, as its extensions focus on
|
||||||
designed to provide these extensions within the context of the
|
improving
|
||||||
relational model. Therefore, these extensions focused upon improving
|
query language and indexing support.
|
||||||
query language and indexing support. Instead of focusing upon this,
|
Although we
|
||||||
\yad is more interested in lower-level systems. Therefore, although we
|
|
||||||
believe that many of the high-level Postgres interfaces could be built
|
believe that many of the high-level Postgres interfaces could be built
|
||||||
on top of \yad, we have not yet tried to implement them.
|
on top of \yad, we have not yet tried to implement them.
|
||||||
% seems to provide
|
% seems to provide
|
||||||
|
@ -326,15 +321,13 @@ on top of \yad, we have not yet tried to implement them.
|
||||||
%writes correctly) and those that refer to relations or application
|
%writes correctly) and those that refer to relations or application
|
||||||
%data types, since \yad does not have a built-in concept of a relation.
|
%data types, since \yad does not have a built-in concept of a relation.
|
||||||
However, \yad does provide an iterator interface which we hope to
|
However, \yad does provide an iterator interface which we hope to
|
||||||
extend to provide support for relational algebra, and common
|
extend to provide support for query processing.
|
||||||
programming paradigms.
|
|
||||||
|
|
||||||
Object-oriented and XML database systems provide models tied closely
|
Object-oriented and XML database systems provide models tied closely
|
||||||
to programming language abstractions or hierarchical data formats.
|
to programming language abstractions or hierarchical data formats.
|
||||||
Like the relational model, these models are extremely general, and are
|
Like the relational model, these models are extremely general, and are
|
||||||
often inappropriate for applications with stringent performance
|
often inappropriate for applications with stringent performance
|
||||||
demands, or that use these models in a way that was not anticipated by
|
demands, or those that use these models in unusual ways. Furthermore, data stored in these databases
|
||||||
the database vendor. Furthermore, data stored in these databases
|
|
||||||
often is formatted in a way that ties it to a specific application or
|
often is formatted in a way that ties it to a specific application or
|
||||||
class of algorithms~\cite{lamb}. We will show that \yad can provide
|
class of algorithms~\cite{lamb}. We will show that \yad can provide
|
||||||
specialized support for both classes of applications, via a persistent
|
specialized support for both classes of applications, via a persistent
|
||||||
|
@ -368,32 +361,28 @@ order to serve these applications, many software systems have been
|
||||||
developed. Some are extremely complex, such as semantic file
|
developed. Some are extremely complex, such as semantic file
|
||||||
systems, where the file system understands the contents of the files
|
systems, where the file system understands the contents of the files
|
||||||
that it contains, and is able to provide services such as rapid
|
that it contains, and is able to provide services such as rapid
|
||||||
search, or file-type specific operations such as thumb-nailing,
|
search, or file-type specific operations such as thumb nails \cite{Reiser4,WinFS,BeOS,SemanticFSWork,SemanticWeb}. Others are simpler, such as
|
||||||
automatic content updates, and so on \cite{Reiser4,WinFS,BeOS,SemanticFSWork,SemanticWeb}. Others are simpler, such as
|
|
||||||
Berkeley~DB~\cite{bdb, berkeleyDB}, which provides transactional
|
Berkeley~DB~\cite{bdb, berkeleyDB}, which provides transactional
|
||||||
% bdb's recno interface seems to be a specialized b-tree implementation - Rusty
|
|
||||||
storage of data in indexed form using a hashtable or tree, or as a queue.
|
storage of data in indexed form using a hashtable or tree, or as a queue.
|
||||||
|
% bdb's recno interface seems to be a specialized b-tree implementation - Rusty
|
||||||
|
|
||||||
\rcs{Eric, Mike: How's this?}
|
Although Berkeley DB's feature set is similar to the features provided by
|
||||||
\eab{need a (careful) dedicated paragraph on Berkeley DB}
|
|
||||||
|
|
||||||
While Berkeley DB's feature set is similar to the features provided by
|
|
||||||
\yad's implementation, there is an important distinction. Berkeley DB
|
\yad's implementation, there is an important distinction. Berkeley DB
|
||||||
provides general implementations of a handful of transactional
|
provides general implementations of a handful of transactional
|
||||||
structures and provides flags to enable or tweak certain pieces of
|
structures and provides flags to enable or tweak certain pieces of
|
||||||
functionality such as lock managers, log forces, and so on. While
|
functionality such as lock management, log forces, and so on. Although
|
||||||
\yad provides some of the high level calls that Berkeley DB supports
|
\yad provides some of the high-level calls that Berkeley DB supports
|
||||||
(and could probably be extended to provide most or all of these calls), \yad
|
(and could probably be extended to provide most or all of these calls), \yad
|
||||||
also provides lower level access to transactional primatives. For
|
also provides lower-level access to transactional primitives. For
|
||||||
instance, Berkeley DB does not allow data to be accessed by physical
|
instance, Berkeley DB does not allow data to be accessed by physical
|
||||||
(page) offset, and does not let applications implement new types of
|
(page) offset, and does not let applications implement new types of
|
||||||
log entries for recovery. It only supports builtin page layout types,
|
log entries for recovery. It only supports built-in page layout types,
|
||||||
and does not allow applications to directly access the functionality
|
and does not allow applications to directly access the functionality
|
||||||
provided by these layouts. While the usefulness of providing such
|
provided by these layouts. Although the usefulness of providing such
|
||||||
low-level functionality to applications may not be immediately
|
low-level functionality to applications may not be immediately
|
||||||
obvious, the focus of this paper is to describe how these limitations
|
obvious, the focus of this paper is to describe how these limitations
|
||||||
impact application performance, and ultimately complicate development
|
impact application performance, and ultimately complicate development
|
||||||
and system deployment efforts.
|
and deployment efforts.
|
||||||
|
|
||||||
\rcs{Potential conclusion material after this line in the .tex file..}
|
\rcs{Potential conclusion material after this line in the .tex file..}
|
||||||
|
|
||||||
|
@ -405,40 +394,37 @@ and system deployment efforts.
|
||||||
%Berkeley DB, while Sections~\ref{OASYS} and~\ref{TransClos} show that
|
%Berkeley DB, while Sections~\ref{OASYS} and~\ref{TransClos} show that
|
||||||
%such optimizations have practical value.
|
%such optimizations have practical value.
|
||||||
|
|
||||||
\eab{this paragraph needs work...}
|
|
||||||
LRVM is a version of malloc() that provides
|
LRVM is a version of malloc() that provides
|
||||||
transactional memory, and is similar to an object-oriented database
|
transactional memory, and is similar to an object-oriented database
|
||||||
but is much lighter weight, and lower level~\cite{lrvm}. Unlike
|
but is much lighter weight, and lower level~\cite{lrvm}. Unlike
|
||||||
the solutions mentioned above, it does not impose limitations upon
|
the solutions mentioned above, it does not impose limitations upon
|
||||||
the layout of application data.
|
the layout of application data, although it does not provide full transactions.
|
||||||
However, its approach does not handle concurrent
|
%However, its approach does not handle concurrent
|
||||||
transactions well because the addition of concurrency support to transactional
|
%transactions well because the addition of concurrency support to transactional
|
||||||
data structures typically requires control over log formats (Section~\ref{nested-top-actions}).
|
%data structures typically requires control over log formats (Section~\ref{nested-top-actions}).
|
||||||
%However, LRVM's use of virtual memory to implement the buffer pool
|
%However, LRVM's use of virtual memory to implement the buffer pool
|
||||||
%does not seem to be incompatible with our work, and it would be
|
%does not seem to be incompatible with our work, and it would be
|
||||||
%interesting to consider potential combinations of our approach
|
%interesting to consider potential combinations of our approach
|
||||||
%with that of LRVM. In particular, the recovery algorithm that is used to
|
%with that of LRVM. In particular, the recovery algorithm that is used to
|
||||||
%implement LRVM could be changed, and \yad's logging interface could
|
%implement LRVM could be changed, and \yad's logging interface could
|
||||||
%replace the narrow interface that LRVM provides. Also,
|
%replace the narrow interface that LRVM provides. Also,
|
||||||
|
%LRVM's inter-
|
||||||
LRVM's inter-
|
%and intra-transactional log optimizations collapse multiple updates
|
||||||
and intra-transactional log optimizations collapse multiple updates
|
%into a single log entry. In the past, we have implemented such
|
||||||
into a single log entry. In the past, we have implemented such
|
%optimizations in an ad-hoc fashion in \yad. However, we believe
|
||||||
optimizations in an ad-hoc fashion in \yad. However, we believe
|
%that we have developed the necessary API hooks
|
||||||
that we have developed the necessary API hooks
|
%to allow extensions to \yad to transparently coalesce log entries in the future (Section~\ref{TransClos}).
|
||||||
to allow extensions to \yad to transparently coalesce log entries in the future (Section~\ref{TransClos}).
|
|
||||||
LRVM's
|
LRVM's
|
||||||
approach of keeping a single in-memory copy of data in the applications
|
approach of keeping a single in-memory copy of data in the applications
|
||||||
address space is similar to the optimization presented in
|
address space is similar to the optimization presented in
|
||||||
Section~\ref{OASYS}, but our approach circumvents the limitations of
|
Section~\ref{OASYS}, but our approach circumvents can support full transactions as needed.
|
||||||
LRVM that were mentioned above, providing the full flexibility of the
|
|
||||||
ARIES algorithm.
|
|
||||||
|
|
||||||
%\begin{enumerate}
|
%\begin{enumerate}
|
||||||
% \item {\bf Incredibly scalable, simple servers CHT's, google fs?, ...}
|
% \item {\bf Incredibly scalable, simple servers CHT's, google fs?, ...}
|
||||||
|
|
||||||
Finally, some applications require incredibly simple but extremely
|
Finally, some applications require incredibly simple but extremely
|
||||||
scalable storage mechanisms. Cluster hash tables are a good example
|
scalable storage mechanisms. Cluster hash tables~\cite{cht} are a good example
|
||||||
of the type of system that serves these applications well, due to
|
of the type of system that serves these applications well, due to
|
||||||
their relative simplicity and good scalability. Depending
|
their relative simplicity and good scalability. Depending
|
||||||
on the fault model on which a cluster hash table is based, it is
|
on the fault model on which a cluster hash table is based, it is
|
||||||
|
@ -457,14 +443,13 @@ atomicity semantics may be relaxed under certain circumstances. \yad is unique
|
||||||
\rcs{compare and contrast with boxwood!!}
|
\rcs{compare and contrast with boxwood!!}
|
||||||
|
|
||||||
|
|
||||||
We believe that \yad can support all of these
|
We believe that \yad can support all of these systems. We will
|
||||||
applications. We will demonstrate several of them, but leave
|
demonstrate several of them, but leave implementation of a real DBMS,
|
||||||
implementation of a real DBMS, LRVM and Boxwood to future work.
|
LRVM and Boxwood to future work. However, in each case it is
|
||||||
However, in each case it is relatively easy to see how they would map
|
relatively easy to see how they would map onto \yad.
|
||||||
onto \yad.
|
|
||||||
|
|
||||||
|
|
||||||
\eab{DB Toolkit from Wisconsin?}
|
%\eab{DB Toolkit from Wisconsin?}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|