paper updates; a bit of prior work

This commit is contained in:
Sears Russell 2006-08-03 00:13:50 +00:00
parent 7e5825aa74
commit 84bd594288


@@ -161,25 +161,6 @@ abstraction upon their users will restrict system designs and
implementations.
}
%In short, reliable data management has become as unavoidable as any
%other operating system service. As this has happened, database
%designs have not incorporated this decade-old lesson from operating
%systems research:
%
%\begin{quote} The defining tragedy of the operating systems community
% has been the definition of an operating system as software that both
% multiplexes and {\em abstracts} physical resources...The solution we
% propose is simple: complete elimination of operating systems
% abstractions by lowering the operating system interface to the
% hardware level~\cite{engler95}.
%\end{quote}
%The widespread success of lower-level transactional storage libraries
%(such as Berkeley DB) is a sign of these trends. However, the level
%of abstraction provided by these systems is well above the hardware
%level, and applications that resort to ad-hoc storage mechanisms are
%still common.
This paper presents \yad, a library that provides transactional
storage at a level of abstraction as close to the hardware as
possible. The library can support special purpose, transactional
@@ -187,7 +168,6 @@ storage interfaces in addition to ACID database-style interfaces to
abstract data models. \yad incorporates techniques from databases
(e.g. write-ahead-logging) and systems (e.g. zero-copy techniques).
Our goal is to combine the flexibility and layering of low-level
abstractions typical for systems work with the complete semantics
that exemplify the database field.
@@ -254,12 +234,11 @@ hierarchical datasets, and so on. Before the relational model,
navigational databases implemented pointer- and record-based data models.
An early survey of database implementations sought to enumerate the
fundamental components used by database system implementors~\cite{batoryConceptual,batoryPhysical}. This
survey was performed due to difficulties in extending database systems
into new application domains. It divided internal database
routines into two broad modules: {\em conceptual mappings} and {\em physical
database models}.
%A physical model would then translate a set of tuples into an
%on-disk B-Tree, and provide support for iterators and range-based query
@@ -277,7 +256,7 @@ going to be used for short, write-intensive and high-concurrency
transactions (OLTP), the physical model would probably translate sets
of tuples into an on-disk B-Tree. In contrast, if the database needed
to support long-running, read-only aggregation queries (OLAP) over
high-dimensional data, a physical model that stores the data in a sparse array format would
be more appropriate~\cite{molap}. While both OLTP and OLAP databases are based
upon the relational model, they make use of different physical models
in order to serve different classes of applications.}
@@ -295,14 +274,32 @@ structured physical model or abstract conceptual mappings.
\subsection{Extensible transaction systems}
\label{sec:otherDBs}
This section discusses transaction systems with goals similar to
ours. Although these projects were successful in many respects, they
fundamentally aimed to implement an extensible abstract data model,
rather than take a bottom-up approach and allow applications to
customize the physical model in order to support new high-level
abstractions. In each case, this limits these systems to applications
that their physical models support well.
\eab{add Argus and Camelot}
\rcs{ Notes on these: Camelot focuses more on language support for
distributed transactions. Its recovery mechanism is probably very
close to RVM's, as it does pure physical logging with transaction
duration page locks (really `region' locks). }
\rcs{ I think Argus makes use of shadow copies for durability and for
in-memory transactions. A tree of shadow copies exists, and is handled as
follows (I think): All transaction locks are commit duration, per
object. There are read locks and write locks, and it uses strict 2PL.
Each transaction is a tree of ``subactions'' that can get R/W locks
according to the 2PL rules. Two subactions in the same action cannot
get a write lock on the same object because each one gets its own copy
of the object to write to. If a subaction or transaction aborts, its
local copy is simply discarded. At commit, the local copy replaces
the global copy.}
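The shadow-copy discipline described in the note above can be sketched in a few lines of Python. This is an illustrative reconstruction only; the class and method names are hypothetical and do not reflect Argus's actual interface.

```python
# Hypothetical sketch of Argus-style shadow copies: a writer gets a
# private copy of the object; abort discards it, commit installs it.
class ShadowObject:
    def __init__(self, value):
        self.committed = value
        self.shadow = None        # private copy, present only during a write

    def begin_write(self):
        # Under strict 2PL the write lock is held until commit or abort,
        # so at most one private copy of this object exists at a time.
        self.shadow = self.committed

    def commit(self):
        if self.shadow is not None:
            self.committed = self.shadow   # local copy replaces the global one
        self.shadow = None

    def abort(self):
        self.shadow = None                 # private copy is simply discarded

o = ShadowObject(10)
o.begin_write(); o.shadow = 42; o.abort()
assert o.committed == 10    # abort leaves committed state intact
o.begin_write(); o.shadow = 42; o.commit()
assert o.committed == 42    # commit installs the shadow copy
```

Because aborts only drop the private copy, no undo work is needed; durability comes from atomically installing the copy at commit.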
\subsubsection{Extensible databases}
Genesis~\cite{genesis}, an early database toolkit, was built in terms
@@ -335,7 +332,7 @@ both types of systems aim to extend a high-level data model with new
abstract data types, and thus are quite limited in the range of new
applications they support. In hindsight, it is not surprising that this kind of
extensibility has had little impact on the range of applications
we listed above. \rcs{This could be more clear. Perhaps ``... on applications that are not naturally supported by queries over sets of tuples, or other data items''?}
\subsubsection{Berkeley DB}
@@ -346,8 +343,8 @@ we listed above.
%databases.
Berkeley DB is a highly successful alternative to conventional
databases~\cite{libtp}. At its core, it provides the physical database model
(relational storage system~\cite{systemR}) of a conventional database server.
%It is based on the
%observation that the storage subsystem is a more general (and less
%abstract) component than a monolithic database, and provides a
@@ -357,7 +354,7 @@ In particular,
it provides fully transactional (ACID) operations over B-Trees,
hashtables, and other access methods. It provides flags that
let its users tweak various aspects of the performance of these
primitives, and selectively disable the features it provides.
With the
exception of the benchmark designed to fairly compare the two systems, none of the \yad
@@ -396,9 +393,8 @@ situation.
%implementations are generally incomprehensible and
%irreproducible, hindering further research.
The study concludes
by suggesting the adoption of highly modular {\em RISC} database architectures, both as a resource for researchers and as a
real-world database system.
RISC databases have many elements in common with
database toolkits. However, they take the database toolkit idea one
step further, and suggest standardizing the interfaces of the
@@ -444,7 +440,7 @@ operations are roughly structured as two levels of abstraction.
The transactional algorithms described in this section are not at all
novel, and are in fact based on ARIES~\cite{aries}. However, they
provide important background. There is a large body of literature
explaining optimizations and implementation techniques related to this
type of recovery algorithm. Any good database textbook would cover these
issues in more detail.
@@ -454,10 +450,10 @@ updates to regions of the disk. These updates do not have to deal
with concurrency, but the portion of the page file that they read and
write must be atomically updated, even if the system crashes.
The higher level provides operations that span multiple pages by
atomically applying sets of operations to the page file and coping
with concurrency issues. Surprisingly, the implementations of these
two layers are only loosely coupled.
Finally, this section describes how \yad manages transaction-duration
locks and discusses the alternatives \yad provides to application developers.
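The two-level split described above can be sketched as follows. This is a loose illustration under assumed names, not \yads actual API: the lower level updates one page atomically; the higher level composes a set of page updates into a single multi-page operation.

```python
# Illustrative sketch of the two-level structure: the lower level
# (PageFile) performs atomic single-page updates; the higher level
# (MultiPageOp) applies a set of updates spanning several pages.
class PageFile:
    def __init__(self, npages):
        self.pages = [b""] * npages

    def write_page(self, n, data):
        # Lower level: one page update, assumed atomic even across crashes.
        self.pages[n] = data

class MultiPageOp:
    """Higher level: applies a set of page updates as one operation."""
    def __init__(self, pf):
        self.pf = pf
        self.pending = {}               # page number -> new contents

    def update(self, n, data):
        self.pending[n] = data

    def apply(self):
        # In a real system each update would first be logged so that
        # recovery can redo or undo it; here we simply apply the set.
        for n, data in self.pending.items():
            self.pf.write_page(n, data)
        self.pending = {}

pf = PageFile(4)
op = MultiPageOp(pf)
op.update(0, b"hdr"); op.update(2, b"rec")
op.apply()
assert pf.pages[0] == b"hdr" and pf.pages[2] == b"rec"
```

The loose coupling noted in the text shows up here as well: the higher level only needs the lower level's atomic `write_page` contract, not its implementation.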
@@ -533,11 +529,12 @@ a non-atomic disk write, then such operations would fail during recovery.
Note that we could implement a limited form of transactions by
limiting each transaction to a single operation, and by forcing the
page that each operation updates to disk in order. This would not
require any sort of logging, but is quite inefficient in practice, as
it forces the disk to perform a potentially random write each time the
page file is updated. The rest of this section describes how recovery
can be extended, first to efficiently support multiple operations per
transaction, and then to allow more than one transaction to modify the
same data before committing.
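The cost contrast above can be made concrete with a small sketch. The class names are illustrative only: one strategy forces a dirty page to disk on every operation (a random write each time), while write-ahead logging appends records sequentially and defers the random page writes.

```python
# Illustrative contrast (hypothetical names): forcing each dirty page
# to disk per operation vs. appending write-ahead log records and
# deferring page writes until later.
class ForceEachOp:
    def __init__(self):
        self.random_writes = 0

    def update(self, page):
        self.random_writes += 1     # page forced to disk immediately

class WriteAheadLog:
    def __init__(self):
        self.log = []               # sequential appends to the log
        self.dirty = set()          # dirty pages, written back lazily

    def update(self, page):
        self.log.append(("update", page))   # log record written first
        self.dirty.add(page)                # random page write deferred

    def commit(self):
        self.log.append(("commit",))        # one sequential force of the log

force, wal = ForceEachOp(), WriteAheadLog()
for page in [7, 3, 7, 9]:
    force.update(page)
    wal.update(page)
wal.commit()
assert force.random_writes == 4     # four random in-place writes
assert len(wal.dirty) == 3          # only three distinct pages dirtied
assert len(wal.log) == 5            # four sequential records plus the commit
```

Repeated updates to the same page cost the forcing scheme a disk write each time, while the log absorbs them as cheap sequential appends.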
\subsubsection{\yads Recovery Algorithm}