Eric Brewer 2006-04-24 01:00:50 +00:00
parent ca229e9d83
commit f7122c9f62


@ -25,7 +25,7 @@
% TARDIS: Atomic, Recoverable, Datamodel Independent Storage % TARDIS: Atomic, Recoverable, Datamodel Independent Storage
% EAB: flex, basis, stable, dura % EAB: flex, basis, stable, dura
\newcommand{\yad}{Lemon\xspace} \newcommand{\yad}{Stasys\xspace}
\newcommand{\oasys}{Oasys\xspace} \newcommand{\oasys}{Oasys\xspace}
\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}} \newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
@ -59,9 +59,9 @@ UC Berkeley
%\thispagestyle{empty} %\thispagestyle{empty}
\subsection*{Abstract} %\subsection*{Abstract}
The is an increasing need to manage data well in a wide variety of {\em There is an increasing need to manage data well in a wide variety of
systems, including robust support for atomic durable concurrent systems, including robust support for atomic durable concurrent
transactions. Databases provide the default solution, but force transactions. Databases provide the default solution, but force
applications to interact via SQL and to forfeit control over data applications to interact via SQL and to forfeit control over data
@ -69,7 +69,7 @@ layout and access mechanisms. We argue there is a gap between DBMSs and file sy
\yad is a storage framework that incorporates ideas from traditional \yad is a storage framework that incorporates ideas from traditional
write-ahead-logging storage algorithms and file systems, write-ahead-logging storage algorithms and file systems,
while providing applications with flexible control over data structure, layout and performance vs. robustness tradeoffs. while providing applications with flexible control over data structures, layout, and performance vs. robustness tradeoffs.
% increased control over their % increased control over their
%underlying modules. Generic transactional storage systems such as SQL %underlying modules. Generic transactional storage systems such as SQL
%and BerkeleyDB serve many applications well, but impose constraints %and BerkeleyDB serve many applications well, but impose constraints
@ -90,9 +90,13 @@ improved performance to applications.
We present examples that make use of custom access methods, We present examples that make use of custom access methods,
modified buffer manager semantics, direct log file manipulation, and modified buffer manager semantics, direct log file manipulation, and
LSN-free pages that facilitate zero-copy optimizations, and discusses LSN-free pages that facilitate zero-copy optimizations, and discuss
the composability of these extensions. the composability of these extensions.
\eab{performance}
}
%We argue that our ability to support such a diverse range of %We argue that our ability to support such a diverse range of
%transactional systems stems directly from our rejection of %transactional systems stems directly from our rejection of
%assumptions made by early database designers. These assumptions %assumptions made by early database designers. These assumptions
@ -113,13 +117,14 @@ the composability of these extensions.
%existing systems. %existing systems.
\section{Introduction} \section{Introduction}
As our reliance on computing infrastructure has increased, the need As our reliance on computing infrastructure has increased, the need
for robust data management has increased greatly, as has the range of for robust data management has increased greatly, as has the range of
applications and systems that need it. Traditionally, data management applications and systems that need it. Traditionally, data management
has been the province of database management systems, which although has been the province of database management systems (DBMSs), which although
well-suited to enterprise applications, leads to poor support for a well-suited to enterprise applications, lead to poor support for a
wide-range systems including grid and scientific computing, wide-range systems including grid and scientific computing,
bioinformatics, search engines, version control, and workflow bioinformatics, search engines, version control, and workflow
applications. These applications need transactions but don't fit well applications. These applications need transactions but don't fit well
@ -132,13 +137,15 @@ A typical example of this mismatch is in the support for
persistent objects in Java, called {\em Enterprise Java Beans} persistent objects in Java, called {\em Enterprise Java Beans}
(EJB). In a typical usage, an array of objects is made persistent by (EJB). In a typical usage, an array of objects is made persistent by
mapping each object to a row in a table (or sometimes multiple mapping each object to a row in a table (or sometimes multiple
tables~\cite[xxx]) and then issuing queries to keep the objects and tables~\cite{xxx}) and then issuing queries to keep the objects and
rows consistent. A typical update must confirm it has the current rows consistent. A typical update must confirm it has the current
version, modify the object, write out a serialized version using the version, modify the object, write out a serialized version using the
SQL update command and commit. This is an awkward and slow mechanism; SQL update command and commit. This is an awkward and slow mechanism;
we show up to a 5x speedup over a MySQL implementation that is we show up to a 5x speedup over a MySQL implementation that is
optimized for single-threaded, local access (Section XXX). optimized for single-threaded, local access (Section XXX).
Add bioinformatics = Perl + files example?
\eat{ \eat{
Examples of real world systems that currently fall into this category Examples of real world systems that currently fall into this category
are web search engines, document repositories, large-scale web-email are web search engines, document repositories, large-scale web-email
@ -146,7 +153,6 @@ services, map and trip planning services, ticket reservation systems,
photo and video repositories, bioinformatics, version control systems, photo and video repositories, bioinformatics, version control systems,
workflow applications, CAD/VLSI applications and directory services. workflow applications, CAD/VLSI applications and directory services.
In short, we believe that a fundamental architectural shift in In short, we believe that a fundamental architectural shift in
transactional storage is necessary before general purpose storage transactional storage is necessary before general purpose storage
systems are of practical use to modern applications. systems are of practical use to modern applications.
@ -178,15 +184,11 @@ This paper presents \yad, a library that provides transactional
storage at a level of abstraction as close to the hardware as storage at a level of abstraction as close to the hardware as
possible. The library can support special purpose, transactional possible. The library can support special purpose, transactional
storage interfaces as well as ACID database-style interfaces to storage interfaces as well as ACID database-style interfaces to
abstract data models. abstract data models. \yad incororates techniques from the databases
(e.g. write-ahead logging) and systems (e.g. zero-copy techniques).
Notably, \yad incorporates many existing technologies from the storage Our goal is to combine the flexibility and layering of low-level
communities, and allows applications to incorporate appropriate abstractions typical for systems work, with the complete semantics
subsystems as necessary. A partial open-source implementation of the that exemplify the database field.
ideas presented below is available; performance numbers are provided
when possible.
Taken from sosp:
By {\em flexible} we mean that \yad{} can implement a wide By {\em flexible} we mean that \yad{} can implement a wide
range of transactional data structures, that it can support a variety range of transactional data structures, that it can support a variety
@ -206,18 +208,17 @@ to meet and form the {\em raison d'\^etre} for \yad{}: the framework
delivers these properties as reusable building blocks for systems delivers these properties as reusable building blocks for systems
to implement complete transactions. to implement complete transactions.
--- Through examples, and their good performance, we show how \yad{}
support a wide range of uses that in the database gap, including
persistent objects (roadmap?), graph or XML apps, and recoverable
virtual memory~\cite{lrvm}. An (early) open-source implementation of
the ideas presented below is available.
\eab{need to talk about positive examples: LRVM, Berk DB, windows registry? Grid FS from Wisconsin} \eab{others? CVS, windows registry, berk DB, Grid FS?}
roadmap?
Applications that have only recently begun to make use of high-level
database features include XML based systems, object persistance
mechanisms, and enterprise management systems (notably, SAP R/3).
**We've explained why the sky is falling. Now, explain why \yad is
so good. (Take ideas from old paper.)**
\section{\yad is not a Database} \section{\yad is not a Database}
@ -229,8 +230,8 @@ database systems and research projects for at least 25 years.
The section concludes with a discussion of database systems that The section concludes with a discussion of database systems that
attempt to address these problems. Although these systems were attempt to address these problems. Although these systems were
successful in many respects, they failed to address the broad class of successful in many respects, they fundamentally aim to implement a
software we are interested in. data model, rather than build transactions from the bottom up. \eab{move this?}
\subsection{The database abstraction} \subsection{The database abstraction}
@ -240,42 +241,40 @@ abstractions they present. For instance, relational database systems
implement the relational model~\cite{cobb}, object oriented implement the relational model~\cite{cobb}, object oriented
databases implement object abstractions, XML databases implement databases implement object abstractions, XML databases implement
hierarchical datasets, and so on. Before the relational model, hierarchical datasets, and so on. Before the relational model,
navigational databases implemented pointer and record navigational databases implemented pointer- and record-based data models.
based data models.
An early survey of database implementations sought to enumerate the An early survey of database implementations sought to enumerate the
fundamental components used by database system implementors. This fundamental components used by database system implementors. This
survey was performed due to difficulties in extending database systems survey was performed due to difficulties in extending database systems
into new application domains. The survey divided internal database into new application domains. The survey divided internal database
routines into two broad modules: conceptual routines into two broad modules: {\em conceptual
mappings~\cite{batoryConceptual} and the physical mappings}~\cite{batoryConceptual} and the {\em physical
database~\cite{batoryPhysical} model. database}~\cite{batoryPhysical} model.
A conceptual mapping might translate a relation into a set of keyed A conceptual mapping might translate a relation into a set of keyed
tuples. A physical model could then translate a set of tuples into an tuples. A physical model would then translate a set of tuples into an
on-disk B-Tree, and provide support for iterators and range-based query on-disk B-Tree, and provide support for iterators and range-based query
operations. operations.
It is the responsibility of a database implementor to choose a set of It is the responsibility of a database implementor to choose a set of
conceptual mappings that implement the desired higher level conceptual mappings that implement the desired higher-level
abstraction (such as the relational model). The physical data model abstraction (such as the relational model). The physical data model
is chosen to efficiently support the set of mappings that are built on is chosen to efficiently support the set of mappings that are built on
top of it. top of it.
{\em The key observation of this paper is that no known physical data model {\em A key observation of this paper is that no known physical data model
can support more than a small percentage of today's applications.} can support more than a small percentage of today's applications.}
Instead of attempting to create such a model after decades of database Instead of attempting to create such a model after decades of database
research has failed to produce one, we opt to provide a transactional research has failed to produce one, we opt to provide a transactional
storage model that mimics the primitives provided by modern hardware. storage model that mimics the primitives provided by modern hardware.
This makes it easy for system designers to implement most of the data This makes it easy for system designers to implement most of the data
models that the underlying hardware is capable of supporting, or to models that the underlying hardware can support, or to
abandon the database approach entirely, and forgo the use of a abandon the data model approach entirely, and forgo the use of a
structured physical model or conceptual mappings. structured physical model or conceptual mappings.
\subsection{Extensible databases} \subsection{Extensible databases}
Genesis~\cite{genesis}, an early database toolkit, was built in terms Genesis~\cite{genesis}, an early database toolkit, was built in terms
of a physical data model, and the conceptual mappings described above. of a physical data model, and the conceptual mappings described above.
It was designed to allow database implementors to easily swap out It was designed to allow database implementors to easily swap out
@ -284,11 +283,13 @@ Like subsequent systems (including \yad), it allowed it users to
implement custom operations. implement custom operations.
Subsequent extensible database work builds upon these foundations. Subsequent extensible database work builds upon these foundations.
The Exodus~\cite{exodus} database toolkit was the successor to For example, the Exodus~\cite{exodus} database toolkit was the successor to
Genesis. It supported the automatic generation of query optimizers and Genesis. It supported the automatic generation of query optimizers and
execution engines based upon abstract data type definitions, access execution engines based upon abstract data type definitions, access
methods and cost models provided by its users. methods and cost models provided by its users.
\eab{move this next paragraph to RW?}
Starburst's~\cite{starburst} physical data model consisted of {\em Starburst's~\cite{starburst} physical data model consisted of {\em
storage methods}. Storage methods supported {\em attachment types} storage methods}. Storage methods supported {\em attachment types}
that allowed triggers and active databases to be implemented. An that allowed triggers and active databases to be implemented. An
@ -304,7 +305,7 @@ object-oriented database systems, and relational databases with
support for user-definable abstract data types (such as in support for user-definable abstract data types (such as in
Postgres~\cite{postgres}) were the primary competitors to extensible Postgres~\cite{postgres}) were the primary competitors to extensible
database toolkits. Ideas from all of these systems have been database toolkits. Ideas from all of these systems have been
incorporated into the mechanisms that support user definable types in incorporated into the mechanisms that support user-definable types in
current database systems. current database systems.
One can characterise the difference between database toolkits and One can characterise the difference between database toolkits and
@ -312,16 +313,12 @@ extensible database servers in terms of early and late binding. With
a database toolkit, new types are defined when the database server is a database toolkit, new types are defined when the database server is
compiled. In today's object-relational database systems, new types compiled. In today's object-relational database systems, new types
are defined at runtime. Each approach has its advantages. However, are defined at runtime. Each approach has its advantages. However,
both types of systems attempted to provide similar levels of both types of systems aim to extend a high-level data model with new abstract data types, and thus are quite limited in the range of new applications they support. Not surprisingly, this kind of extensibility has had little impact on the range of applications we listed above.
abstraction and flexibility to their end users.
Therefore, the database toolkit approach is inappropriate for
applications not well serviced by modern database systems.
\subsection{Berkeley DB} \subsection{Berkeley DB}
System R was the first relational database implementation, and was System R was the first relational database implementation, and was
based upon a clean separation between it's storage system and its based upon a clean separation between its storage system and its
query processing engine. In fact, it supported a simple navigational query processing engine. In fact, it supported a simple navigational
interface to the storage subsystem. To this day, database systems are interface to the storage subsystem. To this day, database systems are
built using this sort of architecture. built using this sort of architecture.
@ -342,48 +339,36 @@ primitives.
We have already discussed the limitations of this approach. With the We have already discussed the limitations of this approach. With the
exception of the direct comparison of the two systems, none of the \yad exception of the direct comparison of the two systems, none of the \yad
applications presented in Section~\ref{extensions} are efficiently applications presented in Section~\ref{extensions} are efficiently
supported by Berkeley DB. This is a result of Berkeley DB's, supported by Berkeley DB. This is a result of Berkeley DB's
assumptions regarding workloads and decisions regarding low level data assumptions regarding workloads and decisions regarding low level data
representation. While Berkeley DB could be built on top of \yad, representation. Thus, although Berkeley DB could be built on top of \yad,
Berkeley DB is too specialized to support \yad. Berkeley DB is too specialized to support \yad.
\subsection{Boxwood} \eab{for BDB, should we say that it still has a data model?}
The Boxwood system provides a networked, fault-tolerant transactional
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
complement to such a system, especially given \yad's focus on
intelligence and optimizations within a single node, and Boxwoods
focus on multiple node systems. In particular, when implementing
applications with predictable locality properties, it would be
interesting to explore extensions to the Boxwood approach that make
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
mechanism. (Section~\ref{logging})
%cover P2 (the old one, not "Pier 2" if there is time... %cover P2 (the old one, not "Pier 2" if there is time...
\subsection{Better databases} \subsection{Better databases}
The database community is also aware of this gap.
A recent survey~\cite{riscDB} enumerates problems that plague users of A recent survey~\cite{riscDB} enumerates problems that plague users of
state-of-the-art database systems. state-of-the-art database systems, and finds that database implementations fail to support the
The survey finds that database implementations fail to support the
needs of modern systems. In large systems, this manifests itself as needs of modern systems. In large systems, this manifests itself as
manageability and tuning issues that prevent databases from predictably manageability and tuning issues that prevent databases from predictably
servicing diverse, large scale, declarative workloads. servicing diverse, large scale, declarative workloads.
On small devices, footprint, predictable performance, and power consumption are On small devices, footprint, predictable performance, and power consumption are
primary, concerns that database systems do not address. primary, concerns that database systems do not address.
Midsize deployments, such as desktop installations, must run without %Midsize deployments, such as desktop installations, must run without
user intervention, but self-tuning, self-administering database %user intervention, but self-tuning, self-administering database
servers are still an area of active research. %servers are still an area of active research.
The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems. Complete, modern database The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems. Complete, modern database
implementations are generally incomprehensible and implementations are generally incomprehensible and
irreproducible, hindering further research. The study concludes irreproducible, hindering further research. The study concludes
by suggesting the adoption of ``RISC'' by suggesting the adoption of ``RISC''-style database architectures, both as a research and an
style database architectures, both as a research and as an
implementation tool~\cite{riscDB}. implementation tool~\cite{riscDB}.
RISC databases have many elements in common with RISC databases have many elements in common with
@ -398,13 +383,12 @@ effort required to implement a new database system~\cite{riscDB}.
We agree with the motivations behind RISC databases, and that a need We agree with the motivations behind RISC databases, and that a need
for improvement in database technology exists. In fact, it is our hope for improvement in database technology exists. In fact, it is our hope
that our system will mature to the point where it can support that our system will mature to the point where it can support
competitive relational database storage subsystems. However this is a competitive relational database. However this is
not our primary goal. not our primary goal.
Instead, we are interested in supporting applications that derive Instead, we are interested in supporting applications that derive
little benefit from database abstractions, but that need reliable little benefit from database abstractions, but that need reliable
storage. Therefore, instead of building a modular database, we seek storage. Therefore, instead of building a modular database, we seek
to build a system that allows programmers to avoid databases. to build a system that enables a wider range of data management options.
%For example, large scale application such as web search, map services, %For example, large scale application such as web search, map services,
%e-mail use databases to store unstructured binary data, if at all. %e-mail use databases to store unstructured binary data, if at all.
@ -983,10 +967,25 @@ concurrent, durable data structure using RVM. We plan to add RVM
style transactional memory to \yad in a way that is compatible with style transactional memory to \yad in a way that is compatible with
fully concurrent collections such as hash tables and tree structures. fully concurrent collections such as hash tables and tree structures.
\section{Related Work?}
The Boxwood system provides a networked, fault-tolerant transactional
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
complement to such a system, especially given \yad's focus on
intelligence and optimizations within a single node, and Boxwoods
focus on multiple node systems. In particular, when implementing
applications with predictable locality properties, it would be
interesting to explore extensions to the Boxwood approach that make
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
mechanism. (Section~\ref{logging})
\section{Conclusion} \section{Conclusion}
\section{Acknowledgements} \section{Acknowledgements}
mike demmer, others?
\section{Availability} \section{Availability}
Additional information, and \yad's source code is available at: Additional information, and \yad's source code is available at: