Eric Brewer 2006-04-24 01:00:50 +00:00
parent ca229e9d83
commit f7122c9f62

@ -25,7 +25,7 @@
% TARDIS: Atomic, Recoverable, Datamodel Independent Storage
% EAB: flex, basis, stable, dura
\newcommand{\yad}{Lemon\xspace}
\newcommand{\yad}{Stasys\xspace}
\newcommand{\oasys}{Oasys\xspace}
\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
@ -59,9 +59,9 @@ UC Berkeley
%\thispagestyle{empty}
\subsection*{Abstract}
%\subsection*{Abstract}
The is an increasing need to manage data well in a wide variety of
{\em There is an increasing need to manage data well in a wide variety of
systems, including robust support for atomic durable concurrent
transactions. Databases provide the default solution, but force
applications to interact via SQL and to forfeit control over data
@ -69,7 +69,7 @@ layout and access mechanisms. We argue there is a gap between DBMSs and file sy
\yad is a storage framework that incorporates ideas from traditional
write-ahead-logging storage algorithms and file systems,
while providing applications with flexible control over data structure, layout and performance vs. robustness tradeoffs.
while providing applications with flexible control over data structures, layout, and performance vs. robustness tradeoffs.
% increased control over their
%underlying modules. Generic transactional storage systems such as SQL
%and BerkeleyDB serve many applications well, but impose constraints
@ -90,9 +90,13 @@ improved performance to applications.
We present examples that make use of custom access methods,
modified buffer manager semantics, direct log file manipulation, and
LSN-free pages that facilitate zero-copy optimizations, and discusses
LSN-free pages that facilitate zero-copy optimizations, and discuss
the composability of these extensions.
\eab{performance}
}
%We argue that our ability to support such a diverse range of
%transactional systems stems directly from our rejection of
%assumptions made by early database designers. These assumptions
@ -113,13 +117,14 @@ the composability of these extensions.
%existing systems.
\section{Introduction}
As our reliance on computing infrastructure has increased, the need
for robust data management has increased greatly, as has the range of
applications and systems that need it. Traditionally, data management
has been the province of database management systems, which although
well-suited to enterprise applications, leads to poor support for a
has been the province of database management systems (DBMSs), which although
well-suited to enterprise applications, lead to poor support for a
wide range of systems, including grid and scientific computing,
bioinformatics, search engines, version control, and workflow
applications. These applications need transactions but don't fit well
@ -132,13 +137,15 @@ A typical example of this mismatch is in the support for
persistent objects in Java, called {\em Enterprise Java Beans}
(EJB). In a typical usage, an array of objects is made persistent by
mapping each object to a row in a table (or sometimes multiple
tables~\cite[xxx]) and then issuing queries to keep the objects and
tables~\cite{xxx}) and then issuing queries to keep the objects and
rows consistent. A typical update must confirm it has the current
version, modify the object, write out a serialized version using the
SQL update command and commit. This is an awkward and slow mechanism;
we show up to a 5x speedup over a MySQL implementation that is
optimized for single-threaded, local access (Section XXX).
Add bioinformatics = Perl + files example?
\eat{
Examples of real world systems that currently fall into this category
are web search engines, document repositories, large-scale web-email
@ -146,7 +153,6 @@ services, map and trip planning services, ticket reservation systems,
photo and video repositories, bioinformatics, version control systems,
workflow applications, CAD/VLSI applications and directory services.
In short, we believe that a fundamental architectural shift in
transactional storage is necessary before general purpose storage
systems are of practical use to modern applications.
@ -178,15 +184,11 @@ This paper presents \yad, a library that provides transactional
storage at a level of abstraction as close to the hardware as
possible. The library can support special purpose, transactional
storage interfaces as well as ACID database-style interfaces to
abstract data models.
Notably, \yad incorporates many existing technologies from the storage
communities, and allows applications to incorporate appropriate
subsystems as necessary. A partial open-source implementation of the
ideas presented below is available; performance numbers are provided
when possible.
Taken from sosp:
abstract data models. \yad incorporates techniques from databases
(e.g., write-ahead logging) and systems (e.g., zero-copy techniques).
Our goal is to combine the flexibility and layering of low-level
abstractions typical for systems work, with the complete semantics
that exemplify the database field.
By {\em flexible} we mean that \yad{} can implement a wide
range of transactional data structures, that it can support a variety
@ -206,18 +208,17 @@ to meet and form the {\em raison d'\^etre} for \yad{}: the framework
delivers these properties as reusable building blocks for systems
to implement complete transactions.
---
Through examples, and their good performance, we show how \yad{}
supports a wide range of uses that fall in the database gap, including
persistent objects (roadmap?), graph or XML apps, and recoverable
virtual memory~\cite{lrvm}. An (early) open-source implementation of
the ideas presented below is available.
\eab{need to talk about positive examples: LRVM, Berk DB, windows registry? Grid FS from Wisconsin}
\eab{others? CVS, windows registry, berk DB, Grid FS?}
roadmap?
Applications that have only recently begun to make use of high-level
database features include XML-based systems, object persistence
mechanisms, and enterprise management systems (notably, SAP R/3).
**We've explained why the sky is falling. Now, explain why \yad is
so good. (Take ideas from old paper.)**
\section{\yad is not a Database}
@ -229,8 +230,8 @@ database systems and research projects for at least 25 years.
The section concludes with a discussion of database systems that
attempt to address these problems. Although these systems were
successful in many respects, they failed to address the broad class of
software we are interested in.
successful in many respects, they fundamentally aim to implement a
data model, rather than build transactions from the bottom up. \eab{move this?}
\subsection{The database abstraction}
@ -240,42 +241,40 @@ abstractions they present. For instance, relational database systems
implement the relational model~\cite{cobb}, object oriented
databases implement object abstractions, XML databases implement
hierarchical datasets, and so on. Before the relational model,
navigational databases implemented pointer and record
based data models.
navigational databases implemented pointer- and record-based data models.
An early survey of database implementations sought to enumerate the
fundamental components used by database system implementors. This
survey was performed due to difficulties in extending database systems
into new application domains. The survey divided internal database
routines into two broad modules: conceptual
mappings~\cite{batoryConceptual} and the physical
database~\cite{batoryPhysical} model.
routines into two broad modules: {\em conceptual
mappings}~\cite{batoryConceptual} and the {\em physical
database}~\cite{batoryPhysical} model.
A conceptual mapping might translate a relation into a set of keyed
tuples. A physical model could then translate a set of tuples into an
tuples. A physical model would then translate a set of tuples into an
on-disk B-Tree, and provide support for iterators and range-based query
operations.
It is the responsibility of a database implementor to choose a set of
conceptual mappings that implement the desired higher level
conceptual mappings that implement the desired higher-level
abstraction (such as the relational model). The physical data model
is chosen to efficiently support the set of mappings that are built on
top of it.
{\em The key observation of this paper is that no known physical data model
{\em A key observation of this paper is that no known physical data model
can support more than a small percentage of today's applications.}
Instead of attempting to create such a model after decades of database
research has failed to produce one, we opt to provide a transactional
storage model that mimics the primitives provided by modern hardware.
This makes it easy for system designers to implement most of the data
models that the underlying hardware is capable of supporting, or to
abandon the database approach entirely, and forgo the use of a
models that the underlying hardware can support, or to
abandon the data model approach entirely, and forgo the use of a
structured physical model or conceptual mappings.
\subsection{Extensible databases}
Genesis~\cite{genesis}, an early database toolkit, was built in terms
of a physical data model, and the conceptual mappings described above.
It was designed to allow database implementors to easily swap out
@ -284,11 +283,13 @@ Like subsequent systems (including \yad), it allowed it users to
implement custom operations.
Subsequent extensible database work builds upon these foundations.
The Exodus~\cite{exodus} database toolkit was the successor to
For example, the Exodus~\cite{exodus} database toolkit was the successor to
Genesis. It supported the automatic generation of query optimizers and
execution engines based upon abstract data type definitions, access
methods and cost models provided by its users.
\eab{move this next paragraph to RW?}
Starburst's~\cite{starburst} physical data model consisted of {\em
storage methods}. Storage methods supported {\em attachment types}
that allowed triggers and active databases to be implemented. An
@ -304,7 +305,7 @@ object-oriented database systems, and relational databases with
support for user-definable abstract data types (such as in
Postgres~\cite{postgres}) were the primary competitors to extensible
database toolkits. Ideas from all of these systems have been
incorporated into the mechanisms that support user definable types in
incorporated into the mechanisms that support user-definable types in
current database systems.
One can characterise the difference between database toolkits and
@ -312,16 +313,12 @@ extensible database servers in terms of early and late binding. With
a database toolkit, new types are defined when the database server is
compiled. In today's object-relational database systems, new types
are defined at runtime. Each approach has its advantages. However,
both types of systems attempted to provide similar levels of
abstraction and flexibility to their end users.
Therefore, the database toolkit approach is inappropriate for
applications not well serviced by modern database systems.
both types of systems aim to extend a high-level data model with new abstract data types, and thus are quite limited in the range of new applications they support. Not surprisingly, this kind of extensibility has had little impact on the range of applications we listed above.
\subsection{Berkeley DB}
System R was the first relational database implementation, and was
based upon a clean separation between it's storage system and its
based upon a clean separation between its storage system and its
query processing engine. In fact, it supported a simple navigational
interface to the storage subsystem. To this day, database systems are
built using this sort of architecture.
@ -342,48 +339,36 @@ primitives.
We have already discussed the limitations of this approach. With the
exception of the direct comparison of the two systems, none of the \yad
applications presented in Section~\ref{extensions} are efficiently
supported by Berkeley DB. This is a result of Berkeley DB's,
supported by Berkeley DB. This is a result of Berkeley DB's
assumptions regarding workloads and decisions regarding low-level data
representation. While Berkeley DB could be built on top of \yad,
representation. Thus, although Berkeley DB could be built on top of \yad,
Berkeley DB is too specialized to support \yad.
\subsection{Boxwood}
\eab{for BDB, should we say that it still has a data model?}
The Boxwood system provides a networked, fault-tolerant transactional
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
complement to such a system, especially given \yad's focus on
intelligence and optimizations within a single node, and Boxwood's
focus on multiple node systems. In particular, when implementing
applications with predictable locality properties, it would be
interesting to explore extensions to the Boxwood approach that make
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
mechanism (Section~\ref{logging}).
%cover P2 (the old one, not "Pier 2" if there is time...
\subsection{Better databases}
The database community is also aware of this gap.
A recent survey~\cite{riscDB} enumerates problems that plague users of
state-of-the-art database systems.
The survey finds that database implementations fail to support the
state-of-the-art database systems, and finds that database implementations fail to support the
needs of modern systems. In large systems, this manifests itself as
manageability and tuning issues that prevent databases from predictably
servicing diverse, large-scale, declarative workloads.
On small devices, footprint, predictable performance, and power consumption are
primary concerns that database systems do not address.
Midsize deployments, such as desktop installations, must run without
user intervention, but self-tuning, self-administering database
servers are still an area of active research.
%Midsize deployments, such as desktop installations, must run without
%user intervention, but self-tuning, self-administering database
%servers are still an area of active research.
The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems. Complete, modern database
implementations are generally incomprehensible and
irreproducible, hindering further research. The study concludes
by suggesting the adoption of ``RISC''
style database architectures, both as a research and as an
by suggesting the adoption of ``RISC''-style database architectures, both as a research and an
implementation tool~\cite{riscDB}.
RISC databases have many elements in common with
@ -398,13 +383,12 @@ effort required to implement a new database system~\cite{riscDB}.
We agree with the motivations behind RISC databases, and that a need
for improvement in database technology exists. In fact, it is our hope
that our system will mature to the point where it can support
competitive relational database storage subsystems. However this is
a competitive relational database. However, this is
not our primary goal.
Instead, we are interested in supporting applications that derive
little benefit from database abstractions, but that need reliable
storage. Therefore, instead of building a modular database, we seek
to build a system that allows programmers to avoid databases.
to build a system that enables a wider range of data management options.
%For example, large scale application such as web search, map services,
%e-mail use databases to store unstructured binary data, if at all.
@ -983,10 +967,25 @@ concurrent, durable data structure using RVM. We plan to add RVM
style transactional memory to \yad in a way that is compatible with
fully concurrent collections such as hash tables and tree structures.
\section{Related Work?}
The Boxwood system provides a networked, fault-tolerant transactional
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
complement to such a system, especially given \yad's focus on
intelligence and optimizations within a single node, and Boxwood's
focus on multiple node systems. In particular, when implementing
applications with predictable locality properties, it would be
interesting to explore extensions to the Boxwood approach that make
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
mechanism (Section~\ref{logging}).
\section{Conclusion}
\section{Acknowledgements}
mike demmer, others?
\section{Availability}
Additional information and \yad's source code are available at: