sec1-2
parent
ca229e9d83
commit
f7122c9f62
1 changed files with 71 additions and 72 deletions
|
@ -25,7 +25,7 @@
|
||||||
% TARDIS: Atomic, Recoverable, Datamodel Independent Storage
|
% TARDIS: Atomic, Recoverable, Datamodel Independent Storage
|
||||||
% EAB: flex, basis, stable, dura
|
% EAB: flex, basis, stable, dura
|
||||||
|
|
||||||
\newcommand{\yad}{Lemon\xspace}
|
\newcommand{\yad}{Stasys\xspace}
|
||||||
\newcommand{\oasys}{Oasys\xspace}
|
\newcommand{\oasys}{Oasys\xspace}
|
||||||
|
|
||||||
\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
|
\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
|
||||||
|
@ -59,9 +59,9 @@ UC Berkeley
|
||||||
%\thispagestyle{empty}
|
%\thispagestyle{empty}
|
||||||
|
|
||||||
|
|
||||||
\subsection*{Abstract}
|
%\subsection*{Abstract}
|
||||||
|
|
||||||
The is an increasing need to manage data well in a wide variety of
|
{\em There is an increasing need to manage data well in a wide variety of
|
||||||
systems, including robust support for atomic durable concurrent
|
systems, including robust support for atomic durable concurrent
|
||||||
transactions. Databases provide the default solution, but force
|
transactions. Databases provide the default solution, but force
|
||||||
applications to interact via SQL and to forfeit control over data
|
applications to interact via SQL and to forfeit control over data
|
||||||
|
@ -69,7 +69,7 @@ layout and access mechanisms. We argue there is a gap between DBMSs and file sy
|
||||||
|
|
||||||
\yad is a storage framework that incorporates ideas from traditional
|
\yad is a storage framework that incorporates ideas from traditional
|
||||||
write-ahead-logging storage algorithms and file systems,
|
write-ahead-logging storage algorithms and file systems,
|
||||||
while providing applications with flexible control over data structure, layout and performance vs. robustness tradeoffs.
|
while providing applications with flexible control over data structures, layout, and performance vs. robustness tradeoffs.
|
||||||
% increased control over their
|
% increased control over their
|
||||||
%underlying modules. Generic transactional storage systems such as SQL
|
%underlying modules. Generic transactional storage systems such as SQL
|
||||||
%and BerkeleyDB serve many applications well, but impose constraints
|
%and BerkeleyDB serve many applications well, but impose constraints
|
||||||
|
@ -90,9 +90,13 @@ improved performance to applications.
|
||||||
|
|
||||||
We present examples that make use of custom access methods,
|
We present examples that make use of custom access methods,
|
||||||
modified buffer manager semantics, direct log file manipulation, and
|
modified buffer manager semantics, direct log file manipulation, and
|
||||||
LSN-free pages that facilitate zero-copy optimizations, and discusses
|
LSN-free pages that facilitate zero-copy optimizations, and discuss
|
||||||
the composability of these extensions.
|
the composability of these extensions.
|
||||||
|
|
||||||
|
\eab{performance}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
%We argue that our ability to support such a diverse range of
|
%We argue that our ability to support such a diverse range of
|
||||||
%transactional systems stems directly from our rejection of
|
%transactional systems stems directly from our rejection of
|
||||||
%assumptions made by early database designers. These assumptions
|
%assumptions made by early database designers. These assumptions
|
||||||
|
@ -113,13 +117,14 @@ the composability of these extensions.
|
||||||
%existing systems.
|
%existing systems.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{Introduction}
|
\section{Introduction}
|
||||||
|
|
||||||
As our reliance on computing infrastructure has increased, the need
|
As our reliance on computing infrastructure has increased, the need
|
||||||
for robust data management has increased greatly, as has the range of
|
for robust data management has increased greatly, as has the range of
|
||||||
applications and systems that need it. Traditionally, data management
|
applications and systems that need it. Traditionally, data management
|
||||||
has been the province of database management systems, which although
|
has been the province of database management systems (DBMSs), which although
|
||||||
well-suited to enterprise applications, leads to poor support for a
|
well-suited to enterprise applications, lead to poor support for a
|
||||||
wide range of systems including grid and scientific computing,
|
wide range of systems including grid and scientific computing,
|
||||||
bioinformatics, search engines, version control, and workflow
|
bioinformatics, search engines, version control, and workflow
|
||||||
applications. These applications need transactions but don't fit well
|
applications. These applications need transactions but don't fit well
|
||||||
|
@ -132,13 +137,15 @@ A typical example of this mismatch is in the support for
|
||||||
persistent objects in Java, called {\em Enterprise Java Beans}
|
persistent objects in Java, called {\em Enterprise Java Beans}
|
||||||
(EJB). In a typical usage, an array of objects is made persistent by
|
(EJB). In a typical usage, an array of objects is made persistent by
|
||||||
mapping each object to a row in a table (or sometimes multiple
|
mapping each object to a row in a table (or sometimes multiple
|
||||||
tables~\cite[xxx]) and then issuing queries to keep the objects and
|
tables~\cite{xxx}) and then issuing queries to keep the objects and
|
||||||
rows consistent. A typical update must confirm it has the current
|
rows consistent. A typical update must confirm it has the current
|
||||||
version, modify the object, write out a serialized version using the
|
version, modify the object, write out a serialized version using the
|
||||||
SQL update command and commit. This is an awkward and slow mechanism;
|
SQL update command and commit. This is an awkward and slow mechanism;
|
||||||
we show up to a 5x speedup over a MySQL implementation that is
|
we show up to a 5x speedup over a MySQL implementation that is
|
||||||
optimized for single-threaded, local access (Section XXX).
|
optimized for single-threaded, local access (Section XXX).
|
||||||
|
|
||||||
|
Add bioinformatics = Perl + files example?
|
||||||
|
|
||||||
\eat{
|
\eat{
|
||||||
Examples of real world systems that currently fall into this category
|
Examples of real world systems that currently fall into this category
|
||||||
are web search engines, document repositories, large-scale web-email
|
are web search engines, document repositories, large-scale web-email
|
||||||
|
@ -146,7 +153,6 @@ services, map and trip planning services, ticket reservation systems,
|
||||||
photo and video repositories, bioinformatics, version control systems,
|
photo and video repositories, bioinformatics, version control systems,
|
||||||
workflow applications, CAD/VLSI applications and directory services.
|
workflow applications, CAD/VLSI applications and directory services.
|
||||||
|
|
||||||
|
|
||||||
In short, we believe that a fundamental architectural shift in
|
In short, we believe that a fundamental architectural shift in
|
||||||
transactional storage is necessary before general purpose storage
|
transactional storage is necessary before general purpose storage
|
||||||
systems are of practical use to modern applications.
|
systems are of practical use to modern applications.
|
||||||
|
@ -178,15 +184,11 @@ This paper presents \yad, a library that provides transactional
|
||||||
storage at a level of abstraction as close to the hardware as
|
storage at a level of abstraction as close to the hardware as
|
||||||
possible. The library can support special purpose, transactional
|
possible. The library can support special purpose, transactional
|
||||||
storage interfaces as well as ACID database-style interfaces to
|
storage interfaces as well as ACID database-style interfaces to
|
||||||
abstract data models.
|
abstract data models. \yad incorporates techniques from databases
|
||||||
|
(e.g., write-ahead logging) and systems (e.g., zero-copy techniques).
|
||||||
Notably, \yad incorporates many existing technologies from the storage
|
Our goal is to combine the flexibility and layering of low-level
|
||||||
communities, and allows applications to incorporate appropriate
|
abstractions typical for systems work, with the complete semantics
|
||||||
subsystems as necessary. A partial open-source implementation of the
|
that exemplify the database field.
|
||||||
ideas presented below is available; performance numbers are provided
|
|
||||||
when possible.
|
|
||||||
|
|
||||||
Taken from sosp:
|
|
||||||
|
|
||||||
By {\em flexible} we mean that \yad{} can implement a wide
|
By {\em flexible} we mean that \yad{} can implement a wide
|
||||||
range of transactional data structures, that it can support a variety
|
range of transactional data structures, that it can support a variety
|
||||||
|
@ -206,18 +208,17 @@ to meet and form the {\em raison d'\^etre} for \yad{}: the framework
|
||||||
delivers these properties as reusable building blocks for systems
|
delivers these properties as reusable building blocks for systems
|
||||||
to implement complete transactions.
|
to implement complete transactions.
|
||||||
|
|
||||||
---
|
Through examples, and their good performance, we show how \yad{}
|
||||||
|
supports a wide range of uses that fall in the database gap, including
|
||||||
|
persistent objects (roadmap?), graph or XML apps, and recoverable
|
||||||
|
virtual memory~\cite{lrvm}. An (early) open-source implementation of
|
||||||
|
the ideas presented below is available.
|
||||||
|
|
||||||
\eab{need to talk about positive examples: LRVM, Berk DB, windows registry? Grid FS from Wisconsin}
|
\eab{others? CVS, windows registry, berk DB, Grid FS?}
|
||||||
|
|
||||||
|
roadmap?
|
||||||
|
|
||||||
|
|
||||||
Applications that have only recently begun to make use of high-level
|
|
||||||
database features include XML based systems, object persistance
|
|
||||||
mechanisms, and enterprise management systems (notably, SAP R/3).
|
|
||||||
|
|
||||||
|
|
||||||
**We've explained why the sky is falling. Now, explain why \yad is
|
|
||||||
so good. (Take ideas from old paper.)**
|
|
||||||
|
|
||||||
\section{\yad is not a Database}
|
\section{\yad is not a Database}
|
||||||
|
|
||||||
|
@ -229,8 +230,8 @@ database systems and research projects for at least 25 years.
|
||||||
|
|
||||||
The section concludes with a discussion of database systems that
|
The section concludes with a discussion of database systems that
|
||||||
attempt to address these problems. Although these systems were
|
attempt to address these problems. Although these systems were
|
||||||
successful in many respects, they failed to address the broad class of
|
successful in many respects, they fundamentally aim to implement a
|
||||||
software we are interested in.
|
data model, rather than build transactions from the bottom up. \eab{move this?}
|
||||||
|
|
||||||
|
|
||||||
\subsection{The database abstraction}
|
\subsection{The database abstraction}
|
||||||
|
@ -240,42 +241,40 @@ abstractions they present. For instance, relational database systems
|
||||||
implement the relational model~\cite{cobb}, object oriented
|
implement the relational model~\cite{cobb}, object oriented
|
||||||
databases implement object abstractions, XML databases implement
|
databases implement object abstractions, XML databases implement
|
||||||
hierarchical datasets, and so on. Before the relational model,
|
hierarchical datasets, and so on. Before the relational model,
|
||||||
navigational databases implemented pointer and record
|
navigational databases implemented pointer- and record-based data models.
|
||||||
based data models.
|
|
||||||
|
|
||||||
An early survey of database implementations sought to enumerate the
|
An early survey of database implementations sought to enumerate the
|
||||||
fundamental components used by database system implementors. This
|
fundamental components used by database system implementors. This
|
||||||
survey was performed due to difficulties in extending database systems
|
survey was performed due to difficulties in extending database systems
|
||||||
into new application domains. The survey divided internal database
|
into new application domains. The survey divided internal database
|
||||||
routines into two broad modules: conceptual
|
routines into two broad modules: {\em conceptual
|
||||||
mappings~\cite{batoryConceptual} and the physical
|
mappings}~\cite{batoryConceptual} and the {\em physical
|
||||||
database~\cite{batoryPhysical} model.
|
database}~\cite{batoryPhysical} model.
|
||||||
|
|
||||||
A conceptual mapping might translate a relation into a set of keyed
|
A conceptual mapping might translate a relation into a set of keyed
|
||||||
tuples. A physical model could then translate a set of tuples into an
|
tuples. A physical model would then translate a set of tuples into an
|
||||||
on-disk B-Tree, and provide support for iterators and range-based query
|
on-disk B-Tree, and provide support for iterators and range-based query
|
||||||
operations.
|
operations.
|
||||||
|
|
||||||
It is the responsibility of a database implementor to choose a set of
|
It is the responsibility of a database implementor to choose a set of
|
||||||
conceptual mappings that implement the desired higher level
|
conceptual mappings that implement the desired higher-level
|
||||||
abstraction (such as the relational model). The physical data model
|
abstraction (such as the relational model). The physical data model
|
||||||
is chosen to efficiently support the set of mappings that are built on
|
is chosen to efficiently support the set of mappings that are built on
|
||||||
top of it.
|
top of it.
|
||||||
|
|
||||||
{\em The key observation of this paper is that no known physical data model
|
{\em A key observation of this paper is that no known physical data model
|
||||||
can support more than a small percentage of today's applications.}
|
can support more than a small percentage of today's applications.}
|
||||||
|
|
||||||
Instead of attempting to create such a model after decades of database
|
Instead of attempting to create such a model after decades of database
|
||||||
research has failed to produce one, we opt to provide a transactional
|
research has failed to produce one, we opt to provide a transactional
|
||||||
storage model that mimics the primitives provided by modern hardware.
|
storage model that mimics the primitives provided by modern hardware.
|
||||||
This makes it easy for system designers to implement most of the data
|
This makes it easy for system designers to implement most of the data
|
||||||
models that the underlying hardware is capable of supporting, or to
|
models that the underlying hardware can support, or to
|
||||||
abandon the database approach entirely, and forgo the use of a
|
abandon the data model approach entirely, and forgo the use of a
|
||||||
structured physical model or conceptual mappings.
|
structured physical model or conceptual mappings.
|
||||||
|
|
||||||
\subsection{Extensible databases}
|
\subsection{Extensible databases}
|
||||||
|
|
||||||
|
|
||||||
Genesis~\cite{genesis}, an early database toolkit, was built in terms
|
Genesis~\cite{genesis}, an early database toolkit, was built in terms
|
||||||
of a physical data model, and the conceptual mappings described above.
|
of a physical data model, and the conceptual mappings described above.
|
||||||
It was designed to allow database implementors to easily swap out
|
It was designed to allow database implementors to easily swap out
|
||||||
|
@ -284,11 +283,13 @@ Like subsequent systems (including \yad), it allowed it users to
|
||||||
implement custom operations.
|
implement custom operations.
|
||||||
|
|
||||||
Subsequent extensible database work builds upon these foundations.
|
Subsequent extensible database work builds upon these foundations.
|
||||||
The Exodus~\cite{exodus} database toolkit was the successor to
|
For example, the Exodus~\cite{exodus} database toolkit was the successor to
|
||||||
Genesis. It supported the automatic generation of query optimizers and
|
Genesis. It supported the automatic generation of query optimizers and
|
||||||
execution engines based upon abstract data type definitions, access
|
execution engines based upon abstract data type definitions, access
|
||||||
methods and cost models provided by its users.
|
methods and cost models provided by its users.
|
||||||
|
|
||||||
|
\eab{move this next paragraph to RW?}
|
||||||
|
|
||||||
Starburst's~\cite{starburst} physical data model consisted of {\em
|
Starburst's~\cite{starburst} physical data model consisted of {\em
|
||||||
storage methods}. Storage methods supported {\em attachment types}
|
storage methods}. Storage methods supported {\em attachment types}
|
||||||
that allowed triggers and active databases to be implemented. An
|
that allowed triggers and active databases to be implemented. An
|
||||||
|
@ -304,7 +305,7 @@ object-oriented database systems, and relational databases with
|
||||||
support for user-definable abstract data types (such as in
|
support for user-definable abstract data types (such as in
|
||||||
Postgres~\cite{postgres}) were the primary competitors to extensible
|
Postgres~\cite{postgres}) were the primary competitors to extensible
|
||||||
database toolkits. Ideas from all of these systems have been
|
database toolkits. Ideas from all of these systems have been
|
||||||
incorporated into the mechanisms that support user definable types in
|
incorporated into the mechanisms that support user-definable types in
|
||||||
current database systems.
|
current database systems.
|
||||||
|
|
||||||
One can characterise the difference between database toolkits and
|
One can characterise the difference between database toolkits and
|
||||||
|
@ -312,16 +313,12 @@ extensible database servers in terms of early and late binding. With
|
||||||
a database toolkit, new types are defined when the database server is
|
a database toolkit, new types are defined when the database server is
|
||||||
compiled. In today's object-relational database systems, new types
|
compiled. In today's object-relational database systems, new types
|
||||||
are defined at runtime. Each approach has its advantages. However,
|
are defined at runtime. Each approach has its advantages. However,
|
||||||
both types of systems attempted to provide similar levels of
|
both types of systems aim to extend a high-level data model with new abstract data types, and thus are quite limited in the range of new applications they support. Not surprisingly, this kind of extensibility has had little impact on the range of applications we listed above.
|
||||||
abstraction and flexibility to their end users.
|
|
||||||
|
|
||||||
Therefore, the database toolkit approach is inappropriate for
|
|
||||||
applications not well serviced by modern database systems.
|
|
||||||
|
|
||||||
\subsection{Berkeley DB}
|
\subsection{Berkeley DB}
|
||||||
|
|
||||||
System R was the first relational database implementation, and was
|
System R was the first relational database implementation, and was
|
||||||
based upon a clean separation between it's storage system and its
|
based upon a clean separation between its storage system and its
|
||||||
query processing engine. In fact, it supported a simple navigational
|
query processing engine. In fact, it supported a simple navigational
|
||||||
interface to the storage subsystem. To this day, database systems are
|
interface to the storage subsystem. To this day, database systems are
|
||||||
built using this sort of architecture.
|
built using this sort of architecture.
|
||||||
|
@ -342,48 +339,36 @@ primitives.
|
||||||
We have already discussed the limitations of this approach. With the
|
We have already discussed the limitations of this approach. With the
|
||||||
exception of the direct comparison of the two systems, none of the \yad
|
exception of the direct comparison of the two systems, none of the \yad
|
||||||
applications presented in Section~\ref{extensions} are efficiently
|
applications presented in Section~\ref{extensions} are efficiently
|
||||||
supported by Berkeley DB. This is a result of Berkeley DB's,
|
supported by Berkeley DB. This is a result of Berkeley DB's
|
||||||
assumptions regarding workloads and decisions regarding low-level data
|
assumptions regarding workloads and decisions regarding low-level data
|
||||||
representation. While Berkeley DB could be built on top of \yad,
|
representation. Thus, although Berkeley DB could be built on top of \yad,
|
||||||
Berkeley DB is too specialized to support \yad.
|
Berkeley DB is too specialized to support \yad.
|
||||||
|
|
||||||
\subsection{Boxwood}
|
\eab{for BDB, should we say that it still has a data model?}
|
||||||
|
|
||||||
The Boxwood system provides a networked, fault-tolerant transactional
|
|
||||||
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
|
|
||||||
complement to such a system, especially given \yad's focus on
|
|
||||||
intelligence and optimizations within a single node, and Boxwoods
|
|
||||||
focus on multiple node systems. In particular, when implementing
|
|
||||||
applications with predictable locality properties, it would be
|
|
||||||
interesting to explore extensions to the Boxwood approach that make
|
|
||||||
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
|
|
||||||
mechanism. (Section~\ref{logging})
|
|
||||||
|
|
||||||
|
|
||||||
%cover P2 (the old one, not "Pier 2" if there is time...
|
%cover P2 (the old one, not "Pier 2" if there is time...
|
||||||
|
|
||||||
\subsection{Better databases}
|
\subsection{Better databases}
|
||||||
|
|
||||||
|
The database community is also aware of this gap.
|
||||||
A recent survey~\cite{riscDB} enumerates problems that plague users of
|
A recent survey~\cite{riscDB} enumerates problems that plague users of
|
||||||
state-of-the-art database systems.
|
state-of-the-art database systems, and finds that database implementations fail to support the
|
||||||
|
|
||||||
The survey finds that database implementations fail to support the
|
|
||||||
needs of modern systems. In large systems, this manifests itself as
|
needs of modern systems. In large systems, this manifests itself as
|
||||||
manageability and tuning issues that prevent databases from predictably
|
manageability and tuning issues that prevent databases from predictably
|
||||||
servicing diverse, large-scale, declarative workloads.
|
servicing diverse, large-scale, declarative workloads.
|
||||||
|
|
||||||
On small devices, footprint, predictable performance, and power consumption are
|
On small devices, footprint, predictable performance, and power consumption are
|
||||||
primary concerns that database systems do not address.
|
primary concerns that database systems do not address.
|
||||||
|
|
||||||
Midsize deployments, such as desktop installations, must run without
|
%Midsize deployments, such as desktop installations, must run without
|
||||||
user intervention, but self-tuning, self-administering database
|
%user intervention, but self-tuning, self-administering database
|
||||||
servers are still an area of active research.
|
%servers are still an area of active research.
|
||||||
|
|
||||||
The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems. Complete, modern database
|
The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems. Complete, modern database
|
||||||
implementations are generally incomprehensible and
|
implementations are generally incomprehensible and
|
||||||
irreproducible, hindering further research. The study concludes
|
irreproducible, hindering further research. The study concludes
|
||||||
by suggesting the adoption of ``RISC''
|
by suggesting the adoption of ``RISC''-style database architectures, both as a research and an
|
||||||
style database architectures, both as a research and as an
|
|
||||||
implementation tool~\cite{riscDB}.
|
implementation tool~\cite{riscDB}.
|
||||||
|
|
||||||
RISC databases have many elements in common with
|
RISC databases have many elements in common with
|
||||||
|
@ -398,13 +383,12 @@ effort required to implement a new database system~\cite{riscDB}.
|
||||||
We agree with the motivations behind RISC databases, and that a need
|
We agree with the motivations behind RISC databases, and that a need
|
||||||
for improvement in database technology exists. In fact, it is our hope
|
for improvement in database technology exists. In fact, it is our hope
|
||||||
that our system will mature to the point where it can support
|
that our system will mature to the point where it can support
|
||||||
competitive relational database storage subsystems. However this is
|
a competitive relational database. However this is
|
||||||
not our primary goal.
|
not our primary goal.
|
||||||
|
|
||||||
Instead, we are interested in supporting applications that derive
|
Instead, we are interested in supporting applications that derive
|
||||||
little benefit from database abstractions, but that need reliable
|
little benefit from database abstractions, but that need reliable
|
||||||
storage. Therefore, instead of building a modular database, we seek
|
storage. Therefore, instead of building a modular database, we seek
|
||||||
to build a system that allows programmers to avoid databases.
|
to build a system that enables a wider range of data management options.
|
||||||
|
|
||||||
%For example, large scale application such as web search, map services,
|
%For example, large scale application such as web search, map services,
|
||||||
%e-mail use databases to store unstructured binary data, if at all.
|
%e-mail use databases to store unstructured binary data, if at all.
|
||||||
|
@ -983,10 +967,25 @@ concurrent, durable data structure using RVM. We plan to add RVM
|
||||||
style transactional memory to \yad in a way that is compatible with
|
style transactional memory to \yad in a way that is compatible with
|
||||||
fully concurrent collections such as hash tables and tree structures.
|
fully concurrent collections such as hash tables and tree structures.
|
||||||
|
|
||||||
|
|
||||||
|
\section{Related Work?}
|
||||||
|
|
||||||
|
The Boxwood system provides a networked, fault-tolerant transactional
|
||||||
|
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
|
||||||
|
complement to such a system, especially given \yad's focus on
|
||||||
|
intelligence and optimizations within a single node, and Boxwood's
|
||||||
|
focus on multiple node systems. In particular, when implementing
|
||||||
|
applications with predictable locality properties, it would be
|
||||||
|
interesting to explore extensions to the Boxwood approach that make
|
||||||
|
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
|
||||||
|
mechanism (Section~\ref{logging}).
|
||||||
|
|
||||||
\section{Conclusion}
|
\section{Conclusion}
|
||||||
|
|
||||||
\section{Acknowledgements}
|
\section{Acknowledgements}
|
||||||
|
|
||||||
|
mike demmer, others?
|
||||||
|
|
||||||
\section{Availability}
|
\section{Availability}
|
||||||
|
|
||||||
Additional information and \yad's source code are available at:
|
Additional information and \yad's source code are available at:
|
||||||
|
|