Did a pass of section 2, changed name to Lemon so the figures are up-to-date.
parent
f73f124f2a
commit
bfb65391ad
1 changed files with 144 additions and 198 deletions
|
@ -25,8 +25,8 @@
|
||||||
% TARDIS: Atomic, Recoverable, Datamodel Independent Storage
|
% TARDIS: Atomic, Recoverable, Datamodel Independent Storage
|
||||||
% EAB: flex, basis, stable, dura
|
% EAB: flex, basis, stable, dura
|
||||||
|
|
||||||
\newcommand{\yad}{Void\xspace}
|
\newcommand{\yad}{Lemon\xspace}
|
||||||
\newcommand{\oasys}{Juicer\xspace}
|
\newcommand{\oasys}{Oasys\xspace}
|
||||||
|
|
||||||
\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
|
\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
|
||||||
\newcommand{\rcs}[1]{\textcolor{green}{\bf RCS: #1}}
|
\newcommand{\rcs}[1]{\textcolor{green}{\bf RCS: #1}}
|
||||||
|
@ -302,36 +302,143 @@ Instead of attempting to create such a model after decades of database
|
||||||
research has failed to produce one, we opt to provide a transactional
|
research has failed to produce one, we opt to provide a transactional
|
||||||
storage model that mimics the primitives provided by modern hardware.
|
storage model that mimics the primitives provided by modern hardware.
|
||||||
This makes it easy for system designers to implement most of the data
|
This makes it easy for system designers to implement most of the data
|
||||||
models that the underlying hardware is capable of supporting.
|
models that the underlying hardware is capable of supporting, or to
|
||||||
|
abandon the database approach entirely, and forgo the use of a
|
||||||
|
structured physical model or conceptual mappings.
|
||||||
|
|
||||||
\subsection{\yad and ``traditional'' database workloads}
|
\subsection{Extensible databases}
|
||||||
|
|
||||||
\rcs{Rework this, and put it at end of this section: Key idea: DBMS systems are managability nightmare!}
|
|
||||||
|
Genesis~\cite{genesis}, an early database toolkit, was built in terms
|
||||||
|
of a physical data model, and the conceptual mappings described above.
|
||||||
|
It was designed to allow database implementors to easily swap out
|
||||||
|
implementations of the various components defined by its framework.
|
||||||
|
Like subsequent systems (including \yad), it allowed its users to
|
||||||
|
implement custom operations.
|
||||||
|
|
||||||
|
Subsequent extensible database work builds upon these foundations.
|
||||||
|
The Exodus~\cite{exodus} database toolkit was the successor to
|
||||||
|
Genesis.  It supported the automatic generation of query optimizers and
|
||||||
|
execution engines based upon abstract data type definitions, access
|
||||||
|
methods and cost models provided by its users.
|
||||||
|
|
||||||
|
Starburst's~\cite{starburst} physical data model consisted of {\em
|
||||||
|
storage methods}. Storage methods supported {\em attachment types}
|
||||||
|
that allowed triggers and active databases to be implemented. An
|
||||||
|
attachment type is associated with some data on disk, and is invoked
|
||||||
|
via an event queue whenever the data is modified. In addition to
|
||||||
|
providing triggers, it was used to facilitate index management.
|
||||||
|
Starburst included a type system that supported multiple inheritance,
|
||||||
|
and it supported hints such as information regarding desired physical
|
||||||
|
clustering. Starburst also included a query language.
|
||||||
|
|
||||||
|
Although further discussion is beyond the scope of this paper,
|
||||||
|
object-oriented database systems, and relational databases with
|
||||||
|
support for user-definable abstract data types (such as in
|
||||||
|
Postgres~\cite{postgres}) were the primary competitors to extensible
|
||||||
|
database toolkits. Ideas from all of these systems have been
|
||||||
|
incorporated into the mechanisms that support user-definable types in
|
||||||
|
current database systems.
|
||||||
|
|
||||||
|
One can characterise the difference between database toolkits and
|
||||||
|
extensible database servers in terms of early and late binding. With
|
||||||
|
a database toolkit, new types are defined when the database server is
|
||||||
|
compiled. In today's object-relational database systems, new types
|
||||||
|
are defined at runtime. Each approach has its advantages. However,
|
||||||
|
both types of systems attempted to provide similar levels of
|
||||||
|
abstraction and flexibility to their end users.
|
||||||
|
|
||||||
|
Therefore, the database toolkit approach is inappropriate for
|
||||||
|
applications not well serviced by modern database systems.
|
||||||
|
|
||||||
|
\subsection{Berkeley DB}
|
||||||
|
|
||||||
|
System R was the first relational database implementation, and was
|
||||||
|
based upon a clean separation between its storage system and its
|
||||||
|
query processing engine. In fact, it supported a simple navigational
|
||||||
|
interface to the storage subsystem. To this day, database systems are
|
||||||
|
built using this sort of architecture.
|
||||||
|
|
||||||
|
Berkeley DB is a highly successful alternative to conventional
|
||||||
|
database design. At its core, it provides the physical database, or
|
||||||
|
the relational storage system of a conventional database server.
|
||||||
|
It is based on the
|
||||||
|
observation that the storage subsystem is a more general (and less
|
||||||
|
abstract) component than a monolithic database, and provides a
|
||||||
|
standalone implementation of the storage primitives built into
|
||||||
|
most relational database systems~\cite{libtp}. In particular,
|
||||||
|
it provides fully transactional (ACID) operations over B-Trees,
|
||||||
|
hashtables, and other access methods. It provides flags that
|
||||||
|
let its users tweak various aspects of the performance of these
|
||||||
|
primitives.
|
||||||
|
|
||||||
|
We have already discussed the limitations of this approach. With the
|
||||||
|
exception of the direct comparison of the two systems, none of the \yad
|
||||||
|
applications presented in Section~\ref{extensions} are efficiently
|
||||||
|
supported by Berkeley DB.  This is a result of Berkeley DB's
|
||||||
|
assumptions regarding workloads and decisions regarding low level data
|
||||||
|
representation. While Berkeley DB could be built on top of \yad,
|
||||||
|
Berkeley DB is too specialized to support \yad.
|
||||||
|
|
||||||
|
\subsection{Boxwood}
|
||||||
|
|
||||||
|
The Boxwood system provides a networked, fault-tolerant transactional
|
||||||
|
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
|
||||||
|
complement to such a system, especially given \yad's focus on
|
||||||
|
intelligence and optimizations within a single node, and Boxwood's
|
||||||
|
focus on multiple node systems. In particular, when implementing
|
||||||
|
applications with predictable locality properties, it would be
|
||||||
|
interesting to explore extensions to the Boxwood approach that make
|
||||||
|
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
|
||||||
|
mechanism (Section~\ref{logging}).
|
||||||
|
|
||||||
|
|
||||||
|
%cover P2 (the old one, not "Pier 2" if there is time...
|
||||||
|
|
||||||
|
\subsection{Better databases}
|
||||||
|
|
||||||
A recent survey~\cite{riscDB} enumerates problems that plague users of
|
A recent survey~\cite{riscDB} enumerates problems that plague users of
|
||||||
state-of-the-art database systems.
|
state-of-the-art database systems.
|
||||||
|
|
||||||
The survey finds that database implementations fail to support the
|
The survey finds that database implementations fail to support the
|
||||||
needs of modern systems. In large systems, this manifests itself as
|
needs of modern systems. In large systems, this manifests itself as
|
||||||
managability and tuning issues that prevent databases from effectively
|
manageability and tuning issues that prevent databases from predictably
|
||||||
servicing large scale, diverse, interactive workloads. On smaller
|
servicing diverse, large-scale, declarative workloads.
|
||||||
systems, footprint, predictable performance, and power consumption are
|
|
||||||
primary concerns that remain troublesome.
|
|
||||||
%Database applications that must scale up to large numbers of
|
|
||||||
%independent, self-administering desktop installations will be
|
|
||||||
%problematic unless a number of open research problems are solved.
|
|
||||||
|
|
||||||
The survey also provides evidence that declarative languages such as SQL are problematic.
|
On small devices, footprint, predictable performance, and power consumption are
|
||||||
Although SQL serves some classes of applications well, it is
|
primary concerns that database systems do not address.
|
||||||
often inadequate for algorithmic and hierarchical computing tasks.
|
|
||||||
|
|
||||||
Finally, complete, modern database
|
Midsize deployments, such as desktop installations, must run without
|
||||||
implementations are often incomprehensible and
|
user intervention, but self-tuning, self-administering database
|
||||||
irreproducable, hindering further research. After making these
|
servers are still an area of active research.
|
||||||
points, the study concludes by suggesting the adoption of ``RISC''
|
|
||||||
|
The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems.  Complete, modern database
|
||||||
|
implementations are generally incomprehensible and
|
||||||
|
irreproducible, hindering further research.  The study concludes
|
||||||
|
by suggesting the adoption of ``RISC''
|
||||||
style database architectures, both as a research and as an
|
style database architectures, both as a research and as an
|
||||||
implementation tool~\cite{riscDB}.
|
implementation tool~\cite{riscDB}.
|
||||||
|
|
||||||
|
RISC databases have many elements in common with
|
||||||
|
database toolkits. However, they take the database toolkit idea one
|
||||||
|
step further, and suggest standardizing the interfaces of the
|
||||||
|
toolkit's internal components, allowing multiple organizations to
|
||||||
|
compete to improve each module. The idea is to produce a research
|
||||||
|
platform, and to address issues that affect modern
|
||||||
|
databases, such as automatic performance tuning, and reducing the
|
||||||
|
effort required to implement a new database system~\cite{riscDB}.
|
||||||
|
|
||||||
|
We agree with the motivations behind RISC databases, and that a need
|
||||||
|
for improvement in database technology exists.  In fact, it is our hope
|
||||||
|
that our system will mature to the point where it can support
|
||||||
|
competitive relational database storage subsystems.  However, this is
|
||||||
|
not our primary goal.
|
||||||
|
|
||||||
|
Instead, we are interested in supporting applications that derive
|
||||||
|
little benefit from database abstractions, but that need reliable
|
||||||
|
storage. Therefore, instead of building a modular database, we seek
|
||||||
|
to build a system that allows programmers to avoid databases.
|
||||||
|
|
||||||
%For example, large scale application such as web search, map services,
|
%For example, large scale application such as web search, map services,
|
||||||
%e-mail use databases to store unstructured binary data, if at all.
|
%e-mail use databases to store unstructured binary data, if at all.
|
||||||
|
|
||||||
|
@ -347,185 +454,6 @@ implementation tool~\cite{riscDB}.
|
||||||
%was more difficult than implementing from scratch (winfs), scaling
|
%was more difficult than implementing from scratch (winfs), scaling
|
||||||
%down doesn't work (variance in performance, footprint),
|
%down doesn't work (variance in performance, footprint),
|
||||||
|
|
||||||
%\subsection{Database Toolkits}
|
|
||||||
|
|
||||||
%\yad is a library that could be used to provide the storage primatives needed by a
|
|
||||||
%database server. Therefore, one might suppose that \yad is a database
|
|
||||||
%toolkit. However, such an assumption would be incorrect, as \yad incorporates neither of the two basic concepts that underly database toolkit designs. These two concepts are
|
|
||||||
%{\em conceptual-to-internal mappings}~\cite{batoryConceptual}
|
|
||||||
%and {\em physical database models}~\cite{batoryPhysical}.
|
|
||||||
%
|
|
||||||
%Conceptual-to-internal mappings and physical database models were
|
|
||||||
%discovered during an early survey of database implementations. Mappings
|
|
||||||
%describe the computational primitives upon which client applications must
|
|
||||||
%be implemented. Physical database models define the on-disk layout used
|
|
||||||
%by a system in terms of data layouts and representations that are commonly
|
|
||||||
%used by relational and navigational database implementations.
|
|
||||||
%
|
|
||||||
%Both concepts are fundamentally incompatible with a general storage
|
|
||||||
%implementation. By definition, database servers (and toolkits) encode both
|
|
||||||
%concepts, while transaction processing libraries manage to avoid complex
|
|
||||||
%conceptual mappings. \yad's novelty stems from the fact that it avoids
|
|
||||||
%both concepts, while making it easy for applications to incorporate results from the database
|
|
||||||
%literature.
|
|
||||||
|
|
||||||
|
|
||||||
%\subsubsection{Conceptual mappings}
|
|
||||||
%
|
|
||||||
%At the time of their introduction, ten
|
|
||||||
%conceptual-to-internal mappings were sufficient to describe existing
|
|
||||||
%database systems. These mappings included indexing, encoding
|
|
||||||
%(compression, encryption, etc), segmentation (along field boundaries),
|
|
||||||
%fragmentation (without regard to fields), $n:m$ pointers, and
|
|
||||||
%horizontal partitioning, among others.
|
|
||||||
%
|
|
||||||
%The initial survey postulates that a finite number of such mappings
|
|
||||||
%are adequate to describe database systems. A
|
|
||||||
%database toolkit need only implement each type of mapping in order to
|
|
||||||
%encode the set of all conceivable database systems.
|
|
||||||
%
|
|
||||||
%Our work's primary concern is to support systems beyond database
|
|
||||||
%implementations. Therefore, our system must support a more general
|
|
||||||
%set of primitives than existing systems. Defining a universal (but
|
|
||||||
%practical) framework that encompasses such a broad class of
|
|
||||||
%computation is clearly unrealistic.
|
|
||||||
%
|
|
||||||
%Therefore, \yad's architecture avoids hard-coded assumptions regarding
|
|
||||||
%the computation or abstract data types of the applications built on
|
|
||||||
%top of it.
|
|
||||||
%
|
|
||||||
\rcs{ This belongs somewhere else: Instead, it leaves decisions regarding abstract data types and
|
|
||||||
algorithm design to system developers or language designers. For
|
|
||||||
instance, while \yad has no concept of object oriented data types, two
|
|
||||||
radically different approaches toward object persistance have been
|
|
||||||
implemented on top of it~\ref{oasys}.}
|
|
||||||
|
|
||||||
\rcs{We could have just as easily written a persistance mechanism for a
|
|
||||||
functional programming language, or a particular application (such as
|
|
||||||
an email server). Our experience building data manipulation routines
|
|
||||||
on top of application-specific primitives was favorable compared to
|
|
||||||
past experiences attempting to restructure entire applications to
|
|
||||||
match pre-existing computational models, such as SQL's declarative
|
|
||||||
interface.}
|
|
||||||
|
|
||||||
%\subsubsection{Physical data models}
|
|
||||||
%
|
|
||||||
%As it was initially tempting to say that \yad was a database toolkit,
|
|
||||||
%it may now be tempting to claim that \yad implements a physical
|
|
||||||
%database model. In this section, we discuss fundamental limitations
|
|
||||||
%of the physical data model, and explain how \yad avoids these
|
|
||||||
%limitations.
|
|
||||||
|
|
||||||
We discuss Berkeley DB, and show that it provides funcationality
|
|
||||||
similar to a physical database model. Just as \yad allows
|
|
||||||
applications to build mappings on top of the primitives it provides,
|
|
||||||
\yad's design allows them to take design storage in terms of a
|
|
||||||
physical database model. Therefore, while Berkeley DB could be implemented on top
|
|
||||||
of \yad, Berkeley DB cannot support the primitives provided by \yad.
|
|
||||||
|
|
||||||
Genesis~\cite{genesis}, an early database toolkit, was built in terms
|
|
||||||
of interchangable primitives that implemented the interfaces of an
|
|
||||||
early database implementation model. It built upon the idea of
|
|
||||||
conceptual mappings described above, and the physical database model
|
|
||||||
decribed here.
|
|
||||||
|
|
||||||
%The physical database model provides the abstraction upon which
|
|
||||||
%conceptual mappings can be built. It is based on a partitioning of storage into
|
|
||||||
%{\em simple files}, which provide operations associated with key based storage, and
|
|
||||||
%{\em linksets}, which make use of various pointer storage schemes to provide
|
|
||||||
%mappings between records in simple files~\cite{batoryPhysical}.
|
|
||||||
|
|
||||||
Subsequent database toolkit work builds upon these foundations,
|
|
||||||
Exodus~\cite{exodus} and Starburst~\cite{starburst} are notable
|
|
||||||
examples, and incorporated a number of ideas that will be referred to
|
|
||||||
later in this paper. Although further discussion is beyond the scope
|
|
||||||
of this paper, object-oriented database systems, and relational
|
|
||||||
databases with support for user-definable abstract data types (such as
|
|
||||||
in Postgres~\cite{postgres}) were the primary competitors to these
|
|
||||||
database toolkits, and are the precursors to the user definable types
|
|
||||||
present in current database systems.
|
|
||||||
|
|
||||||
One can characterise the difference between database toolkits and
|
|
||||||
extensible database servers in terms of early and late binding. With
|
|
||||||
a database toolkit, new types are defined when the database server is
|
|
||||||
compiled. In today's object-relational database systems, new types
|
|
||||||
are defined at runtime. Each approach has its advantages. However,
|
|
||||||
both types of systems attempted to provide similar levels of
|
|
||||||
abstraction and flexibility to their end users.
|
|
||||||
|
|
||||||
Therefore, the database toolkit approach is inappropriate for
|
|
||||||
applications not well serviced by modern database systems.
|
|
||||||
|
|
||||||
\eat{Therefore, \yad abandons the concept of a physical database. Instead
|
|
||||||
of forcing applications to reason in terms of simple files and
|
|
||||||
linksets, it allows applications to reason about storage in terms of
|
|
||||||
atomically applicable changes to the page file. Of course,
|
|
||||||
applications that wish to reason in terms of linksets and simple files
|
|
||||||
are free to do so.
|
|
||||||
|
|
||||||
We regret forcing applications to arrange for updates to be atomic, but
|
|
||||||
this restriction is fundamental if we wish to support concurrent
|
|
||||||
transactions, durability and recovery using conventional hardware
|
|
||||||
systems. In Section~\ref{nestedTopActions} we explain how a set of
|
|
||||||
atomic changes may be atomically applied to the page file, alleviating
|
|
||||||
the burden we place upon applications somewhat.}
|
|
||||||
|
|
||||||
Now that we have introduced the underlying concepts of database
|
|
||||||
toolkits, we can discuss the proposed RISC database architectures
|
|
||||||
in more detail. RISC databases have many elements in common with
|
|
||||||
database toolkits. However, they take the database toolkit idea one
|
|
||||||
step further, and suggest standardizing the interfaces of the
|
|
||||||
toolkit's internal components, allowing multiple organizations to
|
|
||||||
compete to improve each module. The idea is to produce a research
|
|
||||||
platform, and to address issues that affect modern
|
|
||||||
databases, such as automatic performance tuning, and reducing the
|
|
||||||
effort required to implement a new database system~\cite{riscDB}.
|
|
||||||
|
|
||||||
Although we agree with the motivations behind RISC databases, instead of
|
|
||||||
building a modular database, we seek to build a system that allows
|
|
||||||
programmers to avoid databases.
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Transaction processing libraries}
|
|
||||||
|
|
||||||
Berkeley DB is a highly successful alternative to conventional
|
|
||||||
database design. At its core, it provides the physical database, or
|
|
||||||
the relational storage system of a conventional database server.
|
|
||||||
|
|
||||||
This module focuses on providing fully transactional data storage with
|
|
||||||
B-Tree and hashtable based indexes. Berkeley DB also provides some
|
|
||||||
support for application specific access methods, as did Genesis, and
|
|
||||||
the database toolkits that succeeded it~\cite{libtp}. Finally,
|
|
||||||
Berkeley DB allows applications that need to modify the recovery
|
|
||||||
semantics of Berkeley DB, or otherwise tweak the way its
|
|
||||||
write-ahead-logging protocol works to pass flags via its API.
|
|
||||||
|
|
||||||
Transaction processing libraries such as Berkeley DB are \yad's closest relative.
|
|
||||||
However, they encode a physical data model, and hardcode many
|
|
||||||
assumptions regarding workloads and decisions regarding low level data
|
|
||||||
representation. While Berkeley DB could be built on top of \yad,
|
|
||||||
Berkeley DB is too specialized to support \yad.
|
|
||||||
|
|
||||||
The Boxwood system provides a networked, fault-tolerant transactional
|
|
||||||
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
|
|
||||||
complement to such a system, especially given \yad's focus on
|
|
||||||
intelligence and optimizations within a single node, and Boxwoods
|
|
||||||
focus on multiple node systems. In particular, when implementing
|
|
||||||
applications with predictable locality properties, it would be
|
|
||||||
interesting to explore extensions to the Boxwood approach that make
|
|
||||||
use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
|
|
||||||
mechanism. (Section~\ref{logging})
|
|
||||||
|
|
||||||
|
|
||||||
% This part of the rant belongs in some other paper:
|
|
||||||
%
|
|
||||||
%Offer rebuttal to the Asilomar Report. On the web 2.0, no one knows
|
|
||||||
%you implemeneted your web service with perl and duct tape... Is it
|
|
||||||
%possible to scale to 1,000,000's of datastores without punting on the
|
|
||||||
%data model? (HTML suggests not...) Argue that C bindings are be the
|
|
||||||
%¨universal glue¨ the RISC db paper should be asking for.
|
|
||||||
|
|
||||||
%cover P2 (the old one, not "Pier 2" if there is time...
|
|
||||||
|
|
||||||
\section{Write-ahead logging}
|
\section{Write-ahead logging}
|
||||||
|
|
||||||
|
@ -895,6 +823,24 @@ benchmark.
|
||||||
The effect of \yad object serialization optimizations under low and high memory pressure.}
|
The effect of \yad object serialization optimizations under low and high memory pressure.}
|
||||||
\end{figure*}
|
\end{figure*}
|
||||||
|
|
||||||
|
\subsection{Object persistence mechanisms}
|
||||||
|
\rcs{ This belongs somewhere else: Instead, it leaves decisions regarding abstract data types and
|
||||||
|
algorithm design to system developers or language designers. For
|
||||||
|
instance, while \yad has no concept of object oriented data types, two
|
||||||
|
radically different approaches toward object persistence have been
|
||||||
|
implemented on top of it~\ref{oasys}.}
|
||||||
|
|
||||||
|
\rcs{We could have just as easily written a persistence mechanism for a
|
||||||
|
functional programming language, or a particular application (such as
|
||||||
|
an email server). Our experience building data manipulation routines
|
||||||
|
on top of application-specific primitives was favorable compared to
|
||||||
|
past experiences attempting to restructure entire applications to
|
||||||
|
match pre-existing computational models, such as SQL's declarative
|
||||||
|
interface.}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Numerous schemes are used for object serialization. Support for two
|
Numerous schemes are used for object serialization. Support for two
|
||||||
different styles of object serialization has been implemented in
|
different styles of object serialization has been implemented in
|
||||||
\yad. The first, pobj, provided transactional updates to objects in
|
\yad. The first, pobj, provided transactional updates to objects in
|
||||||
|
|