clarified db toolkit exposition...
parent
d4e8252a6a
commit
a9b5e7bf4c
1 changed files with 105 additions and 90 deletions
|
@ -255,40 +255,55 @@ so good. (Take ideas from old paper.)**
|
||||||
\section{\yad is not a Database}
|
\section{\yad is not a Database}
|
||||||
|
|
||||||
Database research has a long history, including the development of
|
Database research has a long history, including the development of
|
||||||
many technologies that our system builds upon. However, we view \yad
|
many technologies that our system builds upon. This section explains
|
||||||
as a rejection of the fundamental assumptions that underlie database
|
why databases are fundamentally inappropriate tools for system
|
||||||
systems. In particular, we reject the idea that a general-purpose
|
developers. The problems we present here have been the focus of
|
||||||
storage system should attempt to encode universal data models and
|
database systems and research projects for at least 25 years.
|
||||||
computational paradigms. Although we accept that such a data model
|
|
||||||
for a particular class of applications, we believe that system builders need more
|
|
||||||
control and flexibility.
|
|
||||||
|
|
||||||
Instead, we are less ambitious and seek to build a flexible
|
The section concludes with a discussion of database systems that
|
||||||
transactional storage system that provides durable access to the
|
attempt to address these problems. Although these systems were
|
||||||
primitives provided by the underlying hardware. To be of practical
|
successful in many respects, they failed to address the broad class of
|
||||||
value, it must be easy to specialize such a system so that it encodes
|
software we are interested in.
|
||||||
any of a variety of data models and computational paradigms.
|
|
||||||
Otherwise, the system could not easily be reused in many environments.
|
|
||||||
We know of no system that adequately achieves these two goals.
|
|
||||||
|
|
||||||
Here, we present a brief history of transactional storage architectures, and
|
|
||||||
explain why they fail to achieve \yad's goals. Citations of the
|
|
||||||
technical work upon which our system is based are included below, in
|
|
||||||
the description of \yad's design.
|
|
||||||
|
|
||||||
%Here we will focus on lines of research that are
|
\subsection{The database abstraction}
|
||||||
%superficially similar, but distinct from our own, and cite evidence
|
|
||||||
%from within the database community that highlights problems with
|
|
||||||
%systems that attempt to incorporate databases into other systems.
|
|
||||||
|
|
||||||
%Of course, database systems have a place in modern software
|
Database systems are often thought of in terms of the high-level
|
||||||
%development and design, and are the best available storage solution
|
abstractions they present. For instance, relational database systems
|
||||||
%for many classes of applications. Also, this section refers to work
|
implement the relational model~\cite{cobb}, while object-oriented
|
||||||
%that introduces technologies that are crucial to \yad's design; when
|
databases implement object abstractions, XML databases implement
|
||||||
%we claim that prior work is dissimilar to our own, we refer to
|
hierarchical datasets, and so on. Before the relational model,
|
||||||
%high-level architectural considerations, not low-level details.
|
navigational databases implemented a navigational, pointer- and
|
||||||
|
record-based data model.
|
||||||
|
|
||||||
\subsection{Databases as system components}
|
An early survey of database implementations sought to enumerate the
|
||||||
|
fundamental components used by database system implementors. This
|
||||||
|
survey was performed due to difficulties in extending database systems
|
||||||
|
into new application domains. The survey divided databases into two
|
||||||
|
broad modules: conceptual mappings~\cite{batoryConceptual} and the
|
||||||
|
physical database model~\cite{batoryPhysical}.
|
||||||
|
|
||||||
|
A conceptual mapping may translate a relation into a set of keyed
|
||||||
|
tuples. A physical model may translate a set of tuples into an
|
||||||
|
on-disk B-Tree with support for iterators and range-based query
|
||||||
|
operations.
|
||||||
|
|
||||||
|
It is the responsibility of a database implementor to choose a set of
|
||||||
|
conceptual mappings that implement the desired higher-level
|
||||||
|
abstraction (such as the relational model). The physical data model
|
||||||
|
is chosen to efficiently support the set of mappings that are built on
|
||||||
|
top of it.
|
||||||
|
|
||||||
|
{\em The key observation of this paper is that no known physical data model
|
||||||
|
can support more than a small percentage of today's applications.}
|
||||||
|
|
||||||
|
Instead of attempting to create such a model after decades of database
|
||||||
|
research has failed to produce one, we opt to provide a storage model
|
||||||
|
that mimics the primitives provided by modern hardware as closely as
|
||||||
|
possible. This makes it easy for system designers to implement most
|
||||||
|
of the data models that the underlying hardware can support.
|
||||||
|
|
||||||
|
\subsection{Recent survey}
|
||||||
|
|
||||||
A recent survey~\cite{riscDB} enumerates problems that plague users of
|
A recent survey~\cite{riscDB} enumerates problems that plague users of
|
||||||
state-of-the-art database systems.
|
state-of-the-art database systems.
|
||||||
|
@ -329,76 +344,76 @@ implementation tool~\cite{riscDB}.
|
||||||
%was more difficult than implementing from scratch (winfs), scaling
|
%was more difficult than implementing from scratch (winfs), scaling
|
||||||
%down doesn't work (variance in performance, footprint),
|
%down doesn't work (variance in performance, footprint),
|
||||||
|
|
||||||
\subsection{Database Toolkits}
|
%\subsection{Database Toolkits}
|
||||||
|
|
||||||
\yad is a library that could be used to provide the storage primatives needed by a
|
%\yad is a library that could be used to provide the storage primatives needed by a
|
||||||
database server. Therefore, one might suppose that \yad is a database
|
%database server. Therefore, one might suppose that \yad is a database
|
||||||
toolkit. However, such an assumption would be incorrect, as \yad incorporates neither of the two basic concepts that underly database toolkit designs. These two concepts are
|
%toolkit. However, such an assumption would be incorrect, as \yad incorporates neither of the two basic concepts that underly database toolkit designs. These two concepts are
|
||||||
{\em conceptual-to-internal mappings}~\cite{batoryConceptual}
|
%{\em conceptual-to-internal mappings}~\cite{batoryConceptual}
|
||||||
and {\em physical database models}~\cite{batoryPhysical}.
|
%and {\em physical database models}~\cite{batoryPhysical}.
|
||||||
|
%
|
||||||
Conceptual-to-internal mappings and physical database models were
|
%Conceptual-to-internal mappings and physical database models were
|
||||||
discovered during an early survey of database implementations. Mappings
|
%discovered during an early survey of database implementations. Mappings
|
||||||
describe the computational primitives upon which client applications must
|
%describe the computational primitives upon which client applications must
|
||||||
be implemented. Physical database models define the on-disk layout used
|
%be implemented. Physical database models define the on-disk layout used
|
||||||
by a system in terms of data layouts and representations that are commonly
|
%by a system in terms of data layouts and representations that are commonly
|
||||||
used by relational and navigational database implementations.
|
%used by relational and navigational database implementations.
|
||||||
|
%
|
||||||
Both concepts are fundamentally incompatible with a general storage
|
%Both concepts are fundamentally incompatible with a general storage
|
||||||
implementation. By definition, database servers (and toolkits) encode both
|
%implementation. By definition, database servers (and toolkits) encode both
|
||||||
concepts, while transaction processing libraries manage to avoid complex
|
%concepts, while transaction processing libraries manage to avoid complex
|
||||||
conceptual mappings. \yad's novelty stems from the fact that it avoids
|
%conceptual mappings. \yad's novelty stems from the fact that it avoids
|
||||||
both concepts, while making it easy for applications to incorporate results from the database
|
%both concepts, while making it easy for applications to incorporate results from the database
|
||||||
literature.
|
%literature.
|
||||||
|
|
||||||
|
|
||||||
\subsubsection{Conceptual mappings}
|
%\subsubsection{Conceptual mappings}
|
||||||
|
%
|
||||||
At the time of their introduction, ten
|
%At the time of their introduction, ten
|
||||||
conceptual-to-internal mappings were sufficient to describe existing
|
%conceptual-to-internal mappings were sufficient to describe existing
|
||||||
database systems. These mappings included indexing, encoding
|
%database systems. These mappings included indexing, encoding
|
||||||
(compression, encryption, etc), segmentation (along field boundaries),
|
%(compression, encryption, etc), segmentation (along field boundaries),
|
||||||
fragmentation (without regard to fields), $n:m$ pointers, and
|
%fragmentation (without regard to fields), $n:m$ pointers, and
|
||||||
horizontal partitioning, among others.
|
%horizontal partitioning, among others.
|
||||||
|
%
|
||||||
The initial survey postulates that a finite number of such mappings
|
%The initial survey postulates that a finite number of such mappings
|
||||||
are adequate to describe database systems. A
|
%are adequate to describe database systems. A
|
||||||
database toolkit need only implement each type of mapping in order to
|
%database toolkit need only implement each type of mapping in order to
|
||||||
encode the set of all conceivable database systems.
|
%encode the set of all conceivable database systems.
|
||||||
|
%
|
||||||
Our work's primary concern is to support systems beyond database
|
%Our work's primary concern is to support systems beyond database
|
||||||
implementations. Therefore, our system must support a more general
|
%implementations. Therefore, our system must support a more general
|
||||||
set of primitives than existing systems. Defining a universal (but
|
%set of primitives than existing systems. Defining a universal (but
|
||||||
practical) framework that encompasses such a broad class of
|
%practical) framework that encompasses such a broad class of
|
||||||
computation is clearly unrealistic.
|
%computation is clearly unrealistic.
|
||||||
|
%
|
||||||
Therefore, \yad's architecture avoids hard-coded assumptions regarding
|
%Therefore, \yad's architecture avoids hard-coded assumptions regarding
|
||||||
the computation or abstract data types of the applications built on
|
%the computation or abstract data types of the applications built on
|
||||||
top of it.
|
%top of it.
|
||||||
|
%
|
||||||
Instead, it leaves decisions regarding abstract data types and
|
\rcs{ This belongs somewhere else: Instead, it leaves decisions regarding abstract data types and
|
||||||
algorithm design to system developers or language designers. For
|
algorithm design to system developers or language designers. For
|
||||||
instance, while \yad has no concept of object-oriented data types, two
|
instance, while \yad has no concept of object-oriented data types, two
|
||||||
radically different approaches toward object persistence have been
|
radically different approaches toward object persistence have been
|
||||||
implemented on top of it~\ref{oasys}.
|
implemented on top of it~\ref{oasys}.}
|
||||||
|
|
||||||
We could have just as easily written a persistence mechanism for a
|
\rcs{We could have just as easily written a persistence mechanism for a
|
||||||
functional programming language, or a particular application (such as
|
functional programming language, or a particular application (such as
|
||||||
an email server). Our experience building data manipulation routines
|
an email server). Our experience building data manipulation routines
|
||||||
on top of application-specific primitives was favorable compared to
|
on top of application-specific primitives was favorable compared to
|
||||||
past experiences attempting to restructure entire applications to
|
past experiences attempting to restructure entire applications to
|
||||||
match pre-existing computational models, such as SQL's declarative
|
match pre-existing computational models, such as SQL's declarative
|
||||||
interface.
|
interface.}
|
||||||
|
|
||||||
\subsubsection{Physical data models}
|
%\subsubsection{Physical data models}
|
||||||
|
%
|
||||||
|
%As it was initially tempting to say that \yad was a database toolkit,
|
||||||
|
%it may now be tempting to claim that \yad implements a physical
|
||||||
|
%database model. In this section, we discuss fundamental limitations
|
||||||
|
%of the physical data model, and explain how \yad avoids these
|
||||||
|
%limitations.
|
||||||
|
|
||||||
As it was initially tempting to say that \yad was a database toolkit,
|
We discuss Berkeley DB, and show that it provides functionality
|
||||||
it may now be tempting to claim that \yad implements a physical
|
|
||||||
database model. In this section, we discuss fundamental limitations
|
|
||||||
of the physical data model, and explain how \yad avoids these
|
|
||||||
limitations.
|
|
||||||
|
|
||||||
\rcs{this should be later...} We discuss Berkeley DB, and show that it provides functionality
|
|
||||||
similar to a physical database model. Just as \yad allows
|
similar to a physical database model. Just as \yad allows
|
||||||
applications to build mappings on top of the primitives it provides,
|
applications to build mappings on top of the primitives it provides,
|
||||||
\yad's design allows them to take design storage in terms of a
|
\yad's design allows them to take design storage in terms of a
|
||||||
|
@ -411,11 +426,11 @@ early database implementation model. It built upon the idea of
|
||||||
conceptual mappings described above, and the physical database model
|
conceptual mappings described above, and the physical database model
|
||||||
described here.
|
described here.
|
||||||
|
|
||||||
The physical database model provides the abstraction upon which
|
%The physical database model provides the abstraction upon which
|
||||||
conceptual mappings can be built. It is based on a partitioning of storage into
|
%conceptual mappings can be built. It is based on a partitioning of storage into
|
||||||
{\em simple files}, which provide operations associated with key based storage, and
|
%{\em simple files}, which provide operations associated with key based storage, and
|
||||||
{\em linksets}, which make use of various pointer storage schemes to provide
|
%{\em linksets}, which make use of various pointer storage schemes to provide
|
||||||
mappings between records in simple files~\cite{batoryPhysical}.
|
%mappings between records in simple files~\cite{batoryPhysical}.
|
||||||
|
|
||||||
Subsequent database toolkit work builds upon these foundations.
|
Subsequent database toolkit work builds upon these foundations.
|
||||||
Exodus~\cite{exodus} and Starburst~\cite{starburst} are notable
|
Exodus~\cite{exodus} and Starburst~\cite{starburst} are notable
|
||||||
|
|