clarified db toolkit exposition...

Sears Russell 2006-04-23 21:23:51 +00:00
parent d4e8252a6a
commit a9b5e7bf4c


@@ -255,40 +255,55 @@ so good. (Take ideas from old paper.)**
 \section{\yad is not a Database}
 Database research has a long history, including the development of
-many technologies that our system builds upon. However, we view \yad
-as a rejection of the fundamental assumptions that underlie database
-systems. In particular, we reject the idea that a general-purpose
-storage system should attempt to encode universal data models and
-computational paradigms. Although we accept that such a data model
-may be appropriate for a particular class of applications, we believe that system builders need more
-control and flexibility.
-Instead, we are less ambitious and seek to build a flexible
-transactional storage system that provides durable access to the
-primitives provided by the underlying hardware. To be of practical
-value, it must be easy to specialize such a system so that it encodes
-any of a variety of data models and computational paradigms.
-Otherwise, the system could not easily be reused in many environments.
-We know of no system that adequately achieves these two goals.
-Here, we present a brief history of transactional storage architectures, and
-explain why they fail to achieve \yad's goals. Citations of the
-technical work upon which our system is based are included below, in
-the description of \yad's design.
-%Here we will focus on lines of research that are
-%superficially similar, but distinct from our own, and cite evidence
-%from within the database community that highlights problems with
-%systems that attempt to incorporate databases into other systems.
-%Of course, database systems have a place in modern software
-%development and design, and are the best available storage solution
-%for many classes of applications. Also, this section refers to work
-%that introduces technologies that are crucial to \yad's design; when
-%we claim that prior work is dissimilar to our own, we refer to
-%high-level architectural considerations, not low-level details.
-\subsection{Databases as system components}
+many technologies that our system builds upon. This section explains
+why databases are fundamentally inappropriate tools for system
+developers. The problems we present here have been the focus of
+database systems and research projects for at least 25 years.
+The section concludes with a discussion of database systems that
+attempt to address these problems. Although these systems were
+successful in many respects, they failed to address the broad class of
+software we are interested in.
+\subsection{The database abstraction}
+Database systems are often thought of in terms of the high-level
+abstractions they present. For instance, relational database systems
+implement the relational model~\cite{cobb}, while object-oriented
+databases implement object abstractions, XML databases implement
+hierarchical datasets, and so on. Before the relational model,
+navigational databases implemented a navigational, pointer and record
+based data model.
+An early survey of database implementations sought to enumerate the
+fundamental components used by database system implementors. This
+survey was motivated by difficulties in extending database systems
+into new application domains. The survey divided databases into two
+broad modules: conceptual mappings~\cite{batoryConceptual} and the
+physical database model~\cite{batoryPhysical}.
+A conceptual mapping may translate a relation into a set of keyed
+tuples. A physical model may translate a set of tuples into an
+on-disk B-Tree with support for iterators and range-based query
+operations.
+It is the responsibility of a database implementor to choose a set of
+conceptual mappings that implement the desired higher-level
+abstraction (such as the relational model). The physical data model
+is chosen to efficiently support the set of mappings that are built on
+top of it.
+{\em The key observation of this paper is that no known physical data model
+can support more than a small percentage of today's applications.}
+Instead of attempting to create such a model after decades of database
+research has failed to produce one, we opt to provide a storage model
+that mimics the primitives provided by modern hardware as closely as
+possible. This makes it easy for system designers to implement most
+of the data models that the underlying hardware can support.
+\subsection{Recent survey}
 A recent survey~\cite{riscDB} enumerates problems that plague users of
 state-of-the-art database systems.
@@ -329,76 +344,76 @@ implementation tool~\cite{riscDB}.
 %was more difficult than implementing from scratch (winfs), scaling
 %down doesn't work (variance in performance, footprint),
-\subsection{Database Toolkits}
-\yad is a library that could be used to provide the storage primitives needed by a
-database server. Therefore, one might suppose that \yad is a database
-toolkit. However, such an assumption would be incorrect, as \yad incorporates neither of the two basic concepts that underlie database toolkit designs. These two concepts are
-{\em conceptual-to-internal mappings}~\cite{batoryConceptual}
-and {\em physical database models}~\cite{batoryPhysical}.
-Conceptual-to-internal mappings and physical database models were
-discovered during an early survey of database implementations. Mappings
-describe the computational primitives upon which client applications must
-be implemented. Physical database models define the on-disk layout used
-by a system in terms of data layouts and representations that are commonly
-used by relational and navigational database implementations.
-Both concepts are fundamentally incompatible with a general storage
-implementation. By definition, database servers (and toolkits) encode both
-concepts, while transaction processing libraries manage to avoid complex
-conceptual mappings. \yad's novelty stems from the fact that it avoids
-both concepts, while making it easy for applications to incorporate results from the database
-literature.
-\subsubsection{Conceptual mappings}
-At the time of their introduction, ten
-conceptual-to-internal mappings were sufficient to describe existing
-database systems. These mappings included indexing, encoding
-(compression, encryption, etc.), segmentation (along field boundaries),
-fragmentation (without regard to fields), $n:m$ pointers, and
-horizontal partitioning, among others.
-The initial survey postulates that a finite number of such mappings
-are adequate to describe database systems. A
-database toolkit need only implement each type of mapping in order to
-encode the set of all conceivable database systems.
-Our work's primary concern is to support systems beyond database
-implementations. Therefore, our system must support a more general
-set of primitives than existing systems. Defining a universal (but
-practical) framework that encompasses such a broad class of
-computation is clearly unrealistic.
-Therefore, \yad's architecture avoids hard-coded assumptions regarding
-the computation or abstract data types of the applications built on
-top of it.
+%\subsection{Database Toolkits}
+%\yad is a library that could be used to provide the storage primitives needed by a
+%database server. Therefore, one might suppose that \yad is a database
+%toolkit. However, such an assumption would be incorrect, as \yad incorporates neither of the two basic concepts that underlie database toolkit designs. These two concepts are
+%{\em conceptual-to-internal mappings}~\cite{batoryConceptual}
+%and {\em physical database models}~\cite{batoryPhysical}.
+%
+%Conceptual-to-internal mappings and physical database models were
+%discovered during an early survey of database implementations. Mappings
+%describe the computational primitives upon which client applications must
+%be implemented. Physical database models define the on-disk layout used
+%by a system in terms of data layouts and representations that are commonly
+%used by relational and navigational database implementations.
+%
+%Both concepts are fundamentally incompatible with a general storage
+%implementation. By definition, database servers (and toolkits) encode both
+%concepts, while transaction processing libraries manage to avoid complex
+%conceptual mappings. \yad's novelty stems from the fact that it avoids
+%both concepts, while making it easy for applications to incorporate results from the database
+%literature.
+%\subsubsection{Conceptual mappings}
+%
+%At the time of their introduction, ten
+%conceptual-to-internal mappings were sufficient to describe existing
+%database systems. These mappings included indexing, encoding
+%(compression, encryption, etc.), segmentation (along field boundaries),
+%fragmentation (without regard to fields), $n:m$ pointers, and
+%horizontal partitioning, among others.
+%
+%The initial survey postulates that a finite number of such mappings
+%are adequate to describe database systems. A
+%database toolkit need only implement each type of mapping in order to
+%encode the set of all conceivable database systems.
+%
+%Our work's primary concern is to support systems beyond database
+%implementations. Therefore, our system must support a more general
+%set of primitives than existing systems. Defining a universal (but
+%practical) framework that encompasses such a broad class of
+%computation is clearly unrealistic.
+%
+%Therefore, \yad's architecture avoids hard-coded assumptions regarding
+%the computation or abstract data types of the applications built on
+%top of it.
+%
-Instead, it leaves decisions regarding abstract data types and
+\rcs{ This belongs somewhere else: Instead, it leaves decisions regarding abstract data types and
 algorithm design to system developers or language designers. For
 instance, while \yad has no concept of object-oriented data types, two
 radically different approaches toward object persistence have been
-implemented on top of it~\ref{oasys}.
-We could have just as easily written a persistence mechanism for a
+implemented on top of it~\ref{oasys}.}
+\rcs{We could have just as easily written a persistence mechanism for a
 functional programming language, or a particular application (such as
 an email server). Our experience building data manipulation routines
 on top of application-specific primitives was favorable compared to
 past experiences attempting to restructure entire applications to
 match pre-existing computational models, such as SQL's declarative
-interface.
+interface.}
-\subsubsection{Physical data models}
-As it was initially tempting to say that \yad was a database toolkit,
-it may now be tempting to claim that \yad implements a physical
-database model. In this section, we discuss fundamental limitations
-of the physical data model, and explain how \yad avoids these
-limitations.
-\rcs{this should be later...} We discuss Berkeley DB, and show that it provides functionality
+%\subsubsection{Physical data models}
+%
+%As it was initially tempting to say that \yad was a database toolkit,
+%it may now be tempting to claim that \yad implements a physical
+%database model. In this section, we discuss fundamental limitations
+%of the physical data model, and explain how \yad avoids these
+%limitations.
+We discuss Berkeley DB, and show that it provides functionality
 similar to a physical database model. Just as \yad allows
 applications to build mappings on top of the primitives it provides,
 \yad's design allows them to take design storage in terms of a
@@ -411,11 +426,11 @@ early database implementation model. It built upon the idea of
 conceptual mappings described above, and the physical database model
 described here.
-The physical database model provides the abstraction upon which
-conceptual mappings can be built. It is based on a partitioning of storage into
-{\em simple files}, which provide operations associated with key-based storage, and
-{\em linksets}, which make use of various pointer storage schemes to provide
-mappings between records in simple files~\cite{batoryPhysical}.
+%The physical database model provides the abstraction upon which
+%conceptual mappings can be built. It is based on a partitioning of storage into
+%{\em simple files}, which provide operations associated with key-based storage, and
+%{\em linksets}, which make use of various pointer storage schemes to provide
+%mappings between records in simple files~\cite{batoryPhysical}.
 Subsequent database toolkit work builds upon these foundations;
 Exodus~\cite{exodus} and Starburst~\cite{starburst} are notable