*** empty log message ***

This commit is contained in:
Sears Russell 2006-04-22 22:14:00 +00:00
parent eee21ad6fd
commit 39bf19166e

@ -214,32 +214,57 @@ so good. (Take ideas from old paper.)**
Database research has a long history, including the development of Database research has a long history, including the development of
many technologies that our system builds upon. However, we view \yad many technologies that our system builds upon. However, we view \yad
as a rejection of the fundamental assumptions that underlie database as a rejection of the fundamental assumptions that underlie database
systems. Here we will focus on lines of research that are systems. In particular, we reject the idea that a general purpose
superficially similar, but distinct from our own, and cite evidence storage system should attempt to encode universal data models and
from within the database community that highlights problems with computational paradigms.
systems that attempt to incorporate databases into other systems.
Of course, database systems have a place in modern software Instead, we are less ambitious and seek to build a storage system that
development and design, and are the best available storage solution provides durable (which often implies transactional) access to the
for many classes of applications. Also, this section refers to work primitives provided by the underlying hardware. To be of practical
that introduces technologies that are crucial to \yad's design; when value, it must be easy to specialize such a system so that it encodes
we claim that prior work is dissimilar to our own, we refer to any of a variety of data models and computational paradigms.
high-level architectural considerations, not low-level details. Otherwise, the system could not easily be reused in many environments.
We know of no system that adequately achieves these two goals.
Here, we present a brief history of transactional storage systems, and
explain why they fail to achieve \yad's goals. Citations of the
technical work upon which our system is based are included below, in
the description of \yad's design.
%Here we will focus on lines of research that are
%superficially similar, but distinct from our own, and cite evidence
%from within the database community that highlights problems with
%systems that attempt to incorporate databases into other systems.
%Of course, database systems have a place in modern software
%development and design, and are the best available storage solution
%for many classes of applications. Also, this section refers to work
%that introduces technologies that are crucial to \yad's design; when
%we claim that prior work is dissimilar to our own, we refer to
%high-level architectural considerations, not low-level details.
\subsection{Databases as system components} \subsection{Databases as system components}
A recent survey~\cite{riscDB} enumerates problems that plague users of
A recent survey enumerates problems that plague users of state-of-the-art database systems. It concludes that efficiently optimizing and
state-of-the-art database systems. Efficiently optimizing and
consistently servicing large declarative queries is inherently consistently servicing large declarative queries is inherently
difficult. This leads to manageability and tuning issues that difficult.
prevent databases from effectively servicing diverse, interactive
workloads. While SQL serves some classes of applications well, it is The survey finds that database implementations fail to scale to modern systems.
This leads to manageability and tuning issues that
prevent databases from effectively servicing large-scale, diverse, interactive
workloads.
They are also a poor fit for
smaller devices, where footprint, predictable performance, and power
consumption are primary concerns.
Scaling out to large numbers of self-administering desktop
installations will be difficult until a number of open research problems are solved.
The survey provides evidence that SQL itself is problematic.
While SQL serves some classes of applications well, it is
often inadequate for algorithmic and hierarchical computing tasks. often inadequate for algorithmic and hierarchical computing tasks.
The survey finds that database implementations are also a poor fit for Finally, complete, modern database
smaller devices, where footprint, predictable performance, and power
consumption are primary concerns. Finally, complete, modern database
implementations are often incomprehensible, and border on implementations are often incomprehensible, and border on
irreproducible, hindering further research. After making these irreproducible, hindering further research. After making these
points, the study concludes by suggesting the adoption of ``RISC'' points, the study concludes by suggesting the adoption of ``RISC''
@ -261,40 +286,105 @@ implementation tool~\cite{riscDB}.
%was more difficult than implementing from scratch (winfs), scaling %was more difficult than implementing from scratch (winfs), scaling
%down doesn't work (variance in performance, footprint), %down doesn't work (variance in performance, footprint),
\subsection{Database toolkits} \subsection{Database Toolkits}
Database toolkits are based upon the idea that database \yad is a library that could be used to provide storage primitives to a
implementations can be broken into smaller components with database server. Therefore, one might suppose that \yad is a database
standardized interfaces. Early work in this field surveyed database toolkit. However, such an assumption would be incorrect. Here we
implementations that existed at the time. It casts components of describe the two characteristics that are the essence of database
these implementation in terms of a physical database toolkits: {\em conceptual-to-internal mappings}~\cite{batoryConceptual}
model~\cite{batoryPhysical} and conceptual-to-internal and {\em physical database models}~\cite{batoryPhysical}.
mappings~\cite{batoryConceptual}. These abstractions describe
relational database systems, and describe many aspects of subsequent
database toolkit research.
However, these abstractions are built upon assumptions about Conceptual-to-internal mappings and physical database models were
application structure and data layout. At the time of the survey, ten discovered by an early survey of database implementations. Mappings
are essentially a model of computation, while physical database models
are essentially a model of data layout and representation.
Both concepts are fundamentally incompatible with a general storage
implementation. By definition, a database server encodes both
concepts, while transaction processing libraries manage to avoid
conceptual mappings. \yad's novelty stems from the fact that it avoids
both concepts, while incorporating results from the database
literature.
\subsubsection{Conceptual mappings}
%Database toolkits are based upon the idea that database
%implementations can be broken into smaller components with
%standardized interfaces.
%Early work in this field surveyed database
%implementations that existed at the time. It casts compoenents of
%these implementation in terms of a physical database
%model~\cite{batoryPhysical} and conceptual-to-internal
%mappings~\cite{batoryConceptual}. These abstractions describe
%relational database systems, and describe many aspects of subsequent
%database toolkit research.
%However, these abstractions are built upon assumptions about
%application structure and data layout.
At the time of their introduction, ten
conceptual-to-internal mappings were sufficient to describe existing conceptual-to-internal mappings were sufficient to describe existing
implementation. These mappings included: database systems. These mappings include indexing, encoding
(compression, encryption, etc), segmentation (along field boundaries),
fragmentation (without regard to fields), $n:m$ pointers, and
horizontal partitioning, among others.
The initial survey postulates that a finite number of such mappings
are adequate to describe database implementations. A general purpose
database toolkit need only implement each type of mapping in order to
encode the set of all conceivable database systems.
\begin{itemize} To meet our requirements with this approach, one would first develop a
\item indexing framework that adequately encodes the requirements of {\em every}
\item encoding (compression, encryption, etc) system that manipulates data, and would then define interfaces that
\item transposition support the needs of each implementation of the components specified
\item segmentation (along field boundaries) by the framework.
\item fragmentation (without regard to field boundaries)
\item pointers with support for $n:m$ relationships
\item horizontal partitioning
\end{itemize}
Many data manipulation tasks can be cast as mappings from abstract to Put this way, this goal seems absurd. However, this approach has
more concrete representation, and even cleanly partitioned into more been extremely successful. In fact, much of the
general sets of mappings. In fact, Genesis,~\cite{genesis} an early database literature is devoted to this task and has
database toolkit was built in terms of interchangeable primitives that certainly improved the state of computer science. Furthermore, it is the basis for
implemented interfaces that correspond to these interfaces. the highly successful database industry.
Similarly, the physical database model partitions storage into simple However, from a practical perspective, current database
implementations are already among the most complex
software systems ever created, and are difficult to understand or
reason about. They still encode only a small percentage of
the computational and storage primitives in the database
literature, which in turn only represents a portion of
the computer science literature.
%\begin{itemize}
%\item indexing
%\item encoding (compression, encryption, etc)
%\item transposition
%\item segmentation (along field boundaries)
%\item fragmentation (without regard to field boundaries)
%\item pointers with support for $n:m$ relationships
%\item horizonatal partitioning
%\end{itemize}
\subsubsection{Physical data models}
Just as it was initially tempting to say that \yad was a database toolkit,
it may now be tempting to claim that \yad implements a physical
database model. In this section, we compare \yad to the physical
database model of existing toolkits, and show that it supports a wider
range of storage technologies than physical database models. In fact,
it has no concept of a physical database model, and intentionally
allows applications to avoid such concepts as well.
Genesis~\cite{genesis}, an early database toolkit, was built in terms
of interchangeable primitives that implemented the interfaces of an
early database implementation model. It built upon the idea of
conceptual mappings described above, and the physical database model
described here.
The physical database model partitions storage into simple
files, which provide operations associated with key-based storage, and files, which provide operations associated with key-based storage, and
linksets, which make use of various pointer storage schemes to provide linksets, which make use of various pointer storage schemes to provide
mappings between records in simple files. mappings between records in simple files.