*** empty log message ***

2006-04-22 22:14:00 +00:00 · 2006-04-22 22:14:00 +00:00 · 39bf19166e
commit 39bf19166e
parent eee21ad6fd
1 changed files with 137 additions and 47 deletions
--- a/doc/paper3/LLADD.tex
+++ b/doc/paper3/LLADD.tex
@@ -214,32 +214,57 @@ so good.  (Take ideas from old paper.)**
 Database research has a long history, including the development of
 many technologies that our system builds upon.  However, we view \yad
 as a rejection of the fundamental assumptions that underlie database
-systems.  Here we will focus on lines of research that are
+systems.  In particular, we reject the idea that a general purpose
-superficially similar, but distinct from our own, and cite evidence
+storage system should attempt to encode universal data models and
-from within the database community that highlights problems with
+computational paradigms.  
 systems that attempt to incorporate databases into other systems.
-Of course, database systems have a place in modern software
+Instead, we are less ambitious and seek to build a storage system that
-development and design, and are the best available storage solution
+provides durable (which often implies transactional) access to the
-for many classes of applications.  Also, this section refers to work
+primitives provided by the underlying hardware.  To be of practical
-that introduces technologies that are crucial to \yad's design; when
+value, it must be easy to specialize such a system so that it encodes
-we claim that prior work is dissimilar to our own, we refer to
+any of a variety of data models and computational paradigms.
-high-level architectural considerations, not low-level details.
+Otherwise, the system could not easily be reused in many environments.
 We know of no system that adequately achieves these two goals.
 Here, we present a brief history of transactional storage systems, and
 explain why they fail to achieve \yad's goals.  Citations of the
 technical work upon which our system is based are included below, in
 the description of \yad's design.
 %Here we will focus on lines of research that are
 %superficially similar, but distinct from our own, and cite evidence
 %from within the database community that highlights problems with
 %systems that attempt to incorporate databases into other systems.
 %Of course, database systems have a place in modern software
 %development and design, and are the best available storage solution
 %for many classes of applications.  Also, this section refers to work
 %that introduces technologies that are crucial to \yad's design; when
 %we claim that prior work is dissimilar to our own, we refer to
 %high-level architectural considerations, not low-level details.
 \subsection{Databases  as system components}
-
+A recent survey~\cite{riscDB} enumerates problems that plague users of
-A recent survey enumerates problems that plague users of
+state-of-the-art database systems.  It concludes that efficiently optimizing and
 state-of-the-art database systems.  Efficiently optimizing and
 consistently servicing large declarative queries is inherently
-difficult.  This leads to managability and tuning issues that
+difficult.  
-prevent databases from effectively servicing diverse, interactive
+
-workloads.  While SQL serves some classes of applications well, it is
+The survey finds that database implementations fail to scale to modern systems.  
 This leads to manageability and tuning issues that
 prevent databases from effectively servicing large scale, diverse, interactive
 workloads.  
 They are also a poor fit for
 smaller devices, where footprint, predictable performance, and power
 consumption are primary concerns.  
 Scaling out to large numbers of self-administering desktop 
 installations will be difficult until a number of open research problems are solved.  
 The survey provides evidence that SQL itself is problematic.  
 While SQL serves some classes of applications well, it is
 often inadequate for algorithmic and hierarchical computing tasks.
-The survey finds that database implementations are also a poor fit for
+Finally, complete, modern database
 smaller devices, where footprint, predictable performance, and power
 consumption are primary concerns.  Finally, complete, modern database
 implementations are often incomprehensible, and border on
 irreproducible, hindering further research.  After making these
 points, the study concludes by suggesting the adoption of ``RISC''
@@ -261,40 +286,105 @@ implementation tool~\cite{riscDB}.
 %was more difficult than implementing from scratch (winfs), scaling
 %down doesn't work (variance in performance, footprint),
-\subsection{Database toolkits}
+\subsection{Database Toolkits}
-Database toolkits are based upon the idea that database
+\yad is a library that could be used to provide storage primitives to a
-implementations can be broken into smaller components with
+database server.  Therefore, one might suppose that \yad is a database
-standardized interfaces.  Early work in this field surveyed database
+toolkit.  However, such an assumption would be incorrect.  Here we
-implementations that existed at the time.  It casts compoenents of
+describe the two characteristics that are the essence of database
-these implementation in terms of a physical database
+toolkits: {\em conceptual-to-internal mappings}~\cite{batoryConceptual}
-model~\cite{batoryPhysical} and conceptual-to-internal
+and {\em physical database models}~\cite{batoryPhysical}.
 mappings~\cite{batoryConceptual}.  These abstractions describe
 relational database systems, and describe many aspects of subsequent
 database toolkit research.
-However, these abstractions are built upon assumptions about
+Conceptual-to-internal mappings and physical database models were
-application structure and data layout.  At the time of the survey, ten
+discovered by an early survey of database implementations.  Mappings
 are essentially a model of computation, while physical database models
 are essentially a model of data layout and representation.
 Both concepts are fundamentally incompatible with a general storage
 implementation.  By definition, a database server encodes both
 concepts, while transaction processing libraries manage to avoid
 conceptual mappings. \yad's novelty stems from the fact that it avoids
 both concepts, while incorporating results from the database
 literature.
 \subsubsection{Conceptual mappings}
 %Database toolkits are based upon the idea that database
 %implementations can be broken into smaller components with
 %standardized interfaces.  
 %Early work in this field surveyed database
 %implementations that existed at the time.  It casts compoenents of
 %these implementation in terms of a physical database
 %model~\cite{batoryPhysical} and conceptual-to-internal
 %mappings~\cite{batoryConceptual}.  These abstractions describe
 %relational database systems, and describe many aspects of subsequent
 %database toolkit research.
 %However, these abstractions are built upon assumptions about
 %application structure and data layout.  
 At the time of their introduction, ten
 conceptual-to-internal mappings were sufficient to describe existing
-implementation.  These mappings included:
+database systems.  These mappings include indexing, encoding
 (compression, encryption, etc), segmentation (along field boundaries),
 fragmentation (without regard to fields), $n:m$ pointers, and
 horizontal partitioning, among others.
 The initial survey postulates that a finite number of such mappings
 are adequate to describe database implementations.  A general purpose
 database toolkit need only implement each type of mapping in order to
 encode the set of all conceivable database systems.
-\begin{itemize}
+To meet our requirements with this approach, one would first develop a
-\item indexing
+framework that adequately encodes the requirements of {\em every}
-\item encoding (compression, encryption, etc)
+system that manipulates data, and would then define interfaces that
-\item transposition
+support the needs of each implementation of the components specified
-\item segmentation (along field boundaries)
+by the framework.  
 \item fragmentation (without regard to field boundaries)
 \item pointers with support for $n:m$ relationships
 \item horizonatal partitioning
 \end{itemize}
-Many data manipulation tasks can be cast as mappings from abstract to
+Put this way, this goal seems absurd.  However, this approach has
-more concrete representation, and even cleanly partitioned into more
+been extremely successful.   In fact, much of the
-general sets of mappings.  In fact, Genesis,~\cite{genesis} an early
+database literature is devoted to this task and has
-database toolkit was built in terms of interchangable primitives that
+certainly improved the state of computer science.  Furthermore, it is the basis for
-implemented interfaces that correspond to these interafaces.
+the highly successful database industry.  
-Similarly, the physical database model partitions storage into simple
+However, from a practical perspective, current database 
 implementations are already among the most complex
 software systems ever created, and are difficult to understand or 
 reason about.  They still encode only a small percentage of
 the computational and storage primitives in the database 
 literature, which in turn only represents a portion of 
 the computer science literature.
 %\begin{itemize}
 %\item indexing
 %\item encoding (compression, encryption, etc)
 %\item transposition
 %\item segmentation (along field boundaries)
 %\item fragmentation (without regard to field boundaries)
 %\item pointers with support for $n:m$ relationships
 %\item horizonatal partitioning
 %\end{itemize}
 \subsubsection{Physical data models}
 Just as it was initially tempting to say that \yad was a database toolkit,
 it may now be tempting to claim that \yad implements a physical
 database model.  In this section, we compare \yad to the physical
 database model of existing toolkits, and show that it supports a wider
 range of storage technologies than physical database models.  In fact,
 it has no concept of a physical database model, and intentionally
 allows applications to avoid such concepts as well.
 Genesis~\cite{genesis}, an early database toolkit, was built in terms
 of interchangeable primitives that implemented the interfaces of an
 early database implementation model.  It built upon the idea of
 conceptual mappings described above, and the physical database model
 described here.
 The physical database model partitions storage into simple
 files, which provide operations associated with key based storage, and
 linksets, which make use of various pointer storage schemes to provide
 mappings between records in simple files.