sec1-2

2006-04-24 01:00:50 +00:00 · 2006-04-24 01:00:50 +00:00 · f7122c9f62
commit f7122c9f62
parent ca229e9d83
1 changed files with 71 additions and 72 deletions
--- a/doc/paper3/LLADD.tex
+++ b/doc/paper3/LLADD.tex
@@ -25,7 +25,7 @@
 %  TARDIS: Atomic, Recoverable, Datamodel Independent Storage
 % EAB: flex, basis, stable, dura

-\newcommand{\yad}{Lemon\xspace}
+\newcommand{\yad}{Stasys\xspace}
 \newcommand{\oasys}{Oasys\xspace}

 \newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
@@ -59,9 +59,9 @@ UC Berkeley
 %\thispagestyle{empty}


-\subsection*{Abstract}
+%\subsection*{Abstract}

-The is an increasing need to manage data well in a wide variety of
+{\em There is an increasing need to manage data well in a wide variety of
 systems, including robust support for atomic durable concurrent
 transactions.  Databases provide the default solution, but force
 applications to interact via SQL and to forfeit control over data
@@ -69,7 +69,7 @@ layout and access mechanisms.  We argue there is a gap between DBMSs and file sy

 \yad is a storage framework that incorporates ideas from traditional
 write-ahead-logging storage algorithms and file systems,
-while providing applications with flexible control over data structure, layout and performance vs. robustness tradeoffs.
+while providing applications with flexible control over data structures, layout, and performance vs. robustness tradeoffs.
 % increased control over their
 %underlying modules.  Generic transactional storage systems such as SQL
 %and BerkeleyDB serve many applications well, but impose constraints
@@ -90,9 +90,13 @@ improved performance to applications.

 We present examples that make use of custom access methods,
 modified buffer manager semantics, direct log file manipulation, and
-LSN-free pages that facilitate zero-copy optimizations, and discusses
+LSN-free pages that facilitate zero-copy optimizations, and discuss
 the composability of these extensions.

+\eab{performance}
+
+}
+
 %We argue that our ability to support such a diverse range of
 %transactional systems stems directly from our rejection of
 %assumptions made by early database designers.  These assumptions
@@ -113,13 +117,14 @@ the composability of these extensions.
 %existing systems.


+
 \section{Introduction}

 As our reliance on computing infrastructure has increased, the need
 for robust data management has increased greatly, as has the range of
 applications and systems that need it.  Traditionally, data management
-has been the province of database management systems, which although
-well-suited to enterprise applications, leads to poor support for a
+has been the province of database management systems (DBMSs), which although
+well-suited to enterprise applications, lead to poor support for a
 wide range of systems including grid and scientific computing,
 bioinformatics, search engines, version control, and workflow
 applications.  These applications need transactions but don't fit well
@@ -132,13 +137,15 @@ A typical example of this mismatch is in the support for
 persistent objects in Java, called {\em Enterprise Java Beans}
 (EJB). In a typical usage, an array of objects is made persistent by
 mapping each object to a row in a table (or sometimes multiple
-tables~\cite[xxx]) and then issuing queries to keep the objects and
+tables~\cite{xxx}) and then issuing queries to keep the objects and
 rows consistent. A typical update must confirm it has the current
 version, modify the object, write out a serialized version using the
 SQL update command and commit. This is an awkward and slow mechanism;
 we show up to a 5x speedup over a MySQL implementation that is
 optimized for single-threaded, local access (Section XXX).

+Add bioinformatics = Perl + files example?
+
 \eat{
 Examples of real world systems that currently fall into this category
 are web search engines, document repositories, large-scale web-email
@@ -146,7 +153,6 @@ services, map and trip planning services, ticket reservation systems,
 photo and video repositories, bioinformatics, version control systems,
 workflow applications, CAD/VLSI applications and directory services.

-
 In short, we believe that a fundamental architectural shift in
 transactional storage is necessary before general purpose storage
 systems are of practical use to modern applications.
@@ -178,15 +184,11 @@ This paper presents \yad, a library that provides transactional
 storage at a level of abstraction as close to the hardware as
 possible.  The library can support special purpose, transactional
 storage interfaces as well as ACID database-style interfaces to
-abstract data models.  
-
-Notably, \yad incorporates many existing technologies from the storage
-communities, and allows applications to incorporate appropriate
-subsystems as necessary.  A partial open-source implementation of the
-ideas presented below is available; performance numbers are provided
-when possible.
-
-Taken from sosp:
+abstract data models.  \yad incorporates techniques from databases
+(e.g., write-ahead logging) and systems (e.g., zero-copy techniques).
+Our goal is to combine the flexibility and layering of low-level
+abstractions typical for systems work, with the complete semantics
+that exemplify the database field.

 By {\em flexible} we mean that \yad{}  can implement a wide
 range of transactional data structures, that it can support a variety
@@ -206,18 +208,17 @@ to meet and form the {\em raison d'\^etre} for \yad{}: the framework
 delivers these properties as reusable building blocks for systems
 to implement complete transactions.

---
+Through examples, and their good performance, we show how \yad{}
+supports a wide range of uses that fall in the database gap, including
+persistent objects (roadmap?), graph or XML apps, and recoverable
+virtual memory~\cite{lrvm}.  An (early) open-source implementation of
+the ideas presented below is available.

-\eab{need to talk about positive examples: LRVM, Berk DB, windows registry? Grid FS from Wisconsin}
+\eab{others?  CVS, windows registry, berk DB, Grid FS?}
+
+roadmap?


-Applications that have only recently begun to make use of high-level
-database features include XML based systems, object persistance
-mechanisms, and enterprise management systems (notably, SAP R/3).
-
-
-**We've explained why the sky is falling.  Now, explain why \yad is
-so good.  (Take ideas from old paper.)**

 \section{\yad is not a Database}

@@ -229,8 +230,8 @@ database systems and research projects for at least 25 years.

 The section concludes with a discussion of database systems that
 attempt to address these problems.  Although these systems were
-successful in many respects, they failed to address the broad class of
-software we are interested in.
+successful in many respects, they fundamentally aim to implement a
+data model, rather than build transactions from the bottom up. \eab{move this?}


 \subsection{The database abstraction}
@@ -240,42 +241,40 @@ abstractions they present.  For instance, relational database systems
 implement the relational model~\cite{cobb}, object oriented
 databases implement object abstractions, XML databases implement
 hierarchical datasets, and so on.  Before the relational model,
-navigational databases implemented pointer and record
-based data models.
+navigational databases implemented pointer- and record-based data models.

 An early survey of database implementations sought to enumerate the
 fundamental components used by database system implementors.  This
 survey was performed due to difficulties in extending database systems
 into new application domains.  The survey divided internal database
-routines into two broad modules: conceptual
-mappings~\cite{batoryConceptual} and the physical
-database~\cite{batoryPhysical} model.
+routines into two broad modules: {\em conceptual
+mappings}~\cite{batoryConceptual} and the {\em physical
+database}~\cite{batoryPhysical} model.

 A conceptual mapping might translate a relation into a set of keyed
-tuples.  A physical model could then translate a set of tuples into an
+tuples.  A physical model would then translate a set of tuples into an
 on-disk B-Tree, and provide support for iterators and range-based query
 operations.

 It is the responsibility of a database implementor to choose a set of
-conceptual mappings that implement the desired higher level
+conceptual mappings that implement the desired higher-level
 abstraction (such as the relational model).  The physical data model
 is chosen to efficiently support the set of mappings that are built on
 top of it.

-{\em The key observation of this paper is that no known physical data model
+{\em A key observation of this paper is that no known physical data model
 can support more than a small percentage of today's applications.}

 Instead of attempting to create such a model after decades of database
 research has failed to produce one, we opt to provide a transactional
 storage model that mimics the primitives provided by modern hardware.
 This makes it easy for system designers to implement most of the data
-models that the underlying hardware is capable of supporting, or to
-abandon the database approach entirely, and forgo the use of a
+models that the underlying hardware can support, or to
+abandon the data model approach entirely, and forgo the use of a
 structured physical model or conceptual mappings.

 \subsection{Extensible databases}

-
 Genesis~\cite{genesis}, an early database toolkit, was built in terms
 of a physical data model, and the conceptual mappings described above.
 It was designed to allow database implementors to easily swap out
@@ -284,11 +283,13 @@ Like subsequent systems (including \yad), it allowed it users to
 implement custom operations.

 Subsequent extensible database work builds upon these foundations.
-The Exodus~\cite{exodus} database toolkit was the successor to
+For example, the Exodus~\cite{exodus} database toolkit was the successor to
 Genesis. It supported the automatic generation of query optimizers and
 execution engines based upon abstract data type definitions, access
 methods and cost models provided by its users.

+\eab{move this next paragraph to RW?}
+
 Starburst's~\cite{starburst} physical data model consisted of {\em
  storage methods}.  Storage methods supported {\em attachment types}
 that allowed triggers and active databases to be implemented.  An
@@ -304,7 +305,7 @@ object-oriented database systems, and relational databases with
 support for user-definable abstract data types (such as in
 Postgres~\cite{postgres}) were the primary competitors to extensible
 database toolkits.  Ideas from all of these systems have been
-incorporated into the mechanisms that support user definable types in
+incorporated into the mechanisms that support user-definable types in
 current database systems.

 One can characterise the difference between database toolkits and
@@ -312,16 +313,12 @@ extensible database servers in terms of early and late binding.  With
 a database toolkit, new types are defined when the database server is
 compiled.  In today's object-relational database systems, new types
 are defined at runtime.  Each approach has its advantages.  However,
-both types of systems attempted to provide similar levels of
-abstraction and flexibility to their end users.
-
-Therefore, the database toolkit approach is inappropriate for
-applications not well serviced by modern database systems.
+both types of systems aim to extend a high-level data model with new abstract data types, and thus are quite limited in the range of new applications they support.  Not surprisingly, this kind of extensibility has had little impact on the range of applications we listed above.

 \subsection{Berkeley DB}

 System R was the first relational database implementation, and was
-based upon a clean separation between it's storage system and its
+based upon a clean separation between its storage system and its
 query processing engine.  In fact, it supported a simple navigational
 interface to the storage subsystem.  To this day, database systems are
 built using this sort of architecture.  
@@ -342,48 +339,36 @@ primitives.
 We have already discussed the limitations of this approach.  With the
 exception of the direct comparison of the two systems, none of the \yad 
 applications presented in Section~\ref{extensions} are efficiently
-supported by Berkeley DB.   This is a result of Berkeley DB's,  
+supported by Berkeley DB.   This is a result of Berkeley DB's  
 assumptions regarding workloads and decisions regarding low-level data
-representation.  While Berkeley DB could be built on top of \yad,
+representation.  Thus, although Berkeley DB could be built on top of \yad,
 Berkeley DB is too specialized to support \yad.

-\subsection{Boxwood}
+\eab{for BDB, should we say that it still has a data model?}

-The Boxwood system provides a networked, fault-tolerant transactional
-B-Tree and ``Chunk Manager.''  We believe that \yad is an interesting
-complement to such a system, especially given \yad's focus on
-intelligence and optimizations within a single node, and Boxwoods
-focus on multiple node systems.  In particular, when implementing
-applications with predictable locality properties, it would be
-interesting to explore extensions to the Boxwood approach that make
-use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
-mechanism. (Section~\ref{logging})


 %cover P2 (the old one, not "Pier 2" if there is time...

 \subsection{Better databases}

+The database community is also aware of this gap. 
 A recent survey~\cite{riscDB} enumerates problems that plague users of
-state-of-the-art database systems.  
-
-The survey finds that database implementations fail to support the
+state-of-the-art database systems, and finds that database implementations fail to support the
 needs of modern systems.  In large systems, this manifests itself as
 manageability and tuning issues that prevent databases from predictably
 servicing diverse, large-scale, declarative workloads.  
-
 On small devices, footprint, predictable performance, and power consumption are
 primary concerns that database systems do not address.

-Midsize deployments, such as desktop installations, must run without
-user intervention, but self-tuning, self-administering database
-servers are still an area of active research.
+%Midsize deployments, such as desktop installations, must run without
+%user intervention, but self-tuning, self-administering database
+%servers are still an area of active research.

 The survey argues that these problems cannot be adequately addressed without a fundamental shift in the architectures that underlie database systems.  Complete, modern database
 implementations are generally incomprehensible and
 irreproducible, hindering further research.  The study concludes 
-by suggesting the adoption of ``RISC''
-style database architectures, both as a research and as an
+by suggesting the adoption of ``RISC''-style database architectures, both as a research and an
 implementation tool~\cite{riscDB}.  

 RISC databases have many elements in common with
@@ -398,13 +383,12 @@ effort required to implement a new database system~\cite{riscDB}.
 We agree with the motivations behind RISC databases, and that a need
 for improvement in database technology exists.  In fact, it is our hope
 that our system will mature to the point where it can support
-competitive relational database storage subsystems.  However this is
+a competitive relational database.  However, this is
 not our primary goal.  
-
 Instead, we are interested in supporting applications that derive
 little benefit from database abstractions, but that need reliable
 storage.  Therefore, instead of building a modular database, we seek
-to build a system that allows programmers to avoid databases.
+to build a system that enables a wider range of data management options.

 %For example, large scale application such as web search, map services,
 %e-mail use databases to store unstructured binary data, if at all.
@@ -983,10 +967,25 @@ concurrent, durable data structure using RVM.  We plan to add RVM
 style transactional memory to \yad in a way that is compatible with
 fully concurrent collections such as hash tables and tree structures.

+
+\section{Related Work?}
+
+The Boxwood system provides a networked, fault-tolerant transactional
+B-Tree and ``Chunk Manager.''  We believe that \yad is an interesting
+complement to such a system, especially given \yad's focus on
+intelligence and optimizations within a single node, and Boxwood's
+focus on multiple-node systems.  In particular, when implementing
+applications with predictable locality properties, it would be
+interesting to explore extensions to the Boxwood approach that make
+use of \yad's customizable semantics (Section~\ref{wal}), and fully logical logging
+mechanism (Section~\ref{logging}).
+
 \section{Conclusion}

 \section{Acknowledgements}

+mike demmer, others?
+
 \section{Availability}

 Additional information and \yad's source code are available at: