Eric Brewer 2006-08-12 21:30:54 +00:00
parent b41f3cce18
commit f706cb6d22


@@ -122,12 +122,12 @@ onto SQL or the monolithic approach of current databases.
Simply providing
access to a database system's internal storage module is an improvement.
However, many of these applications require special transactional properties
that general purpose transactional storage systems do not provide. In
that general-purpose transactional storage systems do not provide. In
fact, DBMSs are often not used for these systems, which instead
implement custom, ad-hoc data management tools on top of file
systems.
A typical example of this mismatch is in the support for
An example of this mismatch is in the support for
persistent objects.
% in Java, called {\em Enterprise Java Beans}
%(EJB).
@@ -136,9 +136,9 @@ mapping each object to a row in a table (or sometimes multiple
tables)~\cite{hibernate} and then issuing queries to keep the objects and
rows consistent. An update must confirm it has the current
version, modify the object, write out a serialized version using the
SQL update command and commit. Also, for efficiency, most systems must
SQL update command, and commit. Also, for efficiency, most systems must
buffer two copies of the application's working set in memory.
This is an awkward and slow mechanism.
This is an awkward and inefficient mechanism, and hence we claim that DBMSs do not support this task well.
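To make that round trip concrete, here is a minimal sketch of the update path such a mapping layer performs (purely illustrative, and not drawn from Hibernate, MySQL, or \yad; SQLite is used only for brevity, and the table, column, and function names are hypothetical):
\begin{verbatim}
/* Hypothetical object-update round trip performed by an
 * object-relational mapping layer: serialize the object, then
 * write it back under an optimistic version check. */
#include <sqlite3.h>

int update_object(sqlite3 *db, long oid, long expected_version,
                  const void *blob, int blob_len)
{
    sqlite3_stmt *stmt;
    sqlite3_exec(db, "BEGIN", 0, 0, 0);
    sqlite3_prepare_v2(db,
        "UPDATE objects SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?", -1, &stmt, 0);
    sqlite3_bind_blob (stmt, 1, blob, blob_len, SQLITE_TRANSIENT);
    sqlite3_bind_int64(stmt, 2, oid);
    sqlite3_bind_int64(stmt, 3, expected_version);
    int rc = sqlite3_step(stmt);        /* runs the UPDATE          */
    sqlite3_finalize(stmt);
    if (rc != SQLITE_DONE || sqlite3_changes(db) != 1) {
        /* Stale version or failed write: abort, and let the caller
         * re-read the object and retry. */
        sqlite3_exec(db, "ROLLBACK", 0, 0, 0);
        return -1;
    }
    return sqlite3_exec(db, "COMMIT", 0, 0, 0);
}
\end{verbatim}
The serialized image passed in as \texttt{blob} coexists with the live object, which is the second in-memory copy of the working set mentioned above.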
Bioinformatics systems perform complex scientific
computations over large, semi-structured databases with rapidly evolving schemas. Versioning and
@@ -154,7 +154,7 @@ photo and video repositories, bioinformatics, version control systems,
work-flow applications, CAD/VLSI applications and directory services.
In short, we believe that a fundamental architectural shift in
transactional storage is necessary before general purpose storage
transactional storage is necessary before general-purpose storage
systems are of practical use to modern applications.
Until this change occurs, databases' imposition of unwanted
abstraction upon their users will restrict system designs and
@@ -166,13 +166,13 @@ storage at a level of abstraction as close to the hardware as
possible. The library can support special purpose, transactional
storage interfaces in addition to ACID database-style interfaces to
abstract data models. \yad incorporates techniques from databases
(e.g. write-ahead-logging) and operating systems (e.g. zero-copy techniques).
(e.g. write-ahead logging) and operating systems (e.g. zero-copy techniques).
Our goal is to combine the flexibility and layering of low-level
abstractions typical for systems work with the complete semantics
that exemplify the database field.
By {\em flexible} we mean that \yad{} can implement a wide
range of transactional data structures, that it can support a variety
By {\em flexible} we mean that \yad{} can support a wide
range of transactional data structures {\em efficiently}, and that it can support a variety
of policies for locking, commit, clusters and buffer management.
Also, it is extensible for new core operations
and new data structures. It is this flexibility that allows the
@@ -190,16 +190,16 @@ delivers these properties as reusable building blocks for systems
that implement complete transactions.
Through examples and their good performance, we show how \yad{}
supports a wide range of uses that fall in the gap between
efficiently supports a wide range of uses that fall in the gap between
database and filesystem technologies, including
persistent objects, graph or XML based applications, and recoverable
persistent objects, graph- or XML-based applications, and recoverable
virtual memory~\cite{lrvm}.
For example, on an object serialization workload, we provide up to
a 4x speedup over an in-process MySQL implementation and a 3x speedup over Berkeley DB, while
cutting memory usage in half (Section~\ref{sec:oasys}).
We implemented this extension in 150 lines of C, including comments and boilerplate. We did not have this type of optimization
in mind when we wrote \yad, and in fact the idea came from a potential
in mind when we wrote \yad, and in fact the idea came from a
user unfamiliar with \yad.
%\eab{others? CVS, windows registry, berk DB, Grid FS?}
@@ -207,14 +207,14 @@ user unfamiliar with \yad.
This paper begins by contrasting \yads approach with that of
conventional database and transactional storage systems. It proceeds
to discuss write-ahead-logging, and describe ways in which \yad can be
customized to implement many existing (and some new) write-ahead-logging variants. Implementations of some of these variants are
presented, and benchmarked against popular real-world systems. We
conclude with a survey of the technologies the \yad implementation is
based upon.
to discuss write-ahead logging, and describe ways in which \yad can be
customized to implement many existing (and some new) write-ahead
logging variants. We present implementations of some of these variants and
benchmark them against popular real-world systems. We
conclude with a survey of the technologies upon which \yad is based.
An (early) open-source implementation of
the ideas presented here is available.
the ideas presented here is available at \eab{where?}.
\section{\yad is not a Database}
\label{sec:notDB}
@@ -261,6 +261,7 @@ be more appropriate~\cite{molap}. While both OLTP and OLAP databases are based
upon the relational model, they make use of different physical models
in order to serve different classes of applications.}
\eab{need to expand the following and add evidence.}
A key observation of this paper is that no known physical data model
can efficiently support more than a small percentage of today's applications.
@@ -279,8 +280,8 @@ similar to ours. Although these projects were successful in many
respects, they fundamentally aimed to implement an extensible abstract
data model, rather than take a bottom-up approach and allow
applications to customize the physical model in order to support new
high level abstractions. In each case, this limits these systems to
applications their physical models support well.
high-level abstractions. In each case, this limits these systems to
applications their physical models support well.\eab{expand this claim}
\subsubsection{Extensible databases}
@@ -343,7 +344,7 @@ of the object to write to. If a subaction or transaction abort their
local copy is simply discarded. At commit, the local copy replaces
the global copy.}
\rcs{Still need to mention CORBA / EJB + ORDBMS here. Also, missing a high level point: Most research systems were backed with
\rcs{Still need to mention CORBA / EJB + ORDBMS here. Also, missing a high-level point: Most research systems were backed with
non-concurrent transactional storage; current commercial systems (eg:
EJB) tend to make use of object relational mappings. Bill's stuff would be a good fit for that section, along with work describing how to let multiple threads / machines handle locking in an easy to reason about fashion.}
@@ -414,7 +415,7 @@ applications presented in Section~\ref{sec:extensions} are efficiently
supported by Berkeley DB. This is a result of Berkeley DB's
assumptions regarding workloads and decisions regarding low level data
representation. Thus, although Berkeley DB could be built on top of \yad,
Berkeley DB's data model and write-ahead-logging system are too specialized to support \yad.
Berkeley DB's data model and write-ahead logging system are too specialized to support \yad.
%cover P2 (the old one, not Pier 2 if there is time...
@@ -456,9 +457,7 @@ We agree with the motivations behind RISC databases and the goal
of highly modular database implementations. In fact, we hope
our system will mature to the point where it can support
a competitive relational database. However, this is
not our primary goal, as we seek instead to enable a wider range of data management options.
\eab{discuss "wider range"}
not our primary goal, as we seek instead to enable a wider range of data management options.\eab{expand on ``wider''}
%For example, large scale application such as web search, map services,
%e-mail use databases to store unstructured binary data, if at all.
@@ -513,7 +512,7 @@ locks and discusses the alternatives \yad provides to application developers.
Transactional storage algorithms work because they are able to
atomically update portions of durable storage. These small atomic
updates are used to bootstrap transactions that are too large to be
applied atomically. In particular, write ahead logging (and therefore
applied atomically. In particular, write-ahead logging (and therefore
\yad) relies on the ability to atomically write entries to the log
file.
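As a rough illustration of that requirement, a write-ahead log can be built on an append-only file whose entries are forced to disk before the pages they describe are written back. The sketch below shows only this append-and-force discipline; the entry layout and function names are hypothetical rather than taken from \yads log manager.
\begin{verbatim}
/* Hypothetical log append: the entry must be durable before the
 * dirty page it describes may be written back to disk. */
#include <stdint.h>
#include <unistd.h>

struct log_entry {
    uint64_t lsn;          /* log sequence number of this entry        */
    uint32_t page;         /* page the redo/undo information applies to */
    uint32_t len;          /* bytes of redo/undo payload that follow    */
    char     payload[64];  /* serialized redo/undo information          */
};

/* Returns 0 once the entry is on disk; only then may the buffer
 * manager write the page named by e->page. */
int log_force(int log_fd, const struct log_entry *e)
{
    if (write(log_fd, e, sizeof *e) != (ssize_t) sizeof *e)
        return -1;          /* log_fd is assumed open with O_APPEND    */
    return fsync(log_fd);   /* force the entry before the page write   */
}
\end{verbatim}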
@@ -761,10 +760,10 @@ of data, rather than atomic in-memory updates, as the term is normally
used in systems work; %~\cite{GR97};
the latter is covered by ``C'' and
``I''.} ``Isolation'' is
typically provided by locking, which is a higher-level (but
comaptible) layer. ``Consistency'' is less well defined but comes in
typically provided by locking, which is a higher-level but
compatible layer. ``Consistency'' is less well defined but comes in
part from low-level mutexes that avoid races, and partially from
higher level constructs such as unique key requirements. \yad
higher-level constructs such as unique key requirements. \yad
supports this by distinguishing between {\em latches} and {\em locks}.
Latches are provided using operating system mutexes, and are held for
short periods of time. \yads default data structures use latches in a
@@ -777,7 +776,7 @@ use of a lock manager. Alternatively, applications may follow
the example of \yads default data structures, and implement
deadlock avoidance, or other custom lock management schemes.\rcs{Citations here?}
This allows higher level code to treat \yad as a conventional
This allows higher-level code to treat \yad as a conventional
reentrant data structure library. It is the application's
responsibility to provide locking, whether it be via a database-style
lock manager, or an application-specific locking protocol. Note that
@@ -803,14 +802,13 @@ Hoard, a malloc implementation for SMP machines~\cite{hoard}.
Note that both lock managers have implementations that are tied to the
code they service, both implement deadlock avoidance, and both are
transparent to higher layers. General purpose database lock managers
transparent to higher layers. General-purpose database lock managers
provide none of these features, supporting the idea that special
purpose lock managers are a useful abstraction.\rcs{This would be a
good place to cite Bill and others on higher level locking protocols}
good place to cite Bill and others on higher-level locking protocols}
Locking is largely orthogonal to the concepts described in this paper.
We make no assumptions regarding lock managers being used by higher
level code in the remainder of this discussion.
We make no assumptions regarding lock managers being used by higher-level code in the remainder of this discussion.
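As a simplified illustration of the latch/lock split described above, a page latch is just an operating-system mutex held across one physical update, while any transaction-duration locks are acquired by higher-level code before it calls in. The names below are hypothetical, not \yads actual API.
\begin{verbatim}
/* Hypothetical page latch: an OS mutex held only while the bytes of
 * the in-memory page image are changed.  Transaction-duration
 * isolation (locks) is left entirely to higher-level code. */
#include <pthread.h>
#include <string.h>

#define PAGE_SIZE 4096

struct page {
    pthread_mutex_t latch;      /* init with pthread_mutex_init()     */
    char            data[PAGE_SIZE];
};

void page_write(struct page *p, int off, const void *buf, int len)
{
    pthread_mutex_lock(&p->latch);    /* held for a short period      */
    memcpy(p->data + off, buf, len);
    pthread_mutex_unlock(&p->latch);  /* released before returning    */
}
\end{verbatim}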
\section{LSN-free pages.}
\label{sec:lsn-free}
@@ -1017,7 +1015,7 @@ played back in order, each sector would contain the most up to date
version after redo.
Of course, we do not want to constrain log entries to update entire
sectors at once. In order to support finer grained logging, we simply
sectors at once. In order to support finer-grained logging, we simply
repeat the above argument on the byte or bit level. Each bit is
either overwritten by redo, or has a known, correct, value before
redo. Since all operations performed by redo are blind writes, they
@@ -1327,7 +1325,7 @@ disk activity.
Furthermore, objects may be written to disk in an
order that differs from the order in which they were updated,
violating one of the write-ahead-logging invariants. One way to
violating one of the write-ahead logging invariants. One way to
deal with this is to maintain multiple LSNs per page. This means we would need to register a
callback with the recovery routine to process the LSNs (a similar
callback will be needed in Section~\ref{sec:zeroCopy}), and
@@ -1609,7 +1607,7 @@ is a common pattern in system software design, and manages
dependencies and ordering constraints between sets of components.
Over time, we hope to shrink \yads core to the point where it is
simply a resource manager and a set of implementations of a few unavoidable
algorithms related to write-ahead-logging. For instance,
algorithms related to write-ahead logging. For instance,
we suspect that support for appropriate callbacks will
allow us to hard-code a generic recovery algorithm into the
system. Similarly, any code that manages book-keeping information, such as