cleanup,newfigs

parent d552543eae
commit 967caf1ee7
6 changed files with 86 additions and 62 deletions
@@ -345,6 +345,19 @@
   OPTannote = {}
 }
 
+@Article{stonebraker81,
+  author =   {M. Stonebraker},
+  title =    {Operating System Support for Database Management},
+  journal =  {Communications of the ACM},
+  year =     {1981},
+  OPTkey =   {},
+  volume =   {24},
+  number =   {7},
+  pages =    {412--418},
+  month =    {July},
+}
+
+
 @Article{postgres,
   author =   {M. Stonebraker and Greg Kemnitz},
   title =    {The {POSTGRES} Next-Generation Database Management System},
@@ -397,6 +410,14 @@
 }
 
 
+@Book{GR97,
+  author =    {Jim Gray and Andreas Reuter},
+  title =     {Transaction Processing: Concepts and Techniques},
+  publisher = {Morgan Kaufmann},
+  year =      {1993},
+  isbn =      {1-55860-190-2},
+  bibsource = {DBLP, http://dblp.uni-trier.de}
+}
 
 @InProceedings{libtp,
   author =    {Margo Seltzer and M. Olson},
@@ -212,7 +212,7 @@ the ideas presented here is available (see Section~\ref{sec:avail}).
 \label{sec:notDB}
 
 Database research has a long history, including the development of
-many technologies that our system builds upon. This section explains
+many of the technologies we exploit. This section explains
 why databases are fundamentally inappropriate tools for system
 developers, and covers some of the previous responses of the systems
 community. These problems have been the focus of
@@ -221,10 +221,10 @@ database and systems researchers for at least 25 years.
 \subsection{The Database View}
 
 The database community approaches the limited range of DBMSs by either
-creating new top-down models, such as XML databases,
+creating new top-down models, such as XML databases~\cite{XMLdb},
 or by extending the relational model~\cite{codd} along some axis, such
-as new data types. (We cover these attempts in more detail in
-Section~\ref{sec:related-work}.) \eab{add cites}
+as new data types. We cover these attempts in more detail in
+Section~\ref{sec:related-work}.
 
 %Database systems are often thought of in terms of the high-level
 %abstractions they present. For instance, relational database systems
@@ -290,7 +290,7 @@ these in more detail in Section~\ref{sec:related-work}.
 In some sense, our hypothesis is trivially true in that there exists a
 bottom-up framework called the ``operating system'' that can implement
 all of the models. A famous database paper argues that it does so
-poorly (Stonebraker 1980~\cite{Stonebraker80}). Our task is really to
+poorly (Stonebraker 1981~\cite{stonebraker81}). Our task is really to
 simplify the implementation of transactional systems through more
 powerful primitives that enable concurrent transactions with a variety
 of performance/robustness tradeoffs.
@@ -309,9 +309,9 @@ hash tables, and other access methods. It provides flags that
 let its users tweak aspects of the performance of these
 primitives, and selectively disable the features it provides.
 
-With the exception of the benchmark designed to fairly compare the two
+With the exception of the benchmark designed to compare the two
 systems, none of the \yad applications presented in
-Section~\ref{sec:extensions} are efficiently supported by Berkeley DB.
+Section~\ref{experiments} are efficiently supported by Berkeley DB.
 This is a result of Berkeley DB's assumptions regarding workloads and
 decisions regarding low-level data representation. Thus, although
 Berkeley DB could be built on top of \yad, Berkeley DB's data model
@@ -404,7 +404,7 @@ performance, since the synchronous writes to the log are sequential.
 Later, the pages are written out asynchronously, often
 as part of a larger sequential write.
 
-After a crash, we have to apply the REDO entries to those pages that
+After a crash, we have to apply the redo entries to those pages that
 were not updated on disk. To decide which updates to reapply, we use
 a per-page version number called the {\em log-sequence number} or
 {\em LSN}. Each update to a page increments the LSN, writes it on the
@@ -427,7 +427,7 @@ active transaction in progress all the time. Systems that support
 {\em steal} avoid these problems by allowing pages to be written back
 early. This implies we may need to undo updates on the page if the
 transaction aborts, and thus before we can write out the page we must
-write the UNDO information to the log.
+write the undo information to the log.
 
 On recovery, the redo phase applies all updates (even those from
 aborted transactions). Then, an undo phase corrects stolen pages for
@@ -451,7 +451,7 @@ argument. The undo entry is analogous.\endnote{For efficiency, undo
 and redo operations are packed into a single log entry. Both must take
 the same parameters.} \yad ensures the correct ordering and timing
 of all log entries and page writes. We describe operations in more
-detail in Section~\ref{operations}
+detail in Section~\ref{sec:operations}.
 
 %\subsection{Multi-page Transactions}
 
@@ -485,7 +485,7 @@ To understand the problems that arise with concurrent transactions,
 consider what would happen if one transaction, A, rearranges the
 layout of a data structure. Next, a second transaction, B,
 modifies that structure and then A aborts. When A rolls back, its
-UNDO entries will undo the rearrangement that it made to the data
+undo entries will undo the rearrangement that it made to the data
 structure, without regard to B's modifications. This is likely to
 cause corruption.
 
@@ -515,7 +515,7 @@ splitting tree nodes.
 The internal operations do not need to be undone if the
 containing transaction aborts; instead of removing the data item from
 the page, and merging any nodes that the insertion split, we simply
-remove the item from the set as application code would; we call the
+remove the item from the set as application code would --- we call the
 data structure's {\em remove} method. That way, we can undo the
 insertion even if the nodes that were split no longer exist, or if the
 data item has been relocated to a different page. This
@@ -523,12 +523,11 @@ lets other transactions manipulate the data structure before the first
 transaction commits.
 
 In \yad, each nested top action performs a single logical operation by applying
-a number of physical operations to the page file. Physical \rcs{get rid of ALL CAPS...} REDO and
-UNDO log entries are stored in the log so that recovery can repair any
+a number of physical operations to the page file. Physical redo and undo log entries are stored in the log so that recovery can repair any
 temporary inconsistency that the nested top action introduces. Once
-the nested top action has completed, a logical UNDO entry is recorded,
+the nested top action has completed, a logical undo entry is recorded,
 and a CLR is used to tell recovery and abort to skip the physical
-UNDO entries.
+undo entries.
 
 This leads to a mechanical approach for creating reentrant, concurrent
 operations:
@@ -536,9 +535,9 @@ operations:
 \begin{enumerate}
 \item Wrap a mutex around each operation. With care, it is possible
 to use finer-grained latches in a \yad operation, but it is rarely necessary.
-\item Define a {\em logical} UNDO for each operation (rather than just
-using a set of page-level UNDOs). For example, this is easy for a
-hash table: the UNDO for {\em insert} is {\em remove}. This logical
+\item Define a {\em logical} undo for each operation (rather than just
+using a set of page-level undos). For example, this is easy for a
+hash table: the undo for {\em insert} is {\em remove}. This logical
 undo function should arrange to acquire the mutex when invoked by
 abort or recovery.
 \item Add a ``begin nested top action'' right after the mutex
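The first two steps of this recipe can be sketched in a few lines. The class below is a hypothetical Python stand-in (the paper's system is a C library): each operation is wrapped in a mutex, and the logical undo of insert is the structure's own remove method, which reacquires the mutex when abort replays it.

```python
import threading

class LogicalHash:
    """Hash table with a per-operation mutex and logical undo (sketch)."""
    def __init__(self):
        self.mutex = threading.Lock()        # step 1: mutex around each operation
        self.table = {}

    def insert(self, key, value, undo_log):
        with self.mutex:
            self.table[key] = value
            undo_log.append(("remove", key))          # step 2: logical inverse

    def remove(self, key, undo_log=None):
        with self.mutex:                              # undo reacquires the mutex
            old = self.table.pop(key)
            if undo_log is not None:
                undo_log.append(("insert", key, old))

    def abort(self, undo_log):
        # Roll back by replaying logical undos in reverse order; this works
        # even if the pages the inserts touched have since been reorganized.
        for entry in reversed(undo_log):
            if entry[0] == "remove":
                self.remove(entry[1])
            else:
                self.insert(entry[1], entry[2], [])
```

Because the undo is expressed against the abstract data type rather than page images, other transactions may reorganize the underlying pages before the first transaction commits.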
@@ -567,6 +566,7 @@ with the variable-sized atomic updates covered in Section~\ref{sec:lsn-free}.
 
 
 \subsection{User-Defined Operations}
+\label{sec:operations}
 
 The first kind of extensibility enabled by \yad is user-defined operations.
 Figure~\ref{fig:structure} shows how operations interact with \yad. A
@@ -589,10 +589,10 @@ write-ahead logging rules required for steal/no-force transactions by
 controlling the timing and ordering of log and page writes. Each
 operation should be deterministic, provide an inverse, and acquire all
 of its arguments from a struct that is passed via {\tt Tupdate()}, from
-the page it updates, or typically both. The callbacks used
+the page it updates, or both. The callbacks used
 during forward operation are also used during recovery. Therefore
 operations provide a single redo function and a single undo function.
-(There is no ``do'' function.) This reduces the amount of
+There is no ``do'' function, which reduces the amount of
 recovery-specific code in the system.
 
 %{\tt Tupdate()} writes the struct
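A minimal sketch of this callback structure (names such as `MiniLog` and `t_update` are invented for illustration, not \yad's actual C interface): the operation supplies only a redo and an undo callback, the argument struct is logged before the page is touched, and forward execution runs through the same redo callback that recovery later uses.

```python
class MiniLog:
    """Toy write-ahead log: append the log entry, then apply via redo."""
    def __init__(self):
        self.entries = []
        self.next_lsn = 1

    def t_update(self, page, op, args):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.entries.append((lsn, op, args))   # write-ahead: log entry first
        op["redo"](page, args)                 # forward execution IS the redo
        page["lsn"] = lsn                      # stamp the page's LSN
        return lsn

# A "set field" operation: deterministic, carries its inverse, and takes
# all of its arguments from the logged struct. There is no "do" function.
set_op = {
    "redo": lambda page, a: page.update({a["key"]: a["new"]}),
    "undo": lambda page, a: page.update({a["key"]: a["old"]}),
}
```

Abort and recovery can then invoke the stored undo callback on the logged arguments, exercising the same code path as normal operation.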
@@ -629,7 +629,7 @@ implementation must obey a few more invariants:
 Tupdate()}.
 \item Page updates atomically update the page's LSN by pinning the page.
 %\item If the data seen by a wrapper function must match data seen
-% during REDO, then the wrapper should use a latch to protect against
+% during redo, then the wrapper should use a latch to protect against
 % concurrent attempts to update the sensitive data (and against
 % concurrent attempts to allocate log entries that update the data).
 \item Nested top actions (and logical undo) or ``big locks'' (total isolation) should be used to manage concurrency (Section~\ref{sec:nta}).
@@ -723,8 +723,7 @@ The transactions described above only provide the
 typically provided by locking, which is a higher level but
 compatible layer. ``Consistency'' is less well defined but comes in
 part from low-level mutexes that avoid races, and in part from
-higher-level constructs such as unique key requirements. \yad (and many databases),
-supports this by distinguishing between {\em latches} and {\em locks}.
+higher-level constructs such as unique key requirements. \yad and most databases support this by distinguishing between {\em latches} and {\em locks}.
 Latches are provided using OS mutexes, and are held for
 short periods of time. \yads default data structures use latches in a
 way that does not deadlock. This allows higher-level code to treat
@@ -1021,8 +1020,8 @@ optimizations and a wide-range of transactional systems.
 \yad provides applications with the ability to customize storage
 routines and recovery semantics. In this section, we show that this
 flexibility does not come with a significant performance cost for
-general purpose transactional primitives, and show how a number of
-special purpose interfaces aid in the development of higher-level
+general-purpose transactional primitives, and show how a number of
+special-purpose interfaces aid in the development of higher-level
 code while significantly improving application performance.
 
 \subsection{Experimental setup}
@@ -1119,8 +1118,7 @@ function~\cite{lht}, allowing it to increase capacity incrementally.
 It is based on a number of modular subcomponents. Notably, the
 physical location of each bucket is stored in a growable array of
 fixed-length entries. The bucket lists are provided by the user's
-choice of two different linked-list implementations. \eab{still
-unclear} \rcs{OK now?}
+choice of two different linked-list implementations.
 
 The hand-tuned hash table is also built on \yad and also uses a linear hash
 function. However, it is monolithic and uses carefully ordered writes to
@@ -1153,7 +1151,7 @@ optimize important primitives.
 %the transactional data structure implementation.
 
 Figure~\ref{fig:TPS} describes the performance of the two systems under
-highly concurrent workloads using the ext3 filesystem.endnote{The multi-threaded benchmarks
+highly concurrent workloads using the ext3 filesystem.\endnote{The multi-threaded benchmarks
 presented here were performed using an ext3 file system, as high
 concurrency caused both Berkeley DB and \yad to behave unpredictably
 when ReiserFS was used. However, \yads multi-threaded throughput
@@ -1206,18 +1204,18 @@ persistence library, \oasys. \oasys makes use of pluggable storage
 modules that implement persistent storage, and includes plugins
 for Berkeley DB and MySQL.
 
-This section will describe how the \yad \oasys plugin supports optimizations that reduce the
+This section describes how the \yads plugin supports optimizations that reduce the
 amount of data written to log and halve the amount of RAM required.
-We present three variants of the \yad plugin. One treats
+We present three variants of the \yad plugin. The basic one treats
 \yad like Berkeley DB. The ``update/flush'' variant
 customizes the behavior of the buffer manager. Finally, the
-``delta'' variant, uses update/flush, and only logs the differences
-between versions of objects.
+``delta'' variant uses update/flush, but only logs the differences
+between versions.
 
 The update/flush variant allows the buffer manager's view of live
 application objects to become stale. This is safe since the system is
 always able to reconstruct the appropriate page entry from the live
-copy of the object. This reduces the number of times the \yad \oasys
+copy of the object. This reduces the number of times the \oasys
 plugin must update serialized objects in the buffer manager, and
 allows us to drastically decrease the amount of memory used by the
 buffer manager.
@@ -1244,14 +1242,14 @@ allocations and deallocations based on the page LSN. To redo an
 update, we first decide whether the object that is being updated
 exists on the page. If so, we apply the blind update. If not, then
 the object must have already been freed, so we do not apply the
-update. Because support for blind updates is not yet implemented, the
+update. Because support for blind updates is only partially implemented, the
 experiments presented below mimic this behavior at runtime, but do not
 support recovery.
 
 We also considered storing multiple LSNs per page and registering a
 callback with recovery to process the LSNs. However, in such a
 scheme, the object allocation routine would need to track objects that
-were deleted but still may be manipulated during REDO. Otherwise, it
+were deleted but still may be manipulated during redo. Otherwise, it
 could inadvertently overwrite per-object LSNs that would be needed
 during recovery.
 %
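The redo decision this hunk describes can be sketched as follows. The page layout here is hypothetical (a dictionary of allocated objects, not \yad's on-disk representation): a blind update is reapplied only when allocation metadata shows the target object still exists.

```python
def redo_blind_update(page, entry):
    """Apply a blind update unless the target object has been freed."""
    objects = page["objects"]                    # oid -> serialized object
    if entry["oid"] in objects:                  # still allocated on this page?
        objects[entry["oid"]] = entry["data"]    # apply without reading old value
        return True
    return False                                 # freed before the crash: skip
```

The update is "blind" in that it never consults the object's prior contents, so it is safe to replay regardless of whether the page image already reflects it.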
@@ -1313,10 +1311,15 @@ To determine the effect of the optimization in memory bound systems,
 we decreased \yads page cache size, and used O\_DIRECT to bypass the
 operating system's disk cache. We partitioned the set of objects
 so that 10\% fit in a {\em hot set} \rcs{This doesn't make sense: that is small enough to fit into
-memory}. Figure~\ref{fig:OASYS} presents \yads performance as we varied the
+memory}. Figure~\ref{fig:OASYS} also presents \yads performance as we varied the
 percentage of object updates that manipulate the hot set. In the
 memory bound test, we see that update/flush indeed improves memory
-utilization. \rcs{Graph axis should read ``percent of updates in hot set''}
+utilization.
+
+
+
+
+
 
 \subsection{Request reordering}
 
@@ -1349,7 +1352,7 @@ reordering is inexpensive.}
 We are interested in using \yad to directly manipulate sequences of
 application requests. By translating these requests into the logical
 operations that are used for logical undo, we can use parts of \yad to
-manipulate and interpret such requests. Because logical generally
+manipulate and interpret such requests. Because logical operations generally
 correspond to application-level operations, application developers can easily determine whether
 logical operations may be reordered, transformed, or even dropped from
 the stream of requests that \yad is processing. For example,
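One way to picture the request manipulation this hunk describes (the request format below is invented for illustration): because each logical operation names the target it touches, a stream of requests can be partitioned so that all requests against one page are applied together, without disturbing per-page order.

```python
from collections import defaultdict

def reorder_by_page(requests):
    """Group logical requests by target page; order within a page is kept."""
    buckets = defaultdict(list)
    for req in requests:
        buckets[req["page"]].append(req)   # stable within each page
    ordered = []
    for page_reqs in buckets.values():     # pages in order of first appearance
        ordered.extend(page_reqs)
    return ordered
```

Reordering like this is only legal because the requests are logical; page-level physical log entries could not be regrouped this way.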
@@ -1386,16 +1389,16 @@ The second experiment measures the effect of graph locality
 (Figure~\ref{fig:hotGraph}). Each node has a distinct hot set that
 includes the 10\% of the nodes that are closest to it in ring order.
 The remaining nodes are in the cold set. We do not use ring edges for
-this test, so the graphs might not be connected. (We use the same set
-of graphs for both systems.)
+this test, so the graphs might not be connected. We use the same set
+of graphs for both systems.
 
 When the graph has good locality, a normal depth first search
 traversal and the prioritized traversal both perform well. As
 locality decreases, the partitioned traversal algorithm outperforms
 the naive traversal.
 
 \rcs{Graph axis should read ``Percent of edges in hot set'', or
 ``Percent local edges''.}
 
 \section{Related Work}
 \label{sec:related-work}
@@ -1419,16 +1422,16 @@ subsequent systems (including \yad), it supports custom operations.
 Subsequent extensible database work builds upon these foundations.
 The Exodus~\cite{exodus} database toolkit is the successor to
 Genesis. It uses abstract data type definitions, access methods and
-cost models to automatically generate query optimizers and execution
-engines.
+cost models to generate query optimizers and execution
+engines automatically.
 
 Object-oriented database systems (\rcs{cite something?}) and
 relational databases with support for user-definable abstract data
 types (such as in Postgres~\cite{postgres}) provide functionality
-similar to extensible database toolkits. In contrast to database toolkits,
-which leverage type information as the database server is compiled, object
-oriented and object relational databases allow types to be defined at
-runtime.
+similar to extensible database toolkits. In contrast to database
+toolkits, which leverage type information as the database server is
+compiled, object-oriented and object-relational databases allow types
+to be defined at runtime.
 
 Both approaches extend a fixed high-level data model with new
 abstract data types. This is of limited use to applications that are
@@ -1448,7 +1451,7 @@ unpredictable and unmanageable to scale up to the size of today's
 systems. Similarly, they are a poor fit for small devices. SQL's
 declarative interface only complicates the situation.
 
-The study suggests the adoption of highly modular {\em RISC} database
+The study suggests the adoption of highly modular ``RISC'' database
 architectures, both as a resource for researchers and as a real-world
 database system. RISC databases have many elements in common with
 database toolkits. However, they would take the idea one step
@@ -1510,8 +1513,8 @@ Nested transactions simplify distributed systems; they isolate
 failures, manage concurrency, and provide durability. In fact, they
 were developed as part of Argus, a language for reliable distributed applications. An Argus
 program consists of guardians, which are essentially objects that
-encapsulate persistent and atomic data. While accesses to {\em atomic} data are
-serializable {\em persistent} data is not protected by the lock manager,
+encapsulate persistent and atomic data. Although accesses to {\em atomic} data are
+serializable, {\em persistent} data is not protected by the lock manager,
 and is used to implement concurrent data structures~\cite{argus}.
 Typically, the data structure is stored in persistent storage, but is augmented with
 information in atomic storage. This extra data tracks the
@@ -1592,17 +1595,15 @@ available. In QuickSilver, nested transactions would
 be most useful when a series of program invocations
 form a larger logical unit~\cite{experienceWithQuickSilver}.
 
-\subsection{Transactional data structures}
-
-\rcs{Better section name?}
+\subsection{Data Structure Frameworks}
 
 As mentioned in Section~\ref{sec:system}, Berkeley DB is a system
 quite similar to \yad, and provides raw access to
 transactional data structures for application
-programmers~\cite{libtp}.
+programmers~\cite{libtp}. \eab{summary?}
 
 Cluster hash tables provide scalable, replicated hashtable
-implementation by partitioning the hash's buckets across multiple
+implementation by partitioning the table's buckets across multiple
 systems. Boxwood treats each system in a cluster of machines as a
 ``chunk store,'' and builds a transactional, fault tolerant B-Tree on
 top of the chunks that these machines export.
@@ -1613,6 +1614,8 @@ fault tolerance. In contrast, \yad makes it easy to push intelligence
 into the individual nodes, allowing them to provide primitives that
 are appropriate for the higher-level service.
 
+
+
 \subsection{Data layout policies}
 \label{sec:malloc}
 Data layout policies make decisions based upon
@@ -1801,11 +1804,11 @@ and read-only access methods. The wrapper function modifies the state
 of the page file by packaging the information that will be needed for
 undo and redo into a data format of its choosing. This data structure
 is passed into Tupdate(). Tupdate() copies the data to the log, and
-then passes the data into the operation's REDO function.
+then passes the data into the operation's redo function.
 
-REDO modifies the page file directly (or takes some other action). It
+Redo modifies the page file directly (or takes some other action). It
 is essentially an interpreter for the log entries it is associated
-with. UNDO works analogously, but is invoked when an operation must
+with. Undo works analogously, but is invoked when an operation must
 be undone (usually due to an aborted transaction, or during recovery).
 
 This pattern applies in many cases. In
@@ -1813,10 +1816,10 @@ order to implement a ``typical'' operation, the operation's
 implementation must obey a few more invariants:
 
 \begin{itemize}
-\item Pages should only be updated inside REDO and UNDO functions.
+\item Pages should only be updated inside redo and undo functions.
 \item Page updates atomically update the page's LSN by pinning the page.
 \item If the data seen by a wrapper function must match data seen
-during REDO, then the wrapper should use a latch to protect against
+during redo, then the wrapper should use a latch to protect against
 concurrent attempts to update the sensitive data (and against
 concurrent attempts to allocate log entries that update the data).
 \item Nested top actions (and logical undo) or ``big locks'' (total isolation but lower concurrency) should be used to manage concurrency (Section~\ref{sec:nta}).

4 binary files not shown