sec 1 2

2005-03-26 00:57:00 +00:00 · 2005-03-26 00:57:00 +00:00 · 2d2e8cef0c
commit 2d2e8cef0c
parent 0a50a40ba1
1 changed files with 51 additions and 66 deletions
--- a/doc/paper2/LLADD.tex
+++ b/doc/paper2/LLADD.tex
@@ -95,7 +95,7 @@ systems.
 Other systems that could benefit from transactions include file
 systems, version-control systems, bioinformatics, workflow
 applications, search engines, recoverable virtual memory, and
-programming languages with persistent objects (or structures).
+programming languages with persistent objects.

 In essence, there is an {\em impedance mismatch} between the data
 model provided by a DBMS and that required by these applications. This is
@@ -109,7 +109,7 @@ The most obvious example of this mismatch is in the support for
 persistent objects in Java, called {\em Enterprise Java Beans}
 (EJB). In a typical usage, an array of objects is made persistent by
 mapping each object to a row in a table\footnote{If the object is
-stored in normalized relational format, it may span many rows and tables.~\cite{Hibernate}} 
+stored in normalized relational format, it may span many rows and tables~\cite{Hibernate}.} 
 and then issuing queries to
 keep the objects and rows consistent.  A typical update must confirm
 it has the current version, modify the object, write out a serialized
@@ -121,7 +121,7 @@ The DBMS actually has a navigational transaction system within it,
 which would be of great use to EJB, but it is not accessible except
 via the query language.  In general, this occurs because the internal
 transaction system is complex and highly optimized for
-high-performance update-in-place transactions (mostly financial).
+high-performance update-in-place transactions.

 In this paper, we introduce a flexible framework for ACID
 transactions, \yad, that is intended to support a broader range of
@@ -154,21 +154,20 @@ way for systems to provide complete transactions.

 With these trends in mind, we have implemented a modular, extensible
 transaction system based on ARIES that makes as few assumptions as
-possible about application data structures or workload. Where such
+possible about application data or workloads. Where such
 assumptions are inevitable, we have produced narrow APIs that allow
-the application developer to plug in alternative implementations or
+the developer to plug in alternative implementations or
 define custom operations. Rather than hiding the underlying complexity
 of the library from developers, we have produced narrow, simple APIs
 and a set of invariants that must be maintained in order to ensure
-transactional consistency, allowing application developers to produce
+transactional consistency, which allows developers to produce
 high-performance extensions with only a little effort.  

 Specifically, application developers using \yad can control: 1)
-on-disk representations, 2) access-method implementations (including
+on-disk representations, 2) data structure implementations (including
 adding new transactional access methods), 3) the granularity of
 concurrency, 4) the precise semantics of atomicity, isolation and
-durability, 5) request scheduling policies, and 6) the style of
-synchronization (e.g. deadlock detection or avoidance).  Developers
+durability, 5) request scheduling policies, and 6) the choice of deadlock detection or avoidance.  Developers
 can also exploit application-specific or workload-specific assumptions
 to improve performance.

@@ -178,12 +177,12 @@ These features are enabled by the several mechanisms:
      transactional data representations (Section~\ref{page-layouts}).
 \item[Extensible log formats] provide high-level control over
      transaction data structures (Section~\ref{op-def}).
-\item [High and low level control over the log] such as calls to ``log this
+\item [High- and low-level control over the log] such as calls to ``log this
      operation'' or ``write a compensation record'' (Section~\ref{log-manager}).
 \item [In memory logical logging] provides a data store independent
      record of application requests, allowing ``in flight'' log
      reordering, manipulation and durability primitives to be
-      developed (Section~\ref{graph-traversal}).
+      developed (Section~\ref{TransClos}).
 \item[Extensible locking API] provides registration of custom lock managers
      and a generic lock manager implementation (Section~\ref{lock-manager}).
 \item[Custom durability operations] such as two phase commit's
@@ -191,10 +190,8 @@ These features are enabled by the several mechanisms:
 \end{description}

 We have produced a high-concurrency, high performance and reusable
-open-source implementation of these concepts.  Portions of our
-implementation's API are still changing, but the interfaces to low
-level primitives, and implementations of basic functionality have
-stabilized.  
+open-source implementation of these mechanisms.  Portions of our
+implementation's API are still changing, but the interfaces to low-level primitives and most implementations have stabilized.  

 To validate these claims, we walk
 through a sequence of optimizations for a transactional hash
@@ -202,10 +199,9 @@ table in Section~\ref{sub:Linear-Hash-Table}, an object serialization
 scheme in Section~\ref{OASYS}, and a graph traversal algorithm in 
 Section~\ref{TransClos}.  Benchmarking figures are provided for each 
 application.  \yad also includes a cluster hash table 
-built upon two-phase commit which will not be described in detail 
-in this paper.  Similarly we did not have space to discuss \yad's 
+built upon two-phase commit, which will not be described.  Similarly, we did not have space to discuss \yad's 
 blob implementation, which demonstrates how \yad can
-add transactional primitives to data stored in the file system.
+add transactional primitives to data stored in a file system.

 %To validate these claims, we developed a number of applications such
 %as an efficient persistent object layer, {\em @todo locality preserving
@@ -284,12 +280,12 @@ largely filled this gap by providing a simpler, less concurrent
 database that can work with a variety of storage options including
 Berkeley DB (covered below) and regular files, although these
 alternatives affect the semantics of transactions, and sometimes 
-disable or interfere with high level database features.  MySQL 
-includes these multiple storage engines for performance reasons.  
+disable or interfere with high-level database features.  MySQL 
+includes these multiple storage options for performance reasons.  
 We argue that by reusing code, and providing for a greater amount 
 of customization, a modular storage engine can provide better 
-performance, increased transparency and more flexibility then a 
-set of monolithic storage engines.\eab{need to discuss other flaws! clusters? what else?}
+performance, transparency and flexibility than a 
+set of monolithic storage engines.

 %% Databases are designed for circumstances where development time often
 %% dominates cost, many users must share access to the same data, and
@@ -313,11 +309,10 @@ add new index and object types.~\cite{newTypes}  Although some of the methods ar
 similar to ours, \yad also implements a lower-level
 interface that can coexist with these methods.  Without these
 low-level APIs, Postgres suffers from many of the limitations inherent
-to the database systems mentioned above.  This is because Postgres was
-designed to provide these extensions within the context of the
-relational model.  Therefore, these extensions focused upon improving
-query language and indexing support.  Instead of focusing upon this,
-\yad is more interested in lower-level systems. Therefore, although we
+to the database systems mentioned above, as its extensions focus on
+improving
+query language and indexing support.
+Although we
 believe that many of the high-level Postgres interfaces could be built
 on top of \yad, we have not yet tried to implement them.
 % seems to provide
@@ -326,15 +321,13 @@ on top of \yad, we have not yet tried to implement them.
 %writes correctly) and those that refer to relations or application
 %data types, since \yad does not have a built-in concept of a relation.
 However, \yad does provide an iterator interface which we hope to
-extend to provide support for relational algebra, and common
-programming paradigms.
+extend to provide support for query processing.

 Object-oriented and XML database systems provide models tied closely
 to programming language abstractions or hierarchical data formats.
 Like the relational model, these models are extremely general, and are
 often inappropriate for applications with stringent performance
-demands, or that use these models in a way that was not anticipated by
-the database vendor.  Furthermore, data stored in these databases
+demands, or those that use these models in unusual ways.  Furthermore, data stored in these databases
 often is formatted in a way that ties it to a specific application or
 class of algorithms~\cite{lamb}.  We will show that \yad can provide 
 specialized support for both classes of applications, via a persistent 
@@ -368,32 +361,28 @@ order to serve these applications, many software systems have been
 developed.  Some are extremely complex, such as semantic file
 systems, where the file system understands the contents of the files
 that it contains, and is able to provide services such as rapid
-search, or file-type specific operations such as thumb-nailing,
-automatic content updates, and so on \cite{Reiser4,WinFS,BeOS,SemanticFSWork,SemanticWeb}.  Others are simpler, such as
+search, or file-type specific operations such as thumbnails \cite{Reiser4,WinFS,BeOS,SemanticFSWork,SemanticWeb}.  Others are simpler, such as
 Berkeley~DB~\cite{bdb, berkeleyDB}, which provides transactional
-% bdb's recno interface seems to be a specialized b-tree implementation - Rusty
 storage of data in indexed form using a hashtable or tree, or as a queue.  
+% bdb's recno interface seems to be a specialized b-tree implementation - Rusty

-\rcs{Eric, Mike:   How's this?}
-\eab{need a (careful) dedicated paragraph on Berkeley DB}
-
-While Berkeley DB's feature set is similar to the features provided by
+Although Berkeley DB's feature set is similar to the features provided by
 \yad's implementation, there is an important distinction.  Berkeley DB
 provides general implementations of a handful of transactional
 structures and provides flags to enable or tweak certain pieces of
-functionality such as lock managers, log forces, and so on.  While
-\yad provides some of the high level calls that Berkeley DB supports
+functionality such as lock management, log forces, and so on. Although
+\yad provides some of the high-level calls that Berkeley DB supports
 (and could probably be extended to provide most or all of these calls), \yad
-also provides lower level access to transactional primatives.  For
+also provides lower-level access to transactional primitives.  For
 instance, Berkeley DB does not allow data to be accessed by physical
 (page) offset, and does not let applications implement new types of
-log entries for recovery.  It only supports builtin page layout types,
+log entries for recovery.  It only supports built-in page layout types,
 and does not allow applications to directly access the functionality
-provided by these layouts.  While the usefulness of providing such
+provided by these layouts.  Although the usefulness of providing such
 low-level functionality to applications may not be immediately
 obvious, the focus of this paper is to describe how these limitations
 impact application performance, and ultimately complicate development
-and system deployment efforts.  
+and deployment efforts.  

 \rcs{Potential conclusion material after this line in the .tex file..}

@@ -405,40 +394,37 @@ and system deployment efforts.
 %Berkeley DB, while Sections~\ref{OASYS} and~\ref{TransClos} show that
 %such optimizations have practical value.

-\eab{this paragraph needs work...}
 LRVM is a version of malloc() that provides
 transactional memory, and is similar to an object-oriented database
 but is much lighter weight, and lower level~\cite{lrvm}.  Unlike 
 the solutions mentioned above, it does not impose limitations upon 
-the layout of application data.
-However, its approach does not handle concurrent
-transactions well because the addition of concurrency support to transactional
-data structures typically requires control over log formats (Section~\ref{nested-top-actions}).  
+the layout of application data, although it does not provide full transactions.
+%However, its approach does not handle concurrent
+%transactions well because the addition of concurrency support to transactional
+%data structures typically requires control over log formats (Section~\ref{nested-top-actions}).  
 %However, LRVM's use of virtual memory to implement the buffer pool 
 %does not seem to be incompatible with our work, and it would be 
 %interesting to consider potential combinations of our approach 
 %with that of LRVM.  In particular, the recovery algorithm that is used to 
 %implement LRVM could be changed, and \yad's logging interface could 
 %replace the narrow interface that LRVM provides.  Also, 
-
-LRVM's inter- 
-and intra-transactional log optimizations collapse multiple updates 
-into a single log entry.  In the past, we have implemented such 
-optimizations in an ad-hoc fashion in \yad.  However, we believe 
-that we have developed the necessary API hooks 
-to allow extensions to \yad to transparently coalesce log entries in the future (Section~\ref{TransClos}).
+%LRVM's inter- 
+%and intra-transactional log optimizations collapse multiple updates 
+%into a single log entry.  In the past, we have implemented such 
+%optimizations in an ad-hoc fashion in \yad.  However, we believe 
+%that we have developed the necessary API hooks 
+%to allow extensions to \yad to transparently coalesce log entries in the future (Section~\ref{TransClos}).
 LRVM's
 approach of keeping a single in-memory copy of data in the application's
 address space is similar to the optimization presented in
-Section~\ref{OASYS}, but our approach circumvents the limitations of
-LRVM that were mentioned above, providing the full flexibility of the 
-ARIES algorithm.
+Section~\ref{OASYS}, but our approach can support full transactions as needed.
+

 %\begin{enumerate}
 %  \item {\bf Incredibly scalable, simple servers CHT's, google fs?, ...}

 Finally, some applications require incredibly simple but extremely
-scalable storage mechanisms.  Cluster hash tables are a good example
+scalable storage mechanisms.  Cluster hash tables~\cite{cht} are a good example
 of the type of system that serves these applications well, due to
 their relative simplicity and good scalability.  Depending
 on the fault model on which a cluster hash table is based, it is
@@ -457,14 +443,13 @@ atomicity semantics may be relaxed under certain circumstances.  \yad is unique
 \rcs{compare and contrast with boxwood!!}


-We believe that \yad can support all of these
-applications. We will demonstrate several of them, but leave
-implementation of a real DBMS, LRVM and Boxwood to future work.
-However, in each case it is relatively easy to see how they would map
-onto \yad.
+We believe that \yad can support all of these systems. We will
+demonstrate several of them, but leave implementation of a real DBMS,
+LRVM and Boxwood to future work.  However, in each case it is
+relatively easy to see how they would map onto \yad.


-\eab{DB Toolkit from Wisconsin?}
+%\eab{DB Toolkit from Wisconsin?}