More refinements.

2005-03-31 02:48:34 +00:00 · dbef511fbc
commit dbef511fbc
parent 2a5a730b29
1 changed file with 80 additions and 51 deletions
--- a/doc/position-paper/LLADD.txt
+++ b/doc/position-paper/LLADD.txt
@@ -14,14 +14,14 @@ applications to build upon and modify low level policies such as
 allocation strategies, page layout or details of recovery semantics.
 Furthermore, data structure implementations are typically not broken
 into separable, public APIs, encouraging a "from scratch" approach to
-the implementation of extensions.
+the implementation of new transactional data structures. 
 Contrast this to the handling of data structures within modern object
-oriented programming languages such as Java or C++.  Such languages
+oriented programming languages such as C++ or Java.  Such languages
 typically provide a large number of data storage algorithm
 implementations.  These structures may be used interchangeably with
 application-specific data collections, and collection implementations
-can be composed into more sophisticated data structures.
+may be composed into more sophisticated data structures.
 We have implemented LLADD (/yad/), an extensible transactional storage
 library that takes a composable and layered approach to transactional
@@ -35,6 +35,7 @@ usability and performance of our system, allowing application
 developers to implement sophisticated cross-layer optimizations easily.
 Overview of the LLADD Architecture
 ----------------------------------
 General purpose transactional storage systems are extremely complex
 and only handle certain types of workloads efficiently.  However, new
@@ -49,51 +50,75 @@ implement specialized data structures.  Essentially, we have
 implemented an extensible navigational database system.  We
 believe that this system will support modern development practices and
 address rapidly evolving applications before 
-appropriate general-purpose solutions have been developed.  In cases
+appropriate general-purpose solutions have been developed.  
 In cases
 where the development of a general-purpose solution is not economical, 
 our approach should lead to maintainable and efficient long-term 
-solutions.
+solutions.  Semi-structured data stores provide good examples of both 
 types of scenarios.  General XML storage technologies are improving
 rapidly, but still fail to handle many types of applications.  
 For instance, 
 we know of no general purpose solution that seriously addresses 
 semi-structured scientific information, such as the large repositories 
 typical of bioinformatics research efforts[PDB, NCBI, Gene Ontology].
 While many scientific projects are moving toward XML for their data 
 representation, we have found that XML is used primarily as a data 
 interchange format, and that existing XML tools fail to address the 
 needs of automated data mining, scientific computing and interactive 
 query systems.
 LLADD is based upon an extensible version of ARIES but does not
 hard-code details such as page format or data structure
 implementation.  It provides a number of "operation" implementations
 which consist of redo/undo methods and wrapper functions.  The redo/undo 
-methods apply log entries and the wrapper functions produce log entries.  
+methods manipulate the page file by applying log entries while the 
-During normal forward
+wrapper functions produce log entries.  Redo methods handle all page
-operations, page file writes are processed by applying redo entries
+file manipulation during normal forward operation, reducing the amount 
-from the log.  Other than the invocation of code that produces
+of code that must be developed in order to implement new data structures.
-log entries, there is no difference between the redo phase of
+LLADD handles the scheduling of redo/undo invocations, disk I/O, and all 
-recovery and normal forward operation.  This reduces the amount of
+of the other details specified by the ARIES recovery algorithm, allowing
-code that must be developed in order to implement new data structures
+operation implementors to focus on the details that are important to the 
-and page layouts.
+functionality their extension provides.
-Of course, LLADD ships with a number of default data structures and
+LLADD ships with a number of default data structures and
-layouts, ranging from byte-level page layouts to a linear hashtable
+layouts, ranging from byte-level page layouts to linear hashtables 
-that was built using high-level reusable components.  The hashtable is
+and application-specific recovery schemes and data structures.  
-implemented on top of a resizable array and a locality preserving
+These structures were developed with reusability in mind, encouraging
-linked list implementation.
+developers to compose existing operations into application-specific data
 structures.  For example, the hashtable is 
 implemented on top of reusable modules that implement a resizable array 
 and two exchangeable linked list variants.  
-Unlike existing solutions we view data structure implementations from
+In other work, we show that the system is competitive with
 a reusability standpoint.  This allows and encourages
 developers to compose existing transactional operations into
 application-specific data structures.
 In other work, we have shown that the system is competitive with
 Berkeley DB on traditional (hashtable based) workloads, and have shown
 significant performance improvements for less conventional workloads
 including custom data structure implementations, graph traversal
 algorithms and transactional object persistence workloads.
-We showed a 2-3x performance improvement over Berkeley DB on object
+The transactional object persistence system was based upon the
 observation that most object persistence schemes cache a second copy
 of each in-memory object in a page file, and often keep a third copy
 in operating system cache.  By implementing custom operations that
 assume the program maintains a correctly implemented object cache, we
 allow LLADD to service object update requests without updating the
 page file.  
 Since LLADD implements no-force, the only reason to update
 the page file is to service future application read requests.
 Therefore, we defer page file updates until the object is evicted from
 the application's object cache, eliminating the need to maintain a large 
 page cache in order to efficiently service write requests.  We also 
 leveraged our customizable log format to log differences to objects 
 instead of entire copies of objects.
 With these optimizations, we showed a 2-3x performance improvement over Berkeley DB on object
 persistence across our benchmarks, and a 3-4x improvement over an
 in-process version of MySQL with the InnoDB backend.  (A traditional
-MySQL setup that made use of a seperate server process was prohibitively 
+MySQL setup that made use of a separate server process was prohibitively 
 slow.  InnoDB provided the best performance among MySQL's durable storage managers.)
-Furthermore, our system only keeps one copy of each object in memory
+Furthermore, our system uses memory more efficiently, 
 at a time, while most existing systems keep a second copy in the
 transactional system's page cache (and possibly a third copy in
 operating system cache).  Therefore, our system can cache roughly
 twice as many objects in memory as the systems we compared it to, 
 increasing its performance advantage in situations where the size of
 system memory is a bottleneck.  
@@ -103,7 +128,7 @@ believe that further optimizations will improve our performance on
 these benchmarks significantly.
 LLADD's customizability provides superior performance over existing,
-complex systems.  Because of its natural integration into standard
+complicated systems.  Because of its natural integration into standard
 system software development practices, we think that LLADD can be
 naturally extended into networked and distributed domains.
@@ -115,7 +140,10 @@ distribution, and conflict resolution algorithms.  Therefore, we plan
 to provide a networked, logical redo log as an application-level
 primitive, and to explore system designs that leverage this approach.
-However, LLADD's design assumes that application developers will
+Current Research Focus
 ----------------------
 LLADD's design assumes that application developers will
 implement high performance transactional data structures.  This is a
 big assumption, as these data structures are notoriously difficult to
 implement correctly.  Our current research attempts to address these
@@ -157,9 +185,9 @@ generated during normal forward operation.
 By using coarse (one latch per logical operation) latching, we can
 drastically reduce the size of this space, allowing conventional
 state-state based search techniques (such as randomized or exhaustive
-state-space searches, or even standard unit testing techniques) to be
+state-space searches, or unit testing techniques) to be
 practical.  It has been shown that such coarse grained latching can
-yield high performance concurrent data structures if
+yield high-performance concurrent data structures if
 semantics-preserving optimizations such as page prefetching are
 applied[ARIES/IM].
@@ -178,30 +206,31 @@ API that application developers must work with, and complicate any
 application code that made use of such optimizations.
 Compiler optimization techniques such as partial common subexpression
-elimination solve an analogous problem to remove unnecessary algebraic
+elimination solve an analogous problem to remove redundant algebraic
 computations.  We hope to extend such techniques to reduce the number
 of buffer manager and locking calls made by existing code at runtime.
-We suspect that similar optimization techniques are applicable to
+Anecdotal evidence and personal experience suggest
 that similar optimization techniques are applicable to
 application code.  Because local LLADD calls are simply normal
-function calls, it may even be possible to push the optimizations
+function calls, it may even be possible to apply the transformations that these optimizations
-mentioned up into application code that is unaware of the underlying 
+perform to application code that is unaware of the underlying storage implementation.
-transactional storage implementation, providing a class of
+This class of
-optimizations that would be very difficult to replicate with existing
+optimizations would be very difficult to implement with existing
-transactional storage systems.  However, combining this technique with
+transactional storage systems but should significantly improve application performance.
 distributed storage systems may raise a number of interesting
 questions.
 Our implementation of LLADD is still unstable and inappropriate for
-use on important data.  We hope to validate our static analysis tools
+use on important data.  We hope to validate our ideas about static analysis 
-by incorporating them into LLADD's development process as we increase
+by incorporating them into the development process as we increase
-the reliability and overall quality of our implementation and its
+the reliability and overall quality of LLADD's implementation and its
 APIs.
-LLADD provides a set of tools that allow applications to implement
+Our architecture provides a set of tools that allow applications to implement
 custom transactional data structures and page layouts.  This avoids
-"impedance mismatch," simplifying applications and improving
+"impedance mismatch," simplifying applications and providing appropriate
-performance.  By adding support for automated code verification and
+applications with performance that is comparable or superior to other 
 general-purpose solutions.  
 By adding support for automated code verification and
 transformations we hope to make it easy to produce correct extensions
 and to allow simple, maintainable implementations to compete with
-carefully crafted, hand-optimized code.
+special purpose, hand-optimized code.
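
Illustrative sketch (not part of the patch).  The revised text describes
an "operation" as a pair of redo/undo methods plus a wrapper function:
the wrapper produces a log entry, and the redo method is the only code
that manipulates the page file, both during normal forward operation and
during recovery.  The toy model below tries to show that shape in
Python.  Every name in it (LogEntry, Operation, ToyStore, t_set) is
hypothetical and is not LLADD's API; the "page file" is a plain
dictionary, and rollback simply drops the last log entry, whereas ARIES
would log a compensation record instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Pages = Dict[int, int]  # toy "page file": page id -> a single integer value


@dataclass
class LogEntry:
    op: str        # name of the registered operation
    page: int      # page the entry applies to
    arg: int       # value that redo installs
    undo_arg: int  # value that undo restores


@dataclass
class Operation:
    redo: Callable[[Pages, LogEntry], None]
    undo: Callable[[Pages, LogEntry], None]


class ToyStore:
    """Hypothetical stand-in for the library core; not LLADD's interface."""

    def __init__(self) -> None:
        self.pages: Pages = {}
        self.log: List[LogEntry] = []
        self.ops: Dict[str, Operation] = {}

    def register(self, name: str, op: Operation) -> None:
        self.ops[name] = op

    def apply(self, entry: LogEntry) -> None:
        # Forward operation: append the entry, then run the same redo
        # method that recovery would run.
        self.log.append(entry)
        self.ops[entry.op].redo(self.pages, entry)

    def recover(self) -> Pages:
        # Rebuild the page state from an empty page file by replaying redo.
        pages: Pages = {}
        for entry in self.log:
            self.ops[entry.op].redo(pages, entry)
        return pages

    def undo_last(self) -> None:
        # Simplification: drop the entry rather than logging a
        # compensation record as ARIES specifies.
        entry = self.log.pop()
        self.ops[entry.op].undo(self.pages, entry)


def set_redo(pages: Pages, entry: LogEntry) -> None:
    pages[entry.page] = entry.arg


def set_undo(pages: Pages, entry: LogEntry) -> None:
    pages[entry.page] = entry.undo_arg


def t_set(store: ToyStore, page: int, value: int) -> None:
    # Wrapper function: it only builds a log entry and never touches
    # the pages directly.
    old = store.pages.get(page, 0)
    store.apply(LogEntry("set", page, value, old))


store = ToyStore()
store.register("set", Operation(redo=set_redo, undo=set_undo))
t_set(store, 1, 10)
t_set(store, 1, 20)
```

In this sketch an extension author writes only set_redo, set_undo and
t_set; scheduling of redo/undo calls lives in ToyStore, which loosely
mirrors the division of labor the paragraph on operation implementations
describes.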