Initial version of HPTS submission.
doc/position-paper/LLADD.txt

Russell Sears
Eric Brewer

Automated Verification and Optimization of Extensions to Transactional
Storage Systems

Existing transactional systems handle the specific workloads they were
designed for well. Unfortunately, their implementations are geared
toward particular workloads and data layouts, such as those
traditionally associated with SQL. Lower-level implementations such
as Berkeley DB handle a wider variety of workloads and are built in a
modular fashion. However, they do not provide APIs that allow
applications to build upon or modify low-level policies such as
allocation strategies, page layout, or details of the recovery
algorithm. Furthermore, data structure implementations are typically
not broken into separable, public APIs, encouraging a "from scratch"
approach to the implementation of extensions.

Contrast this with the handling of data structures in modern
object-oriented programming languages such as Java or C++, which
provide a large number of data storage algorithm implementations.
Such structures may be used interchangeably with application-specific
data collections, and collection implementations can be composed into
more sophisticated data structures.

We have implemented LLADD (/yad/), an extensible transactional storage
implementation that takes a more modular approach to storage APIs. In
other work, we show that its performance on traditional workloads is
competitive with existing systems tuned for such workloads, and we
present significant increases in throughput and memory utilization on
specialized workloads [XXX].

However, our approach assumes that application developers will
correctly implement new transactional structures, but such data
structures are notoriously difficult to implement correctly. In this
work, we present our current attempts to address this concern.

We further argue that, because it integrates naturally into standard
system software development, our library can be extended into
networked and distributed domains. Typical write-ahead logging
protocols implicitly implement machine-independent, reorderable log
entries in order to support logical undo. These two properties have
been crucial in the development of past system software designs, and
have been used to implement techniques such as data replication,
distribution, and conflict resolution, among others. Therefore, we
plan to provide a networked, logical redo log as an application-level
primitive, and to explore system designs that leverage such
primitives.

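To make the argument concrete, consider what such a log entry might
look like. The C sketch below is illustrative only: the type and field
names are assumptions made for the example, not LLADD's actual record
format. The point is that the entry names an operation and its
arguments rather than physical byte ranges, so it can be shipped to a
machine with a different page layout and reordered with respect to
entries that touch unrelated keys.

    #include <stdint.h>
    #include <string.h>

    /* Sketch of a machine-independent, reorderable logical log entry.
       All names here are hypothetical; LLADD's real format differs. */
    typedef enum { LOGICAL_INSERT, LOGICAL_DELETE } logical_op_t;

    typedef struct {
        uint64_t     lsn;         /* log sequence number                 */
        uint64_t     xid;         /* transaction id                      */
        logical_op_t op;          /* operation to redo                   */
        logical_op_t undo_op;     /* inverse operation, for logical undo */
        char         index[32];   /* which data structure is affected    */
        char         key[64];     /* arguments, not physical byte images */
        char         value[64];
    } logical_log_entry_t;

    /* Two entries may be reordered (or applied to replicas in different
       orders) if they touch different indexes or different keys. */
    static int entries_commute(const logical_log_entry_t *a,
                               const logical_log_entry_t *b) {
        return strcmp(a->index, b->index) != 0
            || strcmp(a->key,   b->key)   != 0;
    }

Because nothing in such an entry refers to a particular page or byte
offset, the same record could drive replication or conflict resolution
on a replica whose physical layout differs.
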
For such infrastructure to be generally useful, however, the
functionality that it provides should be efficient, reliable, and easy
to apply to new application domains. We believe that ease of
development is a prerequisite for the other two goals, since
application developers typically have a limited amount of time to
spend implementing and verifying application-specific storage
extensions. While the underlying data structure algorithms tend to be
simple and easily understood, verifying the correctness of an
implementation is extremely difficult.

Recovery-based algorithms must behave correctly during forward
operation and also under arbitrary recovery scenarios. The latter
requirement is particularly difficult to verify due to the large
number of materialized page file states that could occur after a
crash.

Fortunately, write-ahead logging schemes such as ARIES that make use
of nested top actions vastly simplify the problem: given the
correctness of page-based physical undo and redo, logical undo may
assume that page-spanning operations are applied to the data store
atomically. Furthermore, semantics-preserving optimizations, such as
prefetching the pages that could be altered by an index operation,
allow efficient, concurrent modification of data stores even when
extremely coarse latches are used.

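A toy model helps illustrate the point. The program below is a
deliberately simplified sketch, not LLADD's log format: it uses two
integer "pages", physical undo/redo records, and a single marker that
closes the nested top action. Recovery from any crash point sees the
page-spanning insert either completely or not at all, so a purely
logical undo never has to reason about half-applied structural
changes.

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy model of a nested top action over a two-"page" store.  The real
       ARIES/LLADD mechanism is far richer; this only illustrates why
       logical undo may treat page-spanning operations as atomic. */

    enum { NPAGES = 2 };
    typedef struct { int page; int old_val; int new_val; int nta_end; } logrec_t;

    /* The page-spanning operation: set both pages to v, logging physical
       undo/redo for each page, then close the nested top action. */
    static int do_insert(int *pages, logrec_t *log, int v) {
        int n = 0;
        for (int p = 0; p < NPAGES; p++) {
            log[n++] = (logrec_t){ p, pages[p], v, 0 };
            pages[p] = v;
        }
        log[n++] = (logrec_t){ -1, 0, 0, 1 };     /* nested top action ends */
        return n;
    }

    /* Recovery over a log prefix: redo everything, then physically undo
       any records not covered by a completed nested top action. */
    static void recover(int *pages, const logrec_t *log, int n) {
        memset(pages, 0, sizeof(int) * NPAGES);
        int complete = 0;                  /* records before this index stay */
        for (int i = 0; i < n; i++) {
            if (log[i].nta_end) complete = i + 1;
            else pages[log[i].page] = log[i].new_val;       /* physical redo */
        }
        for (int i = n - 1; i >= complete; i--)             /* physical undo */
            if (!log[i].nta_end) pages[log[i].page] = log[i].old_val;
    }

    int main(void) {
        int pages[NPAGES] = { 0, 0 };
        logrec_t log[8];
        int n = do_insert(pages, log, 7);

        /* Crash after every log prefix: the insert is all-or-nothing, so a
           logical undo ("delete 7") never sees a half-applied operation. */
        for (int crash = 0; crash <= n; crash++) {
            recover(pages, log, crash);
            assert((pages[0] == 0 && pages[1] == 0) ||
                   (pages[0] == 7 && pages[1] == 7));
        }
        printf("all %d crash points leave the store consistent\n", n + 1);
        return 0;
    }
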
Existing work in the static analysis community has verified that
device driver implementations correctly adhere to complex operating
system kernel locking schemes [SLAM]. By formalizing LLADD's latching
and logging APIs, we believe that such analyses would be directly
applicable, allowing us to verify (assuming the correctness of LLADD's
implementation and of page-wise physical redo) that a data structure's
behavior during recovery is equivalent to its behavior on each prefix
of the log that could exist at runtime.

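As a purely illustrative example of the kind of rule such an analysis
would enforce, consider the property "log records for a logical
operation may only be written while that operation's latch is held,
and the latch must not be leaked." The event names in the sketch below
are assumptions, not LLADD's actual API; a tool in the SLAM family
would check the same automaton against every program path statically,
whereas this sketch simply checks a recorded trace.

    #include <stdio.h>

    /* Events emitted by an (instrumented) storage extension; the names
       are illustrative stand-ins, not LLADD's actual API. */
    typedef enum { LATCH_ACQUIRE, LOG_WRITE, LATCH_RELEASE } event_t;

    /* The rule, as a two-state automaton: LOG_WRITE is only legal while
       the latch is held, latching is not re-entrant, and the latch must
       be released by the end of the trace.  Returns 1 if the trace is
       legal. */
    static int check_trace(const event_t *trace, int n) {
        enum { UNLATCHED, LATCHED } state = UNLATCHED;
        for (int i = 0; i < n; i++) {
            switch (trace[i]) {
            case LATCH_ACQUIRE:
                if (state != UNLATCHED) return 0;   /* re-entrant latch      */
                state = LATCHED;
                break;
            case LOG_WRITE:
                if (state != LATCHED) return 0;     /* logging without latch */
                break;
            case LATCH_RELEASE:
                if (state != LATCHED) return 0;     /* stray release         */
                state = UNLATCHED;
                break;
            }
        }
        return state == UNLATCHED;                  /* no leaked latch       */
    }

    int main(void) {
        event_t good[] = { LATCH_ACQUIRE, LOG_WRITE, LATCH_RELEASE };
        event_t bad[]  = { LOG_WRITE, LATCH_ACQUIRE, LATCH_RELEASE };
        printf("good trace: %s\n", check_trace(good, 3) ? "ok" : "violation");
        printf("bad trace:  %s\n", check_trace(bad,  3) ? "ok" : "violation");
        return 0;
    }
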
By enforcing coarse latching (one latch per logical operation), we can
drastically reduce the size of this space, allowing conventional
state-based search techniques (such as randomized or exhaustive
state-space searches, or simple unit testing techniques) to be
practical. It has been shown that such coarse-grained latching can
yield high-performance data structures [ARIES/IM].

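The reduction is easy to quantify with a back-of-the-envelope model.
If k concurrent logical operations each consist of m low-level steps,
a checker that must consider every interleaving of individual steps
faces (k*m)!/(m!)^k schedules, while one latch per logical operation
leaves only k! orderings of whole operations. The figures in the
sketch below refer to this toy model only.

    #include <stdio.h>

    /* Number of schedules a state-space search must consider, with and
       without coarse (one latch per logical operation) latching. */
    static double factorial(int n) {
        double f = 1.0;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    int main(void) {
        int k = 4, m = 6;            /* 4 operations, 6 low-level steps each */

        double fine = factorial(k * m);          /* interleave all steps:    */
        for (int i = 0; i < k; i++)              /*   (k*m)! / (m!)^k        */
            fine /= factorial(m);

        double coarse = factorial(k);            /* whole-op orders: k!      */

        printf("step-level interleavings: %.3g\n", fine);    /* about 2.3e12 */
        printf("operation-level orders:   %.0f\n", coarse);  /* 24           */
        return 0;
    }
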
A separate approach toward static analysis of LLADD extensions
involves compiler optimization techniques, and is based upon the
observation that code built on top of layered APIs frequently makes
repeated calls to low-level functions that must repeat work. A common
example in LLADD involves loops over data with good locality in the
page file. The vast majority of the time, these loops call high-level
APIs that must pin and unpin the underlying data for each call.

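The toy program below makes the cost concrete. The buffer manager and
record-read routines here are stub stand-ins (the names and signatures
are not LLADD's), instrumented only to count pin operations: a loop
over one hundred records that all live on the same page still performs
one hundred pins.

    #include <stdio.h>

    /* Stub buffer manager, instrumented to count pin traffic.  All names
       are illustrative stand-ins, not LLADD's actual API. */
    static int pin_count = 0;
    typedef struct { int page; int slot; } recordid;
    typedef struct { int data[128]; } page_t;
    static page_t the_page;                 /* pretend page file: one page */

    static page_t *pin_page(int page)    { (void)page; pin_count++; return &the_page; }
    static void    unpin_page(page_t *p) { (void)p; }

    /* High-level record read: pins and unpins the page on every call. */
    static int read_record(recordid rid) {
        page_t *p = pin_page(rid.page);
        int value = p->data[rid.slot];
        unpin_page(p);
        return value;
    }

    int main(void) {
        recordid rids[100];
        for (int i = 0; i < 100; i++) {     /* 100 records, all on page 0 */
            rids[i].page = 0;
            rids[i].slot = i;
            the_page.data[i] = 1;
        }

        long total = 0;
        for (int i = 0; i < 100; i++)
            total += read_record(rids[i]);  /* one pin/unpin per record    */

        printf("total=%ld pins=%d\n", total, pin_count);   /* pins == 100  */
        return 0;
    }
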
Each of these high-level API calls could be copied into many different
variants with different pinning/unpinning and latching/unlatching
behavior, but this would greatly complicate the API that developers
must learn, and application code would have to be convoluted in order
to track changes in page numbers. Compiler optimization techniques
such as partial common subexpression elimination solve an analogous
problem in order to remove unnecessary algebraic computations. We hope
to extend such techniques to reduce the number of buffer manager and
locking calls made by existing code at runtime.

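Continuing the toy example above, the transformation we have in mind
would produce code equivalent to what a careful programmer writes by
hand: the pin and unpin are hoisted out of the loop once the analysis
establishes that the page number is loop-invariant (or is guarded by a
cheap check when it is not). In the instrumented stub, the pin count
drops from 100 to 1.

    /* Loop from the previous sketch after hoisting the redundant
       pin/unpin pair; valid only because every rids[i].page is the
       same (the general transformation must re-pin when it changes). */
    page_t *p = pin_page(rids[0].page);
    long total = 0;
    for (int i = 0; i < 100; i++)
        total += p->data[rids[i].slot];
    unpin_page(p);
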
Our implementation of LLADD is still unstable and inappropriate for
use on important data. We hope to validate our static analysis tools
by incorporating them into LLADD's development process as we increase
the reliability and overall quality of our implementation and its
APIs.

We believe that by providing application developers with high-level
tools that make it easy to implement custom data structures and page
layouts in a transactional environment, we have already increased the
space of application optimizations that are available to software
developers. By adding support for library-specific verification
techniques and transformation of low-level program semantics, we hope
to make it easy to verify that these extensions are correct, and to
provide automated optimizations that allow simple, maintainable code
to compete with carefully crafted, hand-optimized code.