stasis-aries-wal/doc/position-paper/LLADD.txt

Russell Sears
Eric Brewer

Automated Verification and Optimization of Extensions to Transactional
Storage Systems.

Existing transactional systems are designed to handle specific
workloads well.  Unfortunately, these systems' implementations are
geared toward specific workloads and data layouts such as those
traditionally associated with SQL.  Lower level implementations such
as Berkeley DB handle a wider variety of workloads and are built in a
modular fashion.  However, they do not provide APIs to allow
applications to build upon and modify low level policies such as
allocation strategies, page layout or details of recovery semantics.
Furthermore, data structure implementations are typically
not broken into separable, public APIs, encouraging a "from scratch"
approach to the implementation of extensions.

Contrast this to the handling of data structures within modern object
oriented programming languages such as Java or C++.  Such languages typically provide a
large number of data storage algorithm implementations.  These
structures may be used interchangeably with application-specific data
collections, and collection implementations can be composed into more
sophisticated data structures.  

We have implemented LLADD (/yad/), an extensible transactional storage
library that takes a composable and layered approach to
transactional storage.  Below, we present some of the high level
features and performance characteristics of this system and discuss
our plans to extend the system into distributed domains.  Finally we
introduce our current research focus, the application of automated
program verification and optimization techniques to application specific extensions.  Such
techniques should significantly enhance the usability and performance
of our system.

Overview of the LLADD Architecture

General purpose transactional storage systems are extremely complex
and only handle certain types of workloads efficiently.  However, new
types of applications and workloads are introduced on a regular basis.
This results in the implementation of specialized, ad-hoc data storage
systems from scratch, wasting resources and preventing code reuse.

Instead of developing a set of general purpose data structures that
attempt to behave well across many workloads, we have implemented a
lower level API that makes it easy for application designers to
implement specialized data structures.  Essentially, we have
implemented a modern, extensible navigational database system.  We
believe that this system will support modern development practices and
address new applications that are evolving too quickly to allow
appropriate general-purpose solutions to be developed.

The library is based upon an extensible version of ARIES but does not
hard-code details such as page format or data structure implementation.
It provides a number of "operation" implementations which consist of
redo/undo implementations that apply log entries and wrapper
functions that produce log entries.
During normal forward operations, page file writes are processed by
applying redo entries from the log.  Other than the invocation of code
that allocates and writes log entries there is no difference between
the redo phase of recovery and normal forward operation.  This reduces
the amount of code that must be developed in order to implement new
data structures and page layouts.

Of course, LLADD ships with a number of default data structures and
layouts, ranging from byte-level page layouts to a linear hashtable 
that was built using high-level reusable components.  The 
hashtable is implemented on top of a resizable array and a
locality preserving linked list implementation.

Unlike existing solutions, we view data structure implementations from
a reusability standpoint, allowing and encouraging application
developers to compose existing transactional operations into
application-specific data structures.

In other work, we have shown that the system is competitive with
Berkeley DB on traditional (hashtable based) workloads, and have shown
significant performance improvements for less conventional workloads
including custom data structure implementations, graph traversal 
algorithms and transactional object persistence workloads.

We showed a 2-3x performance improvement over Berkeley DB on object
persistence across our benchmarks, and a 3-4x improvement over an
in-process version of MySQL with the InnoDB backend.  (A traditional,
IPC-based MySQL benchmark was prohibitively slow and InnoDB provided 
the best performance among MySQL's durable storage managers.)

Furthermore, our system only keeps one copy of each object in memory
at a time, while most existing systems keep a second copy in the
transactional system's page cache (and possibly a third copy in 
operating system cache).  Therefore, our system can cache roughly
twice as many objects in memory as the systems we compared it to.  We leave systematic
performance tuning of LLADD to future work, and believe that further 
optimizations would
improve our performance on these benchmarks significantly.

LLADD's customizability provides superior performance over existing,
complex systems.  Because of its natural 
integration into standard system software development practices, we think that LLADD
can be naturally extended into networked and distributed domains.  

For example, typical write-ahead-logging protocols implicitly implement machine
independent, reorderable log entries in order to implement logical
undo.  These two properties have been crucial in past system software
designs, including data replication, distribution, and conflict
resolution algorithms.  Therefore, we plan to provide a networked,
logical redo log as an application-level primitive, and to explore
system designs that leverage these primitives.

However, our approach assumes that application developers will
implement high performance transactional data structures.  This 
is a big assumption, as these
data structures are notoriously difficult to implement correctly.  
Our current research attempts to address these concerns.

For our infrastructure to be generally useful the
functionality that it provides should be efficient, reliable and
applicable to new application domains.  We believe that improvements 
to the development process can address each of these goals.

Application developers typically have a limited amount of time to
spend implementing and verifying application-specific storage
extensions, but bugs in these extensions have dire consequences.
Also, while data structure algorithms tend to be simple and
easily understood, performance tuning and verification of
implementation correctness is extremely difficult.

Recovery based algorithms must behave correctly during forward
operation and also under arbitrary recovery scenarios.  The latter
requirement is particularly difficult to verify due to the large
number of materialized page file states that could occur after a
crash.

Fortunately, write-ahead-logging schemes such as ARIES make use of
nested-top-actions to vastly simplify the problem.  Given the
correctness of page based physical undo and redo, logical undo may
assume that page spanning operations are applied to the data store
atomically.

Existing work in the static-analysis community has verified that
device driver implementations correctly adhere to complex operating
system kernel locking schemes[SLAM]. If we formalize LLADD's latching
and logging APIs, we believe that analyses such as these will be
directly applicable, allowing us to verify that data structure
behavior during recovery is equivalent to the behavior that would 
result if an abort() was issued on each prefix of the log that is 
generated during normal forward operation.

By using coarse (one latch per logical operation) latching, we can
drastically reduce the size of this space, allowing conventional
state-state based search techniques (such as randomized or exhaustive
state-space searches, or even standard unit testing techniques) to be
practical.  It has been shown that such coarse grained latching can
yield high performance concurrent data structures if
semantics-preserving optimizations such as page prefetching are
applied[ARIES/IM].

A separate approach to the static analysis of LLADD extensions
uses compiler optimization techniques.  Software built on top of
layered APIs frequently makes repeated calls to low level functions
that must repeat work.  A common example in LLADD involves loops over
data with good locality in the page file.  The vast majority of the
time, these loops call high level APIs that needlessly pin and unpin
the same underlying data.

The code for each of these high level API calls could be copied into
many different variants with different pinning/unpinning and
latching/unlatching behavior, but this would greatly complicate the
API that application developers must work with, and complicate any
application code that make use of such optimizations.

Compiler optimization techniques such as partial common subexpression
elimination solve an analogous problem to remove unnecessary algebraic
computations.  We hope to extend such techniques to reduce the number
of buffer manager and locking calls made by existing code at runtime.

We suspect that similar optimization techniques are applicable to
application code.  Because local LLADD calls are simply normal
function calls, it may even be possible to push the optimizations
mentioned above up into application code, providing a class of
optimizations that would be very difficult to replicate with existing
transactional storage systems.  However, combining this technique with
distributed storage systems may raise a number of interesting
questions.

Our implementation of LLADD is still unstable and inappropriate for
use on important data.  We hope to validate our static analysis tools
by incorporating them into LLADD's development process as we increase
the reliability and overall quality of our implementation and its
APIs.

LLADD provides a set of tools that allow applications to implement
custom transactional data structures and page layouts.  This avoids
"impedance mismatch," simplifying applications and improving
performance.  By adding support for automated code verification and
transformations we hope to make it easy to produce correct extensions
and to allow simple, maintainable implementations to compete with
carefully crafted, hand-optimized code.
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`Russell Sears`
			`Eric Brewer`

			`Automated Verification and Optimization of Extensions to Transactional`
			`Storage Systems.`

			`Existing transactional systems are designed to handle specific`
			`workloads well. Unfortunately, these systems' implementations are`
			`geared toward specific workloads and data layouts such as those`
			`traditionally associated with SQL. Lower level implementations such`
			`as Berkeley DB handle a wider variety of workloads and are built in a`
			`modular fashion. However, they do not provide APIs to allow`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`applications to build upon and modify low level policies such as`
			`allocation strategies, page layout or details of recovery semantics.`
			`Furthermore, data structure implementations are typically`
			`not broken into separable, public APIs, encouraging a "from scratch"`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`approach to the implementation of extensions.`

			`Contrast this to the handling of data structures within modern object`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`oriented programming languages such as Java or C++. Such languages typically provide a`
			`large number of data storage algorithm implementations. These`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`structures may be used interchangeably with application-specific data`
			`collections, and collection implementations can be composed into more`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`sophisticated data structures.`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00
			`We have implemented LLADD (/yad/), an extensible transactional storage`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`library that takes a composable and layered approach to`
			`transactional storage. Below, we present some of the high level`
			`features and performance characteristics of this system and discuss`
			`our plans to extend the system into distributed domains. Finally we`
			`introduce our current research focus, the application of automated`
			`program verification and optimization techniques to application specific extensions. Such`
			`techniques should significantly enhance the usability and performance`
			`of our system.`

			`Overview of the LLADD Architecture`

			`General purpose transactional storage systems are extremely complex`
			`and only handle certain types of workloads efficiently. However, new`
			`types of applications and workloads are introduced on a regular basis.`
			`This results in the implementation of specialized, ad-hoc data storage`
			`systems from scratch, wasting resources and preventing code reuse.`

			`Instead of developing a set of general purpose data structures that`
			`attempt to behave well across many workloads, we have implemented a`
			`lower level API that makes it easy for application designers to`
			`implement specialized data structures. Essentially, we have`
			`implemented a modern, extensible navigational database system. We`
			`believe that this system will support modern development practices and`
			`address new applications that are evolving too quickly to allow`
			`appropriate general-purpose solutions to be developed.`

			`The library is based upon an extensible version of ARIES but does not`
			`hard-code details such as page format or data structure implementation.`
			`It provides a number of "operation" implementations which consist of`
			`redo/undo implementations that apply log entries and wrapper`
			`functions that produce log entries.`
			`During normal forward operations, page file writes are processed by`
			`applying redo entries from the log. Other than the invocation of code`
			`that allocates and writes log entries there is no difference between`
			`the redo phase of recovery and normal forward operation. This reduces`
			`the amount of code that must be developed in order to implement new`
			`data structures and page layouts.`

			`Of course, LLADD ships with a number of default data structures and`
			`layouts, ranging from byte-level page layouts to a linear hashtable`
			`that was built using high-level reusable components. The`
			`hashtable is implemented on top of a resizable array and a`
			`locality preserving linked list implementation.`

			`Unlike existing solutions, we view data structure implementations from`
			`a reusability standpoint, allowing and encouraging application`
			`developers to compose existing transactional operations into`
			`application-specific data structures.`

			`In other work, we have shown that the system is competitive with`
			`Berkeley DB on traditional (hashtable based) workloads, and have shown`
			`significant performance improvements for less conventional workloads`
			`including custom data structure implementations, graph traversal`
			`algorithms and transactional object persistence workloads.`

			`We showed a 2-3x performance improvement over Berkeley DB on object`
			`persistence across our benchmarks, and a 3-4x improvement over an`
			`in-process version of MySQL with the InnoDB backend. (A traditional,`
			`IPC-based MySQL benchmark was prohibitively slow and InnoDB provided`
			`the best performance among MySQL's durable storage managers.)`

			`Furthermore, our system only keeps one copy of each object in memory`
			`at a time, while most existing systems keep a second copy in the`
			`transactional system's page cache (and possibly a third copy in`
			`operating system cache). Therefore, our system can cache roughly`
			`twice as many objects in memory as the systems we compared it to. We leave systematic`
			`performance tuning of LLADD to future work, and believe that further`
			`optimizations would`
			`improve our performance on these benchmarks significantly.`

			`LLADD's customizability provides superior performance over existing,`
			`complex systems. Because of its natural`
			`integration into standard system software development practices, we think that LLADD`
			`can be naturally extended into networked and distributed domains.`

			`For example, typical write-ahead-logging protocols implicitly implement machine`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`independent, reorderable log entries in order to implement logical`
			`undo. These two properties have been crucial in past system software`
			`designs, including data replication, distribution, and conflict`
			`resolution algorithms. Therefore, we plan to provide a networked,`
			`logical redo log as an application-level primitive, and to explore`
			`system designs that leverage these primitives.`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00
Updated some wording problems 2005-03-29 02:58:54 +00:00			`However, our approach assumes that application developers will`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`implement high performance transactional data structures. This`
			`is a big assumption, as these`
			`data structures are notoriously difficult to implement correctly.`
			`Our current research attempts to address these concerns.`
Updated some wording problems 2005-03-29 02:58:54 +00:00
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`For our infrastructure to be generally useful the`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`functionality that it provides should be efficient, reliable and`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`applicable to new application domains. We believe that improvements`
			`to the development process can address each of these goals.`
Updated some wording problems 2005-03-29 02:58:54 +00:00
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`Application developers typically have a limited amount of time to`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`spend implementing and verifying application-specific storage`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`extensions, but bugs in these extensions have dire consequences.`
			`Also, while data structure algorithms tend to be simple and`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`easily understood, performance tuning and verification of`
Updated some wording problems 2005-03-29 02:58:54 +00:00			`implementation correctness is extremely difficult.`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00
			`Recovery based algorithms must behave correctly during forward`
Updated some wording problems 2005-03-29 02:58:54 +00:00			`operation and also under arbitrary recovery scenarios. The latter`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`requirement is particularly difficult to verify due to the large`
			`number of materialized page file states that could occur after a`
			`crash.`

Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`Fortunately, write-ahead-logging schemes such as ARIES make use of`
			`nested-top-actions to vastly simplify the problem. Given the`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`correctness of page based physical undo and redo, logical undo may`
			`assume that page spanning operations are applied to the data store`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`atomically.`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00
			`Existing work in the static-analysis community has verified that`
			`device driver implementations correctly adhere to complex operating`
Updated some wording problems 2005-03-29 02:58:54 +00:00			`system kernel locking schemes[SLAM]. If we formalize LLADD's latching`
			`and logging APIs, we believe that analyses such as these will be`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`directly applicable, allowing us to verify that data structure`
			`behavior during recovery is equivalent to the behavior that would`
			`result if an abort() was issued on each prefix of the log that is`
			`generated during normal forward operation.`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00
			`By using coarse (one latch per logical operation) latching, we can`
			`drastically reduce the size of this space, allowing conventional`
			`state-state based search techniques (such as randomized or exhaustive`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`state-space searches, or even standard unit testing techniques) to be`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`practical. It has been shown that such coarse grained latching can`
			`yield high performance concurrent data structures if`
			`semantics-preserving optimizations such as page prefetching are`
			`applied[ARIES/IM].`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`A separate approach to the static analysis of LLADD extensions`
			`uses compiler optimization techniques. Software built on top of`
			`layered APIs frequently makes repeated calls to low level functions`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`that must repeat work. A common example in LLADD involves loops over`
			`data with good locality in the page file. The vast majority of the`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`time, these loops call high level APIs that needlessly pin and unpin`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`the same underlying data.`

			`The code for each of these high level API calls could be copied into`
			`many different variants with different pinning/unpinning and`
			`latching/unlatching behavior, but this would greatly complicate the`
			`API that application developers must work with, and complicate any`
			`application code that make use of such optimizations.`
Updated some wording problems 2005-03-29 02:58:54 +00:00
			`Compiler optimization techniques such as partial common subexpression`
Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`elimination solve an analogous problem to remove unnecessary algebraic`
			`computations. We hope to extend such techniques to reduce the number`
			`of buffer manager and locking calls made by existing code at runtime.`

Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`We suspect that similar optimization techniques are applicable to`
			`application code. Because local LLADD calls are simply normal`
			`function calls, it may even be possible to push the optimizations`
			`mentioned above up into application code, providing a class of`
			`optimizations that would be very difficult to replicate with existing`
			`transactional storage systems. However, combining this technique with`
			`distributed storage systems may raise a number of interesting`
			`questions.`

Initial version of HPTS submission. 2005-03-29 02:09:12 +00:00			`Our implementation of LLADD is still unstable and inappropriate for`
			`use on important data. We hope to validate our static analysis tools`
			`by incorporating them into LLADD's development process as we increase`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00			`the reliability and overall quality of our implementation and its`
Added more LLADD details; it probably has grammer mistakes now. 2005-03-30 01:42:14 +00:00			`APIs.`
Ran ESC-q on whole buffer. 2005-03-29 03:00:26 +00:00
			`LLADD provides a set of tools that allow applications to implement`
			`custom transactional data structures and page layouts. This avoids`
			`"impedance mismatch," simplifying applications and improving`
			`performance. By adding support for automated code verification and`
			`transformations we hope to make it easy to produce correct extensions`
			`and to allow simple, maintainable implementations to compete with`
			`carefully crafted, hand-optimized code.`