122 lines
6.2 KiB
Text
122 lines
6.2 KiB
Text
|
|
Russell Sears
|
|
Eric Brewer
|
|
|
|
Automated Verification and Optimization of Extensions to Transactional
|
|
Storage Systems.
|
|
|
|
Existing transactional systems are designed to handle specific
|
|
workloads well. Unfortunately, these systems' implementations are
|
|
geared toward specific workloads and data layouts such as those
|
|
traditionally associated with SQL. Lower level implementations such
|
|
as Berkeley DB handle a wider variety of workloads and are built in a
|
|
modular fashion. However, they do not provide APIs to allow
|
|
applications to build upon or modify low level policies such as
|
|
allocation strategies, page layout or details of the recovery
|
|
algorithm. Furthermore, data structure implementations are typically
|
|
not broken into separable, public API's, encouraging a "from scratch"
|
|
approach to the implementation of extensions.
|
|
|
|
Contrast this to the handling of data structures within modern object
|
|
oriented programming languages such as Java or C++ that provide a
|
|
large number of data storage algorithm implementations. Such
|
|
structures may be used interchangeably with application-specific data
|
|
collections, and collection implementations can be composed into more
|
|
sophisticated data structures.
|
|
|
|
We have implemented LLADD (/yad/), an extensible transactional storage
|
|
implementation that takes a composable and layered approach to
|
|
transactional storage. In other work, we show that its performance on
|
|
traditional workloads is competitive with existing systems and show
|
|
significant increases in throughput and memory utilization on
|
|
specialized workloads.[XXX]
|
|
|
|
We further argue that because of its natural integration into standard
|
|
system software development practices our library can be naturally
|
|
extended into networked and distributed domains. Typical
|
|
write-ahead-logging protocols implicitly implement machine
|
|
independent, reorderable log entries in order to implement logical
|
|
undo. These two properties have been crucial in past system software
|
|
designs, including data replication, distribution, and conflict
|
|
resolution algorithms. Therefore, we plan to provide a networked,
|
|
logical redo log as an application-level primitive, and to explore
|
|
system designs that leverage these primitives.
|
|
|
|
However, our approach assumes that application developers will
|
|
correctly implement new transactional structures even though these
|
|
data structures are notoriously difficult to implement correctly. In
|
|
this work we present our current attempts to address these concerns.
|
|
|
|
For such infrastructure to be generally useful, however, the
|
|
functionality that it provides should be efficient, reliable and
|
|
applicable to new application domains. We believe that ease of
|
|
development is a prerequisite to our other goals.
|
|
|
|
Application developers typically have a limited amount of time to
|
|
spend implementing and verifying application-specific storage
|
|
extensions, and bugs in these extensions affect data durability.
|
|
While the underlying data structure algorithms tend to be simple and
|
|
easily understood, performance tuning and verification of
|
|
implementation correctness is extremely difficult.
|
|
|
|
Recovery based algorithms must behave correctly during forward
|
|
operation and also under arbitrary recovery scenarios. The latter
|
|
requirement is particularly difficult to verify due to the large
|
|
number of materialized page file states that could occur after a
|
|
crash.
|
|
|
|
Fortunately, write-ahead-logging schemes such as ARIES make use of
|
|
nested-top-actions to vastly simplify the problem. Given the
|
|
correctness of page based physical undo and redo, logical undo may
|
|
assume that page spanning operations are applied to the data store
|
|
atomically.
|
|
|
|
Existing work in the static-analysis community has verified that
|
|
device driver implementations correctly adhere to complex operating
|
|
system kernel locking schemes[SLAM]. If we formalize LLADD's latching
|
|
and logging APIs, we believe that analyses such as these will be
|
|
directly applicable, and allow us to verify that data structure
|
|
behavior during recovery is equivalent to its behavior on each prefix
|
|
of the log produced during normal forward operation.
|
|
|
|
By using coarse (one latch per logical operation) latching, we can
|
|
drastically reduce the size of this space, allowing conventional
|
|
state-state based search techniques (such as randomized or exhaustive
|
|
state-space searches, or simple unit testing techniques) to be
|
|
practical. It has been shown that such coarse grained latching can
|
|
yield high performance concurrent data structures if
|
|
semantics-preserving optimizations such as page prefetching are
|
|
applied[ARIES/IM].
|
|
|
|
A separate approach toward static analysis of LLADD extensions
|
|
involves compiler optimization techniques. Software built on top of
|
|
layered API's frequently makes repeated calls to low level functions
|
|
that must repeat work. A common example in LLADD involves loops over
|
|
data with good locality in the page file. The vast majority of the
|
|
time, these loops call high level API's that needlessly pin and unpin
|
|
the same underlying data.
|
|
|
|
The code for each of these high level API calls could be copied into
|
|
many different variants with different pinning/unpinning and
|
|
latching/unlatching behavior, but this would greatly complicate the
|
|
API that application developers must work with, and complicate any
|
|
application code that make use of such optimizations.
|
|
|
|
Compiler optimization techniques such as partial common subexpression
|
|
elimination solve an analogous problem to remove unnecessary algebraic
|
|
computations. We hope to extend such techniques to reduce the number
|
|
of buffer manager and locking calls made by existing code at runtime.
|
|
|
|
Our implementation of LLADD is still unstable and inappropriate for
|
|
use on important data. We hope to validate our static analysis tools
|
|
by incorporating them into LLADD's development process as we increase
|
|
the reliability and overall quality of our implementation and its
|
|
API's.
|
|
|
|
LLADD provides a set of tools that allow applications to implement
|
|
custom transactional data structures and page layouts. This avoids
|
|
"impedance mismatch," simplifying applications and improving
|
|
performance. By adding support for automated code verification and
|
|
transformations we hope to make it easy to produce correct extensions
|
|
and to allow simple, maintainable implementations to compete with
|
|
carefully crafted, hand-optimized code.
|