Russell Sears
Eric Brewer
UC Berkeley

A Flexible, Extensible Transaction Framework

Existing transactional systems are designed to handle specific
workloads well. Unfortunately, these systems' implementations are
monolithic and hide the transactional infrastructure underneath a
SQL interface. Lower-level implementations such as Berkeley DB handle
a wider variety of workloads and are built in a more modular fashion.
However, they do not provide APIs to allow applications to build upon
and modify low-level policies such as allocation strategies, page
layout or details of recovery semantics. Furthermore, data structure
implementations are typically not broken into separable, public APIs,
which discourages the implementation of new transactional data
structures.

Contrast this with the handling of data structures within modern
object-oriented programming languages such as C++ or Java. Such
languages typically provide a large number of data storage algorithm
implementations. These structures may be used interchangeably with
application-specific data collections, and collection implementations
may be composed into more sophisticated data structures.

We have implemented LLADD (/yad/), an extensible transactional storage
library that takes a composable and layered approach to transactional
storage. Below, we present some of its high-level features and
performance characteristics and discuss our plans to
extend the system into distributed domains. Finally, we introduce our
current research focus, the application of automated program
verification and optimization techniques to application-specific
extensions. Such techniques should significantly enhance the
usability and performance of our system, allowing application
developers to implement sophisticated cross-layer optimizations easily.

Overview of the LLADD Architecture
----------------------------------

General-purpose transactional storage systems are extremely complex
and only handle certain types of workloads efficiently. However, new
types of applications and workloads are introduced on a regular basis.
This results in the implementation of specialized, ad-hoc data storage
systems from scratch, wasting resources and preventing code reuse.

Instead of developing a set of general-purpose data structures that
attempt to behave well across many workloads, we have implemented a
lower-level API that makes it easy for application designers to
implement specialized data structures. Essentially, we have
implemented an extensible navigational database system. We
believe that this system will support modern development practices and
allow transactions to be used in a wider range of applications.

*** This paragraph doesn't make sense to me:

In cases where the development of a general-purpose solution is not
economical, our approach should lead to maintainable and efficient
long-term solutions. Semi-structured data stores provide good
examples of both types of scenarios. General XML storage technologies
are improving rapidly, but still fail to handle many types of
applications.

*** this is risky: there are many people working on XML databases
For instance,
we know of no general-purpose solution that seriously addresses
semi-structured scientific information, such as the large repositories
typical of bioinformatics research efforts [PDB, NCBI, Gene Ontology].
Although many scientific projects are moving toward XML for their data
representation, we have found that XML is used primarily as a data
interchange format, and that existing XML tools fail to address the
needs of automated data mining, scientific computing and interactive
query systems.

LLADD is based upon an extensible version of ARIES but does not
hard-code details such as page format or data structure
implementation. It provides a number of "operation" implementations
which consist of redo/undo methods and wrapper functions. The redo/undo
methods manipulate the page file by applying log entries while the
wrapper functions produce log entries. Redo methods handle all page
file manipulation during normal forward operation, reducing the amount
of code that must be developed in order to implement new data structures.
LLADD handles the scheduling of redo/undo invocations, disk I/O, and all
of the other details specified by the ARIES recovery algorithm, allowing
operation implementors to focus on the details that are important to the
functionality their extension provides.

LLADD ships with a number of default data structures and
layouts, ranging from byte-level page layouts to linear hashtables
and application-specific recovery schemes and data structures.
These structures were developed with reusability in mind, encouraging
developers to compose existing operations into application-specific data
structures. For example, the hashtable is
implemented on top of reusable modules that implement a resizable array
and two exchangeable linked-list variants.

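The kind of composition described above might look roughly like the
following Python sketch. All class names are hypothetical, the
structures are in-memory only (no logging or recovery), and the table
uses a fixed number of buckets instead of linear hashing; the point is
only that the bucket implementation can be swapped without touching the
table.

```python
class ListBucket:
    """Bucket variant 1: plain association list, new pairs go at the tail."""

    def __init__(self):
        self.items = []

    def put(self, key, value):
        for i, (k, _) in enumerate(self.items):
            if k == key:
                self.items[i] = (key, value)
                return
        self.items.append((key, value))

    def get(self, key):
        for k, v in self.items:
            if k == key:
                return v
        return None


class MoveToFrontBucket(ListBucket):
    """Bucket variant 2: same interface, moves a found pair to the front."""

    def get(self, key):
        for i, (k, v) in enumerate(self.items):
            if k == key:
                self.items.insert(0, self.items.pop(i))
                return v
        return None


class ResizableArray:
    """Reusable module: an array of slots that can grow."""

    def __init__(self, size, make_slot):
        self.make_slot = make_slot
        self.slots = [make_slot() for _ in range(size)]

    def __len__(self):
        return len(self.slots)

    def __getitem__(self, index):
        return self.slots[index]

    def grow(self, new_size):
        missing = new_size - len(self.slots)
        self.slots.extend(self.make_slot() for _ in range(missing))


class HashTable:
    """Composes the array with whichever bucket variant is passed in.

    Rehashing after grow() is omitted to keep the sketch short.
    """

    def __init__(self, bucket_type=ListBucket, size=4):
        self.array = ResizableArray(size, bucket_type)

    def _bucket(self, key):
        return self.array[hash(key) % len(self.array)]

    def put(self, key, value):
        self._bucket(key).put(key, value)

    def get(self, key):
        return self._bucket(key).get(key)
```

Passing bucket_type=MoveToFrontBucket changes the access pattern of
every bucket while HashTable itself stays unchanged.
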
In other work, we show that the system is competitive with
Berkeley DB on traditional (hashtable-based) workloads, and show
significant performance improvements for less conventional workloads
including custom data structure implementations, graph traversal
algorithms and transactional object persistence workloads.

The transactional object persistence system was based upon the
observation that most object persistence schemes cache a second copy
of each in-memory object in a page file, and often keep a third copy
in the operating system cache. By implementing custom operations that
assume the program maintains a correctly implemented object cache, we
allow LLADD to service object update requests without updating the
page file.

Since LLADD implements no-force, the only reason to update
the page file is to service future application read requests.
Therefore, we defer page file updates until the object is evicted from
the application's object cache, eliminating the need to maintain a large
page cache in order to efficiently service write requests. We also
leveraged our customizable log format to log differences to objects
instead of entire copies of objects.

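The two optimizations above (deferring page file writes until eviction,
and logging only the changed fields) can be sketched as follows. This is
a toy model under stated assumptions, not LLADD's implementation:
DeferredObjectStore and its methods are hypothetical names, objects are
dictionaries, and the "log" and "page file" are in-memory containers.

```python
class DeferredObjectStore:
    """Toy model: updates go to the log; the page file is written on evict."""

    def __init__(self):
        self.log = []         # (object id, changed fields) records
        self.page_file = {}   # object id -> last copy written back
        self.cache = {}       # the application's object cache
        self.page_writes = 0  # number of page file updates performed

    def update(self, oid, **changes):
        # Load the object into the cache on first use.
        if oid not in self.cache:
            self.cache[oid] = dict(self.page_file.get(oid, {}))
        obj = self.cache[oid]
        # Log only the fields whose values actually change.
        diff = {k: v for k, v in changes.items() if obj.get(k) != v}
        if diff:
            self.log.append((oid, diff))
            obj.update(diff)

    def evict(self, oid):
        # The page file is only touched when the object leaves the cache.
        self.page_file[oid] = dict(self.cache.pop(oid))
        self.page_writes += 1

    def read(self, oid):
        if oid in self.cache:
            return self.cache[oid]
        return self.page_file[oid]
```

In this model any number of updates to a cached object costs one page
file write, performed at eviction time.
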
With these optimizations, we showed a 2-3x performance improvement
over Berkeley DB on object persistence across our benchmarks, and a
3-4x improvement over an in-process version of MySQL with the InnoDB
backend. (A traditional MySQL setup that made use of a separate
server process was prohibitively slow. InnoDB provided the best
performance among MySQL's durable storage managers.) Furthermore, our
system uses memory more efficiently, increasing its performance
advantage in situations where the size of system memory is a
bottleneck.

We leave systematic performance tuning of LLADD to future work, and
believe that further optimizations will improve our performance on
these benchmarks significantly. In general, LLADD's customizability
enables many optimizations that are difficult for other systems.

Because it integrates naturally into standard system software
development practices, we think that LLADD can be extended into
networked and distributed domains.
For example, typical write-ahead-logging protocols implicitly
implement machine-independent, reorderable log entries in order to
implement logical undo. These two properties have been crucial in
past system software designs, including data replication,
distribution, and conflict resolution algorithms. Therefore, we plan
to provide a networked, logical redo log as an application-level
primitive, and to explore system designs that leverage this approach.

Current Research Focus
----------------------

LLADD's design assumes that application developers will
implement high-performance transactional data structures. This is a
big assumption, as these data structures are notoriously difficult to
implement correctly. Our current research attempts to address these
concerns.

For our infrastructure to be generally useful, the functionality that
it provides should be efficient, reliable and applicable to new
application domains. We believe that improvements to the development
process can address each of these goals.

Application developers typically have a limited amount of time to
spend implementing and verifying application-specific storage
extensions, but bugs in these extensions have dire consequences.
Also, while data structure algorithms tend to be simple and easily
understood, performance tuning and verification of implementation
correctness are extremely difficult.

Recovery-based algorithms must behave correctly during forward
operation and also under arbitrary recovery scenarios. The latter
requirement is particularly difficult to verify due to the large
number of materialized page file states that could occur after a
crash.

Fortunately, write-ahead-logging schemes such as ARIES make use of
nested-top-actions to vastly simplify the problem. Given the
correctness of page-based physical undo and redo, logical undo may
assume that page-spanning operations are applied to the data store
atomically.

Existing work in the static-analysis community has verified that
device driver implementations correctly adhere to complex operating
system kernel locking schemes [SLAM]. If we formalize LLADD's latching
and logging APIs, we believe that analyses such as these will be
directly applicable, allowing us to verify that data structure
behavior during recovery is equivalent to the behavior that would
result if an abort() were issued on each prefix of the log that is
generated during normal forward operation.

*** below implies that two operations have two latches and can thus run in parallel ***
By using coarse latching (one latch per logical operation), we can
drastically reduce the size of this space, allowing conventional
state-space based search techniques (such as randomized or exhaustive
state-space searches, or unit testing techniques) to be
practical. It has been shown that such coarse-grained latching can
yield high-performance concurrent data structures if
semantics-preserving optimizations such as page prefetching are
applied [ARIES/IM].

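To make the idea of an exhaustive search over post-crash states
concrete, the Python sketch below checks the property stated earlier
(recovery followed by abort restores the initial state) for every prefix
of a small log. It is a toy model under several assumptions that are not
taken from LLADD: the log holds entries of one uncommitted transaction,
each entry is a (page, delta) pair applied atomically, each page carries
the LSN of the last update flushed to disk, and all function names are
hypothetical.

```python
from itertools import product


def redo(value, delta):
    return value + delta


def undo(value, delta):
    return value - delta


def crash_states(initial, prefix):
    """Yield every (pages, page_lsn) disk state in which each page was last
    flushed at some arbitrary point while `prefix` was being applied."""
    touched = sorted({page for page, _ in prefix})
    choices = []
    for page in touched:
        lsns = [lsn for lsn, (p, _) in enumerate(prefix, 1) if p == page]
        choices.append([0] + lsns)  # 0 means "never flushed"
    for flushed in product(*choices):
        pages, page_lsn = dict(initial), {}
        for page, upto in zip(touched, flushed):
            page_lsn[page] = upto
            for lsn, (p, delta) in enumerate(prefix, 1):
                if p == page and lsn <= upto:
                    pages[page] = redo(pages[page], delta)
        yield pages, page_lsn


def recover_then_abort(pages, page_lsn, prefix, redo_fn=redo, undo_fn=undo):
    # Redo pass: reapply only the updates the page has not seen yet.
    for lsn, (page, delta) in enumerate(prefix, 1):
        if lsn > page_lsn.get(page, 0):
            pages[page] = redo_fn(pages[page], delta)
    # Undo pass: roll the uncommitted updates back in reverse order.
    for page, delta in reversed(prefix):
        pages[page] = undo_fn(pages[page], delta)
    return pages


def check(initial, log, redo_fn=redo, undo_fn=undo):
    """True if every crash state of every log prefix recovers to `initial`.
    `initial` must contain an entry for every page the log touches."""
    for n in range(len(log) + 1):
        prefix = log[:n]
        for pages, page_lsn in crash_states(initial, prefix):
            result = recover_then_abort(pages, page_lsn, prefix,
                                        redo_fn, undo_fn)
            if result != initial:
                return False
    return True
```

For example, check({1: 10, 2: 20}, [(1, 5), (2, -3), (1, 7)]) explores
every combination of per-page flush points for each prefix; passing a
deliberately broken undo_fn makes the same call fail.
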
A separate approach to the static analysis of LLADD extensions uses
compiler optimization techniques. Software built on top of layered
APIs frequently makes repeated calls to low-level functions that result
in repeated work. A common example in LLADD involves loops over data with
good locality in the page file. The vast majority of the time, these
loops result in a series of high-level API calls that repeatedly pin
and unpin the same underlying data.

The code for each of these high-level API calls could be copied into
many different variants with different pinning/unpinning and
latching/unlatching behavior, but this would greatly complicate the
API that application developers must work with, and complicate any
application code that made use of such optimizations.

*** code hoisting might be a better example
Compiler optimization techniques such as partial common subexpression
elimination solve an analogous problem to remove redundant algebraic
computations. We hope to extend such techniques to reduce the number
of buffer manager and locking calls made by existing code at runtime.

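The redundant pin/unpin pattern, and the effect of hoisting those calls
out of a loop, can be illustrated with the Python sketch below. It is a
hand-written before/after pair, not the output of any analysis, and
BufferManager, read_record, sum_naive and sum_hoisted are hypothetical
names that do not correspond to LLADD's API.

```python
class BufferManager:
    """Toy buffer manager that counts how often pages are pinned."""

    def __init__(self, pages):
        self.pages = pages   # page id -> list of record values
        self.pin_calls = 0
        self.pinned = {}     # page id -> current pin count

    def pin(self, page_id):
        self.pin_calls += 1
        self.pinned[page_id] = self.pinned.get(page_id, 0) + 1
        return self.pages[page_id]

    def unpin(self, page_id):
        self.pinned[page_id] -= 1


def read_record(bm, page_id, slot):
    """High-level call: pins, reads one record, unpins."""
    page = bm.pin(page_id)
    try:
        return page[slot]
    finally:
        bm.unpin(page_id)


def sum_naive(bm, page_id, slots):
    # One pin/unpin pair per record, even though the page never changes.
    return sum(read_record(bm, page_id, s) for s in slots)


def sum_hoisted(bm, page_id, slots):
    # The pin/unpin pair is hoisted out of the loop: one pair in total.
    page = bm.pin(page_id)
    try:
        return sum(page[s] for s in slots)
    finally:
        bm.unpin(page_id)
```

Both functions return the same sum; for a page with four records the
naive version pins the page four times and the hoisted version once.
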
Anecdotal evidence and personal experience suggest that similar
optimization techniques are applicable to application code. Because
local LLADD calls are simply normal function calls, it may even be
possible to apply the transformations that these optimizations perform
to application code that is unaware of the underlying storage
implementation. This class of optimizations would be very difficult
to implement with existing transactional storage systems but should
significantly improve application performance.

*** no reason to say this: Our implementation of LLADD is still unstable and inappropriate for use on important data.
We hope to validate our ideas about static analysis by incorporating
them into the development process as we increase the reliability and
overall quality of LLADD's implementation and its APIs.

Our architecture provides a set of tools that allow applications to implement
custom transactional data structures and page layouts. This avoids
"impedance mismatch," simplifying applications and providing appropriate
applications with performance that is comparable or superior to other
general-purpose solutions.
By adding support for automated code verification and
transformations, we hope to make it easy to produce correct extensions
and to allow simple, maintainable implementations to compete with
special-purpose, hand-optimized code.