Final draft.
parent
3b70b0b005
commit
c60787e8b2
1 changed files with 127 additions and 64 deletions
|
@ -6,26 +6,25 @@ A Flexible, Extensible Transaction Framework
|
||||||
|
|
||||||
Existing transactional systems are designed to handle specific
|
Existing transactional systems are designed to handle specific
|
||||||
workloads. Unfortunately, the implementations of these systems are
|
workloads. Unfortunately, the implementations of these systems are
|
||||||
monolithic and hide the transactional infrastructure underneath a SQL
|
monolithic and hide their transactional infrastructures underneath a
|
||||||
interface. Lower-level implementations such as Berkeley DB efficiently
|
SQL interface. Lower-level implementations such as Berkeley DB
|
||||||
serve a wider variety of workloads and are built in a more modular
|
efficiently serve a wider variety of workloads and are built in a more
|
||||||
fashion. However, they do not provide APIs to allow applications to
|
modular fashion. However, they do not provide APIs to allow
|
||||||
build upon and modify low-level policies such as allocation
|
applications to build upon and modify low-level policies such as
|
||||||
strategies, page layout or details of recovery semantics.
|
allocation strategies, page layout or details of recovery semantics.
|
||||||
Furthermore, data structure implementations are typically not broken
|
Furthermore, data structure implementations are typically not broken
|
||||||
into separable, public APIs, which discourages the implementation of
|
into separable, public APIs, which discourages the implementation of
|
||||||
new transactional data structures.
|
new transactional data structures.
|
||||||
|
|
||||||
Contrast this approach to the handling of data structures within
|
Modern object-oriented programming languages such as C++ or Java
|
||||||
modern object-oriented programming languages such as C++ or Java.
|
handle the problem differently. Such languages typically provide a
|
||||||
Such languages typically provide a large number of data storage
|
large number of data storage algorithm implementations. These
|
||||||
algorithm implementations. These structures may be used
|
structures may be used interchangeably with application-specific data
|
||||||
interchangeably with application-specific data collections, and
|
collections, and collection implementations may be composed into more
|
||||||
collection implementations may be composed into more sophisticated
|
sophisticated data structures.
|
||||||
data structures.
|
|
||||||
|
|
||||||
We have implemented LLADD (/yad/), an extensible transactional storage
|
We have implemented LLADD (/yad/), an extensible transactional storage
|
||||||
library that takes a composable and layered approach to transactional
|
library that takes a composable, layered approach to transactional
|
||||||
storage. Below, we present some of its high level features and
|
storage. Below, we present some of its high level features and
|
||||||
performance characteristics and discuss our plans to extend the system
|
performance characteristics and discuss our plans to extend the system
|
||||||
into distributed domains. Finally we introduce our current research
|
into distributed domains. Finally we introduce our current research
|
||||||
|
@ -36,7 +35,6 @@ of our system, allowing application developers to implement
|
||||||
sophisticated cross-layer optimizations easily.
|
sophisticated cross-layer optimizations easily.
|
||||||
|
|
||||||
Overview of the LLADD Architecture
|
Overview of the LLADD Architecture
|
||||||
----------------------------------
|
|
||||||
|
|
||||||
General-purpose transactional storage systems are extremely complex
|
General-purpose transactional storage systems are extremely complex
|
||||||
and only handle specific types of workloads efficiently. However, new
|
and only handle specific types of workloads efficiently. However, new
|
||||||
|
@ -49,45 +47,79 @@ attempt to perform well across many workloads, we have implemented a
|
||||||
lower-level API that makes it easy for application designers to
|
lower-level API that makes it easy for application designers to
|
||||||
implement specialized data structures. Essentially, we have
|
implement specialized data structures. Essentially, we have
|
||||||
implemented an extensible navigational database system. We believe
|
implemented an extensible navigational database system. We believe
|
||||||
that this system will support modern development practices and allows
|
that this system will support modern development practices and allow
|
||||||
transactions to be used in a wider range of applications.
|
transactions to be used in a wider range of applications.
|
||||||
|
|
||||||
|
In order to support our layered data structure implementations and to
|
||||||
|
support applications that require specialized recovery semantics,
|
||||||
|
LLADD provides the following functionality:
|
||||||
|
|
||||||
|
- Flexible Page Layouts for low-level control over transactional data
|
||||||
|
representations
|
||||||
|
|
||||||
|
- Extensible Log Formats for high level control over transactional
|
||||||
|
data structures
|
||||||
|
|
||||||
|
- High and low level control over the log, such as calls to "log this
|
||||||
|
operation" or "write a compensation record"
|
||||||
|
|
||||||
|
- In-memory logical logging for data store independent lists of
|
||||||
|
application requests, allowing "in flight" log reordering,
|
||||||
|
manipulation and durability primitives to be developed
|
||||||
|
|
||||||
|
- Extensible locking API for registration of custom lock managers and
|
||||||
|
a generic lock manager implementation
|
||||||
|
|
||||||
|
- Custom durability operations such as save points and two-phase
|
||||||
|
commit's prepare call.
|
||||||
|
|
||||||
|
We have shown that these primitives allow application developers to
|
||||||
|
control on-disk data representation, data structure implementations,
|
||||||
|
the granularity of concurrency, the precise semantics of atomicity,
|
||||||
|
isolation and durability, request scheduling policies, and deadlock
|
||||||
|
avoidance schemes. The ability to control or replace these modules
|
||||||
|
and policies allows application developers to leverage application and
|
||||||
|
workload specific properties to enhance performance.
|
||||||
|
|
||||||
While implementations of general-purpose systems often lag behind the
|
While implementations of general-purpose systems often lag behind the
|
||||||
requirements of rapidly evolving applications, we believe that our
|
requirements of rapidly evolving applications, we believe that the
|
||||||
architecture's flexibility allows us to address such applications
|
flexibility of our architecture allows us to address such applications
|
||||||
rapidly. Our system also seems to be a reasonable long-term solution
|
rapidly. If the applications in question represent a large enough
|
||||||
in cases where the development of a general-purpose system is not
|
market or an important class of workloads, high-level declarative
|
||||||
economical.
|
systems may eventually replace lower level approaches based on our
|
||||||
|
system. For applications that represent a small market, the
|
||||||
|
implementation of such high-level declarative systems is probably not
|
||||||
|
feasible. In these cases, our system could provide a reasonable
|
||||||
|
long-term solution.
|
||||||
|
|
||||||
For example, XML storage systems are rapidly evolving but still fail
|
For example, XML storage systems are rapidly evolving but still fail
|
||||||
to handle many types of applications. Typical bioinformatics data
|
to handle many types of applications. Typical bioinformatics data
|
||||||
sets [PDB, NCBI, Gene Ontology] must be processed by computationally
|
sets [PDB, NCBI, GO] must be processed by computationally intensive
|
||||||
intensive applications with rigid data layout requirements. The
|
applications with rigid data layout requirements. The maintainers of
|
||||||
maintainers of these systems are slowly transitioning to XML, which is
|
these systems are slowly transitioning to XML, which is valuable as an
|
||||||
valuable as an interchange format, and supported by many general
|
interchange format and is also supported by many general purpose
|
||||||
purpose tools. However, many of the data processing applications that
|
tools. However, many of the data processing applications that use
|
||||||
use these databases still must employ ad-hoc solutions for data
|
these databases still must employ ad-hoc solutions for computationally
|
||||||
management.
|
expensive tasks and data production pipelines.
|
||||||
|
|
||||||
Whether or not general purpose XML database systems eventually meet
|
XML database systems may eventually meet all of the needs of these
|
||||||
all of the needs of each of these distinct scientific applications,
|
scientific applications. However, extensions implemented on top of a
|
||||||
extensions implemented on top of a more flexible data storage
|
flexible storage system could have avoided the need for ad-hoc
|
||||||
implementation could have avoided the need for ad-hoc solutions, and
|
solutions, and served as a prototype for components of higher level
|
||||||
could serve as a partial prototype for higher level implementations.
|
implementations.
|
||||||
|
|
||||||
LLADD is based upon an extensible version of ARIES but does not
|
LLADD is based upon an extensible version of ARIES [ARIES] but does
|
||||||
hard-code details such as page format or data structure
|
not hard-code details such as page format or data structure
|
||||||
implementation. It provides a number of "operation" implementations
|
implementation. It provides a number of "operation" implementations
|
||||||
which consist of redo/undo methods and wrapper functions. The
|
which consist of redo/undo methods and wrapper functions. The
|
||||||
redo/undo methods manipulate the page file by applying log entries
|
redo/undo methods manipulate the page file by applying log entries
|
||||||
while the wrapper functions produce log entries. Redo methods handle
|
while the wrapper functions produce log entries. Redo methods handle
|
||||||
all page file manipulation during normal forward operation, reducing
|
all page file manipulation during normal forward operation, reducing
|
||||||
the amount of code that must be developed in order to implement new
|
the amount of code required to implement new data structures. LLADD
|
||||||
data structures. LLADD handles the scheduling of redo/undo
|
handles the scheduling of redo/undo invocations, disk I/O, and all of
|
||||||
invocations, disk I/O, and all of the other details specified by the
|
the other details specified by the ARIES recovery algorithm. This
|
||||||
ARIES recovery algorithm, allowing operation implementors to focus on
|
allows operation implementors to focus on the details that are
|
||||||
the details that are important to the functionality their extension
|
important to the functionality their extension provides.
|
||||||
provides.
|
|
||||||
|
|
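The split between redo/undo methods and wrapper functions described above
can be illustrated with a toy model. This is a minimal sketch in Python,
not LLADD's actual API: the names ToyStore, Operation, log_and_apply and
increment are hypothetical, the "page file" is a dictionary, and abort
simply pops log entries instead of writing compensation records as ARIES
would.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Toy stand-ins (assumptions): a "page file" maps page numbers to integers.
PageFile = Dict[int, int]
LogEntry = Tuple[str, int, int]  # (operation name, page number, argument)
Method = Callable[[PageFile, int, int], None]


@dataclass
class Operation:
    """A named pair of redo/undo methods, both driven by a log entry."""
    name: str
    redo: Method
    undo: Method


@dataclass
class ToyStore:
    pages: PageFile = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)
    ops: Dict[str, Operation] = field(default_factory=dict)

    def register(self, op: Operation) -> None:
        self.ops[op.name] = op

    def log_and_apply(self, name: str, page: int, arg: int) -> None:
        # Append the log entry first, then let the redo method touch the
        # page, so redo handles all page manipulation in forward operation.
        self.log.append((name, page, arg))
        self.ops[name].redo(self.pages, page, arg)

    def abort(self) -> None:
        # Undo logged entries in reverse order (simplified: no
        # compensation records are written).
        while self.log:
            name, page, arg = self.log.pop()
            self.ops[name].undo(self.pages, page, arg)


def _add(pages: PageFile, page: int, arg: int) -> None:
    pages[page] = pages.get(page, 0) + arg


def _sub(pages: PageFile, page: int, arg: int) -> None:
    pages[page] = pages.get(page, 0) - arg


INCREMENT = Operation("increment", redo=_add, undo=_sub)


def increment(store: ToyStore, page: int, amount: int) -> None:
    """Wrapper function: it only produces a log entry; redo updates the page."""
    store.log_and_apply("increment", page, amount)
```

In this sketch the implementor of a new operation supplies only the
redo/undo pair and a wrapper; scheduling of the invocations stays inside
the store.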
||||||
LLADD ships with a number of default data structures and layouts,
|
LLADD ships with a number of default data structures and layouts,
|
||||||
ranging from byte-level page layouts to linear hashtables and
|
ranging from byte-level page layouts to linear hashtables and
|
||||||
|
@ -99,12 +131,12 @@ reusable modules that implement a resizable array and two exchangeable
|
||||||
linked-list variants.
|
linked-list variants.
|
||||||
|
|
||||||
In other work, we show that the system is competitive with Berkeley DB
|
In other work, we show that the system is competitive with Berkeley DB
|
||||||
on traditional (hashtable based) workloads, and have shown significant
|
on traditional (hashtable based) workloads, and show significant
|
||||||
performance improvements for less conventional workloads including
|
performance improvements for less conventional workloads including
|
||||||
custom data structure implementations, graph traversal algorithms and
|
custom data structure implementations, graph traversal algorithms and
|
||||||
transactional object persistence workloads.
|
transactional object persistence workloads.
|
||||||
|
|
||||||
The transactional object persistence system was based upon the
|
The transactional object persistence system is based upon the
|
||||||
observation that most object persistence schemes cache a second copy
|
observation that most object persistence schemes cache a second copy
|
||||||
of each in-memory object in a page file, and often keep a third copy
|
of each in-memory object in a page file, and often keep a third copy
|
||||||
in operating system cache. By implementing custom operations that
|
in operating system cache. By implementing custom operations that
|
||||||
|
@ -131,8 +163,8 @@ advantage in situations where the size of system memory is a
|
||||||
bottleneck.
|
bottleneck.
|
||||||
|
|
||||||
We leave systematic performance tuning of LLADD to future work, and
|
We leave systematic performance tuning of LLADD to future work, and
|
||||||
believe that further optimizations will improve our performance on
|
believe that further optimizations will improve performance on these
|
||||||
these benchmarks significantly.
|
benchmarks significantly.
|
||||||
|
|
||||||
Because of its natural integration into standard system software
|
Because of its natural integration into standard system software
|
||||||
development practices, we think that LLADD can be naturally extended
|
development practices, we think that LLADD can be naturally extended
|
||||||
|
@ -140,13 +172,12 @@ into networked and distributed domains. For example, typical
|
||||||
write-ahead-logging protocols implicitly implement machine
|
write-ahead-logging protocols implicitly implement machine
|
||||||
independent, reorderable log entries in order to implement logical
|
independent, reorderable log entries in order to implement logical
|
||||||
undo. These two properties have been crucial in past system software
|
undo. These two properties have been crucial in past system software
|
||||||
designs, including data replication, distribution, and conflict
|
designs, including data replication, distribution and conflict
|
||||||
resolution algorithms. Therefore, we plan to provide a networked,
|
resolution algorithms. Therefore, we plan to provide a networked,
|
||||||
logical redo log as an application-level primitive, and to explore
|
logical redo log as an application-level primitive, and to explore
|
||||||
system designs that leverage this approach.
|
system designs that leverage this approach.
|
||||||
|
|
||||||
Current Research Focus
|
Current Research Focus
|
||||||
----------------------
|
|
||||||
|
|
||||||
LLADD's design assumes that application developers will implement
|
LLADD's design assumes that application developers will implement
|
||||||
high-performance transactional data structures. However, these data
|
high-performance transactional data structures. However, these data
|
||||||
|
@ -169,10 +200,8 @@ Recovery-based algorithms must behave correctly during forward
|
||||||
operation and also under arbitrary recovery scenarios. Behavior
|
operation and also under arbitrary recovery scenarios. Behavior
|
||||||
during recovery is particularly difficult to verify due to the large
|
during recovery is particularly difficult to verify due to the large
|
||||||
number of materialized page file states that could occur after a
|
number of materialized page file states that could occur after a
|
||||||
crash.
|
crash. Fortunately, write-ahead-logging schemes such as ARIES make
|
||||||
|
use of nested-top-actions to simplify the problem. Given the
|
||||||
Fortunately, write-ahead-logging schemes such as ARIES make use of
|
|
||||||
nested-top-actions to vastly simplify the problem. Given the
|
|
||||||
correctness of page-based physical undo and redo, logical undo may
|
correctness of page-based physical undo and redo, logical undo may
|
||||||
assume that page spanning operations are applied to the data store
|
assume that page spanning operations are applied to the data store
|
||||||
atomically.
|
atomically.
|
||||||
|
@ -180,17 +209,16 @@ atomically.
|
||||||
Existing work in the static-analysis community has verified that
|
Existing work in the static-analysis community has verified that
|
||||||
device driver implementations correctly adhere to complex operating
|
device driver implementations correctly adhere to complex operating
|
||||||
system kernel locking schemes [SLAM]. We would like to formalize
|
system kernel locking schemes [SLAM]. We would like to formalize
|
||||||
LLADD's latching and logging APIs, so that these analyses will be
|
LLADD's latching and logging APIs so that these analyses will be
|
||||||
directly applicable to LLADD. This would allow us to verify that data
|
directly applicable to LLADD. This would allow us to verify that data
|
||||||
structure behavior during recovery is equivalent to the behavior that
|
structure behavior during recovery is equivalent to the behavior that
|
||||||
would result if an abort() was issued on each prefix of the log that
|
would result if an abort() was issued on each prefix of the log.
|
||||||
is generated during normal forward operation.
|
|
||||||
|
|
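The property stated above (recovery should behave like an abort() issued
on each prefix of the log) can be checked exhaustively on a toy model.
The following Python sketch is illustrative only and does not use LLADD:
it assumes a single "add" operation, per-page LSNs, and crash states in
which each touched page is either fully flushed or not flushed at all.
The check_lsn flag is a hypothetical switch that disables the LSN
comparison in order to show the kind of bug such a search would catch.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pages = Dict[int, Tuple[int, int]]  # page -> (value, LSN of last applied entry)
Entry = Tuple[int, int, int]        # (LSN, page, amount); the only operation is "add"


def run_forward(log: List[Entry]) -> Pages:
    """Apply every entry in order, as normal forward operation would."""
    pages: Pages = {}
    for lsn, page, amount in log:
        value, _ = pages.get(page, (0, 0))
        pages[page] = (value + amount, lsn)
    return pages


def values(pages: Pages) -> Dict[int, int]:
    return {page: value for page, (value, _) in pages.items()}


def abort(log: List[Entry]) -> Dict[int, int]:
    """Run the log forward, then undo every entry in reverse order."""
    pages = run_forward(log)
    for _, page, amount in reversed(log):
        value, page_lsn = pages[page]
        pages[page] = (value - amount, page_lsn)
    return values(pages)


def recover(on_disk: Pages, log: List[Entry], check_lsn: bool = True) -> Dict[int, int]:
    """Redo entries the on-disk pages have not seen, then roll everything back."""
    pages = dict(on_disk)
    for lsn, page, amount in log:
        value, page_lsn = pages.get(page, (0, 0))
        if not check_lsn or lsn > page_lsn:
            pages[page] = (value + amount, lsn)
    for _, page, amount in reversed(log):
        value, page_lsn = pages[page]
        pages[page] = (value - amount, page_lsn)
    return values(pages)


def check_all_prefixes(log: List[Entry], check_lsn: bool = True):
    """Return (prefix length, flushed pages) for each crash state where
    recovery disagrees with abort() on the same prefix."""
    failures = []
    for k in range(len(log) + 1):
        prefix = log[:k]
        in_memory = run_forward(prefix)
        touched = sorted(in_memory)
        for r in range(len(touched) + 1):
            for flushed in combinations(touched, r):
                on_disk = {p: in_memory[p] for p in flushed}
                if recover(on_disk, prefix, check_lsn) != abort(prefix):
                    failures.append((k, flushed))
    return failures


LOG = [(1, 1, 5), (2, 1, 2), (3, 2, 4)]  # example log (made up for this sketch)
```

Even in this tiny model the number of crash states grows with both the
log length and the number of touched pages, which is the state space
that coarse latching is meant to shrink.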
||||||
By using coarse latches that are held throughout entire logical
|
By using coarse latches that are held throughout entire logical
|
||||||
operation invocations, we can drastically reduce the size of this
|
operation invocations, we can drastically reduce the size of this
|
||||||
space, allowing conventional state-state based search techniques (such
|
space. This would allow conventional state-space based search
|
||||||
as randomized or exhaustive state-space searches, or unit testing
|
techniques (such as randomized or exhaustive state-space searches, or
|
||||||
techniques) to be practical. It has been shown that such
|
unit testing techniques) to be practical. It has been shown that such
|
||||||
coarse-grained latching can yield high-performance concurrent data
|
coarse-grained latching can yield high-performance concurrent data
|
||||||
structures if semantics-preserving optimizations such as page
|
structures if semantics-preserving optimizations such as page
|
||||||
prefetching are applied [ARIES/IM].
|
prefetching are applied [ARIES/IM].
|
||||||
|
@ -205,8 +233,8 @@ continually pin and unpin the same underlying data.
|
||||||
|
|
||||||
The code for each of these high level API calls could be copied into
|
The code for each of these high level API calls could be copied into
|
||||||
many different variants with different pinning/unpinning and
|
many different variants with different pinning/unpinning and
|
||||||
latching/unlatching behavior, but this would greatly complicate the
|
latching/unlatching behavior. This would greatly complicate the API
|
||||||
API that application developers must work with, and complicate any
|
that application developers must work with and complicate any
|
||||||
application code that made use of such optimizations.
|
application code that made use of such optimizations.
|
||||||
|
|
||||||
Compiler optimization techniques such as code hoisting and partial
|
Compiler optimization techniques such as code hoisting and partial
|
||||||
|
@ -216,13 +244,13 @@ conditionals, while partial common subexpression elimination inserts
|
||||||
checks that decide at runtime whether a particular computation is
|
checks that decide at runtime whether a particular computation is
|
||||||
redundant. We hope to extend such techniques to reduce the number of
|
redundant. We hope to extend such techniques to reduce the number of
|
||||||
buffer manager and locking calls made by existing code. In situations
|
buffer manager and locking calls made by existing code. In situations
|
||||||
where memory is abundant, these calls are a significant performance
|
where memory is abundant these calls are a significant performance
|
||||||
bottleneck, especially for read-only operations.
|
bottleneck, especially for read-only operations.
|
||||||
|
|
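The effect of hoisting buffer manager calls out of a loop can be shown
with a hand-written before/after pair. This Python sketch is only an
illustration of the transformation, applied manually rather than by a
compiler; CountingBufferManager, read_record, sum_naive and sum_hoisted
are hypothetical names, and the buffer manager does nothing except count
calls.

```python
from typing import Dict, List


class CountingBufferManager:
    """Toy buffer manager that only tracks pin counts and call totals."""

    def __init__(self, pages: Dict[int, List[int]]) -> None:
        self.pages = pages
        self.pin_calls = 0
        self.pin_count: Dict[int, int] = {}

    def pin(self, page_id: int) -> List[int]:
        self.pin_calls += 1
        self.pin_count[page_id] = self.pin_count.get(page_id, 0) + 1
        return self.pages[page_id]

    def unpin(self, page_id: int) -> None:
        self.pin_count[page_id] -= 1


def read_record(bm: CountingBufferManager, page_id: int, slot: int) -> int:
    """High-level call: pins and unpins the page around a single read."""
    page = bm.pin(page_id)
    try:
        return page[slot]
    finally:
        bm.unpin(page_id)


def sum_naive(bm: CountingBufferManager, page_id: int, slots: List[int]) -> int:
    # One pin/unpin pair per record, even though every read hits one page.
    return sum(read_record(bm, page_id, slot) for slot in slots)


def sum_hoisted(bm: CountingBufferManager, page_id: int, slots: List[int]) -> int:
    # The pin/unpin pair is hoisted out of the loop: one pair in total.
    page = bm.pin(page_id)
    try:
        return sum(page[slot] for slot in slots)
    finally:
        bm.unpin(page_id)
```

Both functions return the same sum; the hoisted version makes a single
pin call regardless of how many records are read from the page.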
||||||
Similar optimization techniques are applicable to application code.
|
Similar optimization techniques are applicable to application code.
|
||||||
Local LLADD calls are simply normal function calls. Therefore it may
|
Local LLADD calls are normal function calls. Therefore it may be
|
||||||
even be possible to apply the transformations that these optimizations
|
possible to apply the transformations that these optimizations perform
|
||||||
perform to application code that is unaware of the underlying storage
|
to application code that is unaware of the underlying storage
|
||||||
implementation. This class of optimizations would be very difficult
|
implementation. This class of optimizations would be very difficult
|
||||||
to implement with existing transactional storage systems but should
|
to implement with existing transactional storage systems but should
|
||||||
significantly improve application performance.
|
significantly improve application performance.
|
||||||
|
@ -239,3 +267,38 @@ superior to other general-purpose solutions. By adding support for
|
||||||
automated code verification and transformations we hope to make it
|
automated code verification and transformations we hope to make it
|
||||||
easy to produce correct extensions and to allow simple, maintainable
|
easy to produce correct extensions and to allow simple, maintainable
|
||||||
implementations to compete with special purpose, hand-optimized code.
|
implementations to compete with special purpose, hand-optimized code.
|
||||||
|
|
||||||
|
Conclusion
|
||||||
|
|
||||||
|
We have described a simple, extensible architecture for transactional
|
||||||
|
systems and presented a number of situations where our implementation
|
||||||
|
outperforms existing transactional systems. Due to the flexibility of
|
||||||
|
the architecture, we believe that it is appropriate for evolving
|
||||||
|
applications and for applications where general-purpose, declarative
|
||||||
|
systems are inappropriate. Finally, we presented a number of
|
||||||
|
optimizations that our system can support, but that would be extremely
|
||||||
|
difficult to apply to existing transactional data stores. Therefore,
|
||||||
|
we believe that our approach is applicable to a wider range of
|
||||||
|
scenarios than existing systems.
|
||||||
|
|
||||||
|
Acknowledgements
|
||||||
|
|
||||||
|
Mike Demmer was responsible for LLADD's object persistence
|
||||||
|
functionality. Jimmy Kittiyachavalit, Jim Blomo and Jason Bayer
|
||||||
|
implemented the original version of LLADD. Gilad Arnold and Amir
|
||||||
|
Kamil provided invaluable feedback regarding LLADD's API.
|
||||||
|
|
||||||
|
[SLAM] Ball, Thomas and Rajamani, Sriram. "Automatically
|
||||||
|
Validating Temporal Safety Properties of Interfaces,"
|
||||||
|
International Workshop on SPIN Model Checking, 2001.
|
||||||
|
[GO] Gene Ontology, http://www.geneontology.org/
|
||||||
|
[ARIES] C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh,
|
||||||
|
Peter Schwarz. "ARIES: a transaction recovery method
|
||||||
|
supporting fine-granularity locking and partial
|
||||||
|
rollbacks using write-ahead logging," TODS, 1992.
|
||||||
|
[ARIES/IM] C. Mohan, Frank Levine. "ARIES/IM: an efficient and
|
||||||
|
high concurrency index management method using
|
||||||
|
write-ahead logging," ACM SIGMOD, 1992.
|
||||||
|
[NCBI] National Center for Biotechnology Information,
|
||||||
|
http://www.ncbi.nlm.nih.gov/
|
||||||
|
[PDB] Protein Data Bank, http://www.rcsb.org/pdb/
|
||||||
|
|