Final draft.

This commit is contained in:
Sears Russell 2005-04-01 02:01:37 +00:00

A Flexible, Extensible Transaction Framework
Existing transactional systems are designed to handle specific
workloads. Unfortunately, the implementations of these systems are
monolithic and hide their transactional infrastructures underneath a
SQL interface. Lower-level implementations such as Berkeley DB
efficiently serve a wider variety of workloads and are built in a more
modular fashion. However, they do not provide APIs to allow
applications to build upon and modify low-level policies such as
allocation strategies, page layout or details of recovery semantics.
Furthermore, data structure implementations are typically not broken
into separable, public APIs, which discourages the implementation of
new transactional data structures.

Modern object-oriented programming languages such as C++ or Java
handle the problem differently. Such languages typically provide a
large number of data storage algorithm implementations. These
structures may be used interchangeably with application-specific data
collections, and collection implementations may be composed into more
sophisticated data structures.

We have implemented LLADD (/yad/), an extensible transactional storage
library that takes a composable, layered approach to transactional
storage. Below, we present some of its high level features and
performance characteristics and discuss our plans to extend the system
into distributed domains. Finally we introduce our current research
of our system, allowing application developers to implement
sophisticated cross-layer optimizations easily.

Overview of the LLADD Architecture
General-purpose transactional storage systems are extremely complex
and only handle specific types of workloads efficiently. However, new

attempt to perform well across many workloads, we have implemented a
lower-level API that makes it easy for application designers to
implement specialized data structures. Essentially, we have
implemented an extensible navigational database system. We believe
that this system will support modern development practices and allow
transactions to be used in a wider range of applications.
In order to support our layered data structure implementations and to
support applications that require specialized recovery semantics,
LLADD provides the following functionality:
- Flexible Page Layouts for low-level control over transactional data
representations
- Extensible Log Formats for high level control over transactional
data structures
- High and low level control over the log, such as calls to "log this
operation" or "write a compensation record"
- In-memory logical logging for data store independent lists of
application requests, allowing "in flight" log reordering,
manipulation and durability primitives to be developed
- Extensible locking API for registration of custom lock managers and
a generic lock manager implementation
- Custom durability operations such as save points and two-phase
commit's prepare call.
We have shown that these primitives allow application developers to
control on-disk data representation, data structure implementations,
the granularity of concurrency, the precise semantics of atomicity,
isolation and durability, request scheduling policies, and deadlock
avoidance schemes. The ability to control or replace these modules
and policies allows application developers to leverage application and
workload specific properties to enhance performance.
While implementations of general-purpose systems often lag behind the
requirements of rapidly evolving applications, we believe that the
flexibility of our architecture allows us to address such applications
rapidly. If the applications in question represent a large enough
market or an important class of workloads, high-level declarative
systems may eventually replace lower level approaches based on our
system. For applications that represent a small market, the
implementation of such high-level declarative systems is probably not
feasible. In these cases, our system could provide a reasonable
long-term solution.
For example, XML storage systems are rapidly evolving but still fail
to handle many types of applications. Typical bioinformatics data
sets [PDB, NCBI, GO] must be processed by computationally intensive
applications with rigid data layout requirements. The maintainers of
these systems are slowly transitioning to XML, which is valuable as an
interchange format and is also supported by many general purpose
tools. However, many of the data processing applications that use
these databases still must employ ad-hoc solutions for computationally
expensive tasks and data production pipelines.

XML database systems may eventually meet all of the needs of these
scientific applications. However, extensions implemented on top of a
flexible storage system could have avoided the need for ad-hoc
solutions, and served as a prototype for components of higher level
implementations.
LLADD is based upon an extensible version of ARIES [ARIES] but does
not hard-code details such as page format or data structure
implementation. It provides a number of "operation" implementations
which consist of redo/undo methods and wrapper functions. The
redo/undo methods manipulate the page file by applying log entries
while the wrapper functions produce log entries. Redo methods handle
all page file manipulation during normal forward operation, reducing
the amount of code required to implement new data structures. LLADD
handles the scheduling of redo/undo invocations, disk I/O, and all of
the other details specified by the ARIES recovery algorithm. This
allows operation implementors to focus on the details that are
important to the functionality their extension provides.
LLADD ships with a number of default data structures and layouts,
ranging from byte-level page layouts to linear hashtables and

reusable modules that implement a resizable array and two exchangeable
linked-list variants.
In other work, we show that the system is competitive with Berkeley DB
on traditional (hashtable based) workloads, and show significant
performance improvements for less conventional workloads including
custom data structure implementations, graph traversal algorithms and
transactional object persistence workloads.
The transactional object persistence system is based upon the
observation that most object persistence schemes cache a second copy
of each in-memory object in a page file, and often keep a third copy
in operating system cache. By implementing custom operations that

advantage in situations where the size of system memory is a
bottleneck.
We leave systematic performance tuning of LLADD to future work, and
believe that further optimizations will improve performance on these
benchmarks significantly.

Because of its natural integration into standard system software
development practices, we think that LLADD can be naturally extended
into networked and distributed domains. For example, typical
write-ahead-logging protocols implicitly implement machine
independent, reorderable log entries in order to implement logical
undo. These two properties have been crucial in past system software
designs, including data replication, distribution and conflict
resolution algorithms. Therefore, we plan to provide a networked,
logical redo log as an application-level primitive, and to explore
system designs that leverage this approach.

Current Research Focus
LLADD's design assumes that application developers will implement
high-performance transactional data structures. However, these data

Recovery-based algorithms must behave correctly during forward
operation and also under arbitrary recovery scenarios. Behavior
during recovery is particularly difficult to verify due to the large
number of materialized page file states that could occur after a
crash. Fortunately, write-ahead-logging schemes such as ARIES make
use of nested-top-actions to simplify the problem. Given the
correctness of page-based physical undo and redo, logical undo may
assume that page spanning operations are applied to the data store
atomically.
Existing work in the static-analysis community has verified that
device driver implementations correctly adhere to complex operating
system kernel locking schemes [SLAM]. We would like to formalize
LLADD's latching and logging APIs so that these analyses will be
directly applicable to LLADD. This would allow us to verify that data
structure behavior during recovery is equivalent to the behavior that
would result if an abort() was issued on each prefix of the log.
By using coarse latches that are held throughout entire logical
operation invocations, we can drastically reduce the size of this
space. This would allow conventional state-based search techniques
(such as randomized or exhaustive state-space searches, or unit
testing techniques) to be practical. It has been shown that such
coarse-grained latching can yield high-performance concurrent data
structures if semantics-preserving optimizations such as page
prefetching are applied [ARIES/IM].
continually pin and unpin the same underlying data.

The code for each of these high level API calls could be copied into
many different variants with different pinning/unpinning and
latching/unlatching behavior. This would greatly complicate the API
that application developers must work with and complicate any
application code that made use of such optimizations.
Compiler optimization techniques such as code hoisting and partial

conditionals, while partial common subexpression elimination inserts
checks that decide at runtime whether a particular computation is
redundant. We hope to extend such techniques to reduce the number of
buffer manager and locking calls made by existing code. In situations
where memory is abundant, these calls are a significant performance
bottleneck, especially for read-only operations.
Similar optimization techniques are applicable to application code.
Local LLADD calls are normal function calls. Therefore it may be
possible to apply the transformations that these optimizations perform
to application code that is unaware of the underlying storage
implementation. This class of optimizations would be very difficult
to implement with existing transactional storage systems but should
significantly improve application performance.
superior to other general-purpose solutions. By adding support for
automated code verification and transformations we hope to make it
easy to produce correct extensions and to allow simple, maintainable
implementations to compete with special purpose, hand-optimized code.
Conclusion
We have described a simple, extensible architecture for transactional
systems and presented a number of situations where our implementation
outperforms existing transactional systems. Due to the flexibility of
the architecture, we believe that it is appropriate for evolving
applications and for applications where general-purpose, declarative
systems are inappropriate. Finally, we presented a number of
optimizations that our system can support, but that would be extremely
difficult to apply to existing transactional data stores. Therefore,
we believe that our approach is applicable to a wider range of
scenarios than existing systems.
Acknowledgements
Mike Demmer was responsible for LLADD's object persistence
functionality. Jimmy Kittiyachavalit, Jim Blomo and Jason Bayer
implemented the original version of LLADD. Gilad Arnold and Amir
Kamil provided invaluable feedback regarding LLADD's API.
[SLAM] Ball, Thomas and Rajamani, Sriram. "Automatically
Validating Temporal Safety Properties of Interfaces,"
International Workshop on SPIN Model Checking, 2001.
[GO] Gene Ontology, http://www.geneontology.org/
[ARIES] C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh,
Peter Schwarz. "ARIES: a transaction recovery method
supporting fine-granularity locking and partial
rollbacks using write-ahead logging," TODS, 1992.
[ARIES/IM] C. Mohan, Frank Levine. "ARIES/IM: an efficient and
high concurrency index management method using
write-ahead logging," ACM SIGMOD, 1992.
[NCBI] National Center for Biotechnology Information,
http://www.ncbi.nlm.nih.gov/
[PDB] Protein Data Bank, http://www.rcsb.org/pdb/