ran esc-q

Sears Russell 2005-03-30 17:57:43 +00:00
parent f7ce3b70a6
commit 1dafb98029

@@ -12,26 +12,26 @@ as Berkeley DB handle a wider variety of workloads and are built in a
 modular fashion. However, they do not provide APIs to allow
 applications to build upon and modify low level policies such as
 allocation strategies, page layout or details of recovery semantics.
-Furthermore, data structure implementations are typically
-not broken into separable, public APIs, encouraging a "from scratch"
-approach to the implementation of extensions.
+Furthermore, data structure implementations are typically not broken
+into separable, public APIs, encouraging a "from scratch" approach to
+the implementation of extensions.

 Contrast this to the handling of data structures within modern object
-oriented programming languages such as Java or C++. Such languages typically provide a
-large number of data storage algorithm implementations. These
-structures may be used interchangeably with application-specific data
-collections, and collection implementations can be composed into more
-sophisticated data structures.
+oriented programming languages such as Java or C++. Such languages
+typically provide a large number of data storage algorithm
+implementations. These structures may be used interchangeably with
+application-specific data collections, and collection implementations
+can be composed into more sophisticated data structures.

 We have implemented LLADD (/yad/), an extensible transactional storage
-library that takes a composable and layered approach to
-transactional storage. Below, we present some of the high level
-features and performance characteristics of this system and discuss
-our plans to extend the system into distributed domains. Finally we
-introduce our current research focus, the application of automated
-program verification and optimization techniques to application specific extensions. Such
-techniques should significantly enhance the usability and performance
-of our system.
+library that takes a composable and layered approach to transactional
+storage. Below, we present some of the high level features and
+performance characteristics of this system and discuss our plans to
+extend the system into distributed domains. Finally we introduce our
+current research focus, the application of automated program
+verification and optimization techniques to application specific
+extensions. Such techniques should significantly enhance the
+usability and performance of our system.

 Overview of the LLADD Architecture

@@ -51,22 +51,22 @@ address new applications that are evolving too quickly to allow
 appropriate general-purpose solutions to be developed.

 The library is based upon an extensible version of ARIES but does not
-hard-code details such as page format or data structure implementation.
-It provides a number of "operation" implementations which consist of
-redo/undo implementations that apply log entries and wrapper
-functions that produce log entries.
-During normal forward operations, page file writes are processed by
-applying redo entries from the log. Other than the invocation of code
-that allocates and writes log entries there is no difference between
-the redo phase of recovery and normal forward operation. This reduces
-the amount of code that must be developed in order to implement new
-data structures and page layouts.
+hard-code details such as page format or data structure
+implementation. It provides a number of "operation" implementations
+which consist of redo/undo implementations that apply log entries and
+wrapper functions that produce log entries. During normal forward
+operations, page file writes are processed by applying redo entries
+from the log. Other than the invocation of code that allocates and
+writes log entries there is no difference between the redo phase of
+recovery and normal forward operation. This reduces the amount of
+code that must be developed in order to implement new data structures
+and page layouts.

 Of course, LLADD ships with a number of default data structures and
 layouts, ranging from byte-level page layouts to a linear hashtable
-that was built using high-level reusable components. The
-hashtable is implemented on top of a resizable array and a
-locality preserving linked list implementation.
+that was built using high-level reusable components. The hashtable is
+implemented on top of a resizable array and a locality preserving
+linked list implementation.

 Unlike existing solutions, we view data structure implementations from
 a reusability standpoint, allowing and encouraging application
@@ -89,41 +89,42 @@ Furthermore, our system only keeps one copy of each object in memory
 at a time, while most existing systems keep a second copy in the
 transactional system's page cache (and possibly a third copy in
 operating system cache). Therefore, our system can cache roughly
-twice as many objects in memory as the systems we compared it to. We leave systematic
-performance tuning of LLADD to future work, and believe that further
-optimizations would
-improve our performance on these benchmarks significantly.
+twice as many objects in memory as the systems we compared it to. We
+leave systematic performance tuning of LLADD to future work, and
+believe that further optimizations would improve our performance on
+these benchmarks significantly.

 LLADD's customizability provides superior performance over existing,
-complex systems. Because of its natural
-integration into standard system software development practices, we think that LLADD
-can be naturally extended into networked and distributed domains.
+complex systems. Because of its natural integration into standard
+system software development practices, we think that LLADD can be
+naturally extended into networked and distributed domains.

-For example, typical write-ahead-logging protocols implicitly implement machine
-independent, reorderable log entries in order to implement logical
-undo. These two properties have been crucial in past system software
-designs, including data replication, distribution, and conflict
-resolution algorithms. Therefore, we plan to provide a networked,
-logical redo log as an application-level primitive, and to explore
-system designs that leverage these primitives.
+For example, typical write-ahead-logging protocols implicitly
+implement machine independent, reorderable log entries in order to
+implement logical undo. These two properties have been crucial in
+past system software designs, including data replication,
+distribution, and conflict resolution algorithms. Therefore, we plan
+to provide a networked, logical redo log as an application-level
+primitive, and to explore system designs that leverage these
+primitives.

 However, our approach assumes that application developers will
-implement high performance transactional data structures. This
-is a big assumption, as these
-data structures are notoriously difficult to implement correctly.
-Our current research attempts to address these concerns.
+implement high performance transactional data structures. This is a
+big assumption, as these data structures are notoriously difficult to
+implement correctly. Our current research attempts to address these
+concerns.

-For our infrastructure to be generally useful the
-functionality that it provides should be efficient, reliable and
-applicable to new application domains. We believe that improvements
-to the development process can address each of these goals.
+For our infrastructure to be generally useful the functionality that
+it provides should be efficient, reliable and applicable to new
+application domains. We believe that improvements to the development
+process can address each of these goals.

 Application developers typically have a limited amount of time to
 spend implementing and verifying application-specific storage
 extensions, but bugs in these extensions have dire consequences.
-Also, while data structure algorithms tend to be simple and
-easily understood, performance tuning and verification of
-implementation correctness is extremely difficult.
+Also, while data structure algorithms tend to be simple and easily
+understood, performance tuning and verification of implementation
+correctness is extremely difficult.

 Recovery based algorithms must behave correctly during forward
 operation and also under arbitrary recovery scenarios. The latter
@@ -155,13 +156,13 @@ yield high performance concurrent data structures if
 semantics-preserving optimizations such as page prefetching are
 applied[ARIES/IM].

-A separate approach to the static analysis of LLADD extensions
-uses compiler optimization techniques. Software built on top of
-layered APIs frequently makes repeated calls to low level functions
-that must repeat work. A common example in LLADD involves loops over
-data with good locality in the page file. The vast majority of the
-time, these loops call high level APIs that needlessly pin and unpin
-the same underlying data.
+A separate approach to the static analysis of LLADD extensions uses
+compiler optimization techniques. Software built on top of layered
+APIs frequently makes repeated calls to low level functions that must
+repeat work. A common example in LLADD involves loops over data with
+good locality in the page file. The vast majority of the time, these
+loops call high level APIs that needlessly pin and unpin the same
+underlying data.

 The code for each of these high level API calls could be copied into
 many different variants with different pinning/unpinning and