Made one full pass

Sears Russell 2006-04-24 22:34:24 +00:00
parent 5441e2f758
commit c0d143529c


@@ -688,7 +688,7 @@ amount of redo information that must be written to the log file.
\subsection{Nested top actions}
\label{sec:nta}
So far, we have glossed over the behavior of our system when concurrent
transactions modify the same data structure. To understand the problems that
arise in this case, consider what
@@ -748,8 +748,8 @@ implementations, although \yad does not preclude the use of more
complex schemes that lead to higher concurrency.
\subsection{Blind Writes}
\label{sec:blindWrites}
As described above, and in all database implementations of which we
are aware, transactional pages use LSNs on each page. This makes it
difficult to map large objects onto multiple pages, as the LSNs break
@@ -1032,22 +1032,22 @@ Although the beginning of this paper describes the limitations of
physical database models and relational storage systems in great
detail, these systems are the basis of most common transactional
storage routines. Therefore, we implement a key-based access
method in this section. We argue that
obtaining reasonable performance in such a system under \yad is
straightforward. We then compare our simple implementation to our
hand-tuned version and to Berkeley DB's implementation.

The simple hash table uses nested top actions to atomically update its
internal structure. It is based on a {\em linear} hash function~\cite{lht}, allowing
it to incrementally grow its bucket list. It is built from a number of
modular subcomponents. Notably, its bucket list is a growable array
of fixed length entries (a linkset, in the terms of the physical
database model) and the user's choice of two different linked list
implementations.

The hand-tuned hashtable also uses a linear hash
function. However, it is monolithic and uses carefully ordered writes to
reduce runtime overheads such as log bandwidth. Berkeley DB's
hashtable is a popular, commonly deployed implementation, and serves
as a baseline for our experiments.
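
To make the structure of the simple hash table concrete, the sketch below shows how an insert could bracket its structural changes with a nested top action. It is an in-memory illustration in C: the bucket layout, the split-per-insert policy, and the comments marking the nested top action boundaries are assumptions for exposition, not \yads actual record allocator or operation API.

\begin{verbatim}
/* Sketch of a linear hash insert; comments mark where the nested top
 * action would begin and end. Illustrative only.                      */
#include <stdlib.h>

typedef struct bucket { long key, value; struct bucket *next; } bucket;

typedef struct {
    bucket **buckets;   /* growable array used as the bucket list       */
    size_t   base;      /* buckets at the start of this round (2^level) */
    size_t   split;     /* next bucket to split; base+split are in use  */
} linear_hash;

/* Textbook linear hashing address computation. */
static size_t lh_bucket(const linear_hash *h, long key) {
    size_t b = (size_t)key % h->base;
    return (b < h->split) ? (size_t)key % (2 * h->base) : b;
}

void lh_insert(linear_hash *h, long key, long value) {
    /* --- begin nested top action ------------------------------------ */
    bucket *n = malloc(sizeof *n);              /* fixed-length entry    */
    n->key = key; n->value = value;
    size_t b = lh_bucket(h, key);
    n->next = h->buckets[b];
    h->buckets[b] = n;

    /* Split one bucket per insert so the table grows incrementally.     */
    size_t old = h->split, fresh = h->base + h->split;
    h->buckets = realloc(h->buckets, (fresh + 1) * sizeof *h->buckets);
    h->buckets[fresh] = NULL;
    if (++h->split == h->base) { h->base *= 2; h->split = 0; }
    bucket **p = &h->buckets[old];
    while (*p) {                                /* rehash the split bucket */
        if (lh_bucket(h, (*p)->key) == fresh) {
            bucket *moved = *p; *p = moved->next;
            moved->next = h->buckets[fresh]; h->buckets[fresh] = moved;
        } else { p = &(*p)->next; }
    }
    /* --- end nested top action: the logged UNDO is logical ("remove
     *     key"), so concurrent inserts into other buckets remain safe.  */
}
\end{verbatim}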
@@ -1059,10 +1059,10 @@ to Berkeley DB. Instead, this test shows that \yad is comparable to
existing systems, and that its modular design does not introduce gross
inefficiencies at runtime.

The comparison between the \yad implementations is more
enlightening. The performance of the simple hash table shows that
straightforward data structure implementations composed from
simpler structures can perform as well as the implementations included
in existing monolithic systems. The hand-tuned
implementation shows that \yad allows application developers to
optimize the primitives they build their applications upon.
@@ -1075,7 +1075,7 @@ optimize the primitives they build their applications upon.
%forced to redesign an application to avoid sub-optimal properties of
%the transactional data structure implementation.

Figure~\ref{fig:TPS} describes the performance of the two systems under
highly concurrent workloads. For this test, we used the simple
(unoptimized) hash table, since we are interested in the performance of a
clean, modular data structure that a typical system implementor would
@@ -1117,14 +1117,14 @@ different styles of object serialization have been implemented in
mechanism for a statically typed functional programming language, a
dynamically typed scripting language, or a particular application,
such as an email server. In each case, \yads lack of a hardcoded data
model would allow us to choose the representation and transactional
semantics that make the most sense for the system at hand.

The first object persistence mechanism, pobj, provides transactional updates to objects in
Titanium, a Java variant. It transparently loads and persists
entire graphs of objects, but will not be discussed in further detail.

The second variant was built on top of a C++ object
serialization library, \oasys. \oasys makes use of pluggable storage
modules that implement persistent storage, and includes plugins
for Berkeley DB and MySQL.
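
What follows is a hypothetical sketch of such a pluggable storage module interface, written as a C function-pointer table for brevity. \oasys is a C++ library and its real interface differs; every name and signature below is an illustrative assumption.

\begin{verbatim}
/* Hypothetical pluggable storage module interface. */
#include <stddef.h>

typedef struct storage_module {
    const char *name;
    int  (*open)  (void **handle, const char *path);
    int  (*store) (void *handle, const void *oid, size_t oid_len,
                   const void *buf, size_t len);  /* write a serialized object */
    int  (*load)  (void *handle, const void *oid, size_t oid_len,
                   void *buf, size_t max, size_t *len);
    int  (*commit)(void *handle);
    void (*close) (void *handle);
} storage_module;
\end{verbatim}

The serialization layer calls through whichever module is registered, so an application can switch between the Berkeley DB, MySQL, and \yad backends without changing its own code.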
@@ -1140,11 +1140,11 @@ manager. Instead of maintaining an up-to-date version of each object
in the buffer manager or page file, it allows the buffer manager's
view of live application objects to become stale. This is safe since
the system is always able to reconstruct the appropriate page entry
from the live copy of the object.

By allowing the buffer manager to contain stale data, we reduce the
number of times the \yad \oasys plugin must serialize objects to
update the page file. Reducing the number of serializations decreases
CPU utilization, and allows us to drastically decrease the
size of the page file. In turn, this allows us to increase the size of
the application's cache of live objects.
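
A minimal sketch of this idea follows, assuming a simple in-memory object cache; the types and hooks are hypothetical stand-ins for the \yad \oasys plugin's actual interfaces.

\begin{verbatim}
/* Updates mark the live object dirty and log the change; the stale
 * page-file image is rewritten only when the object is evicted.       */
#include <stdbool.h>
#include <stddef.h>

typedef struct live_object {
    long  oid;
    void *state;          /* live, up-to-date representation            */
    bool  dirty;          /* the buffer manager's copy is stale         */
} live_object;

void object_updated(live_object *o /*, logical log entry elided */) {
    /* append a log entry describing the update here ...                */
    o->dirty = true;      /* defer serialization                        */
}

void object_evicted(live_object *o,
                    size_t (*serialize)(const void *obj, void *buf,
                                        size_t buf_len),
                    void *page_buf, size_t buf_len) {
    if (o->dirty) {       /* serialize once, however many updates ran   */
        serialize(o->state, page_buf, buf_len);
        o->dirty = false;
    }
}
\end{verbatim}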
@@ -1162,37 +1162,35 @@ page file, increasing the working set of the program, and increasing
disk activity.

Furthermore, objects may be written to disk in an
order that differs from the order in which they were updated,
violating one of the write-ahead logging invariants. One way to
deal with this is to maintain multiple LSN's per page. This means we would need to register a
callback with the recovery routine to process the LSN's (a similar
callback will be needed in Section~\ref{sec:zeroCopy}), and
extend \yads page format to contain per-record LSN's.
Also, we must prevent \yads storage allocation routine from overwriting the per-object
LSN's of deleted objects that may still be addressed during abort or recovery.
\yad can support this approach.
Alternatively, we could arrange for the object pool to cooperate
further with the buffer pool by atomically updating the buffer
manager's copy of all objects that share a given page, removing the
need for multiple LSN's per page, and simplifying storage allocation.

However, the simplest solution, and the one we take here, is based on the observation that
updates (not allocations or deletions) to fixed-length objects are blind writes.
This allows us to do away with per-object LSN's entirely. Allocation and deletion can then be handled
as updates to normal LSN-containing pages. At recovery time, object
updates are executed based on the existence of the object on the page
and a conservative estimate of its LSN. (If the page doesn't contain
the object during REDO, then it must have been written back to disk
after the object was deleted. Therefore, we do not need to apply the
REDO.) This means that the system can ``forget'' about objects that
were freed by committed transactions, simplifying space reuse
tremendously.
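
The recovery-time check can be sketched as follows, under an assumed log entry layout; none of these names are \yads actual recovery interface. Because blind writes are idempotent, re-applying an entry that is already reflected on the page is harmless, so the LSN estimate only needs to be conservative (low), never exact.

\begin{verbatim}
/* REDO for a blind write to an LSN-free page (illustrative). */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int64_t lsn;          /* LSN of this log entry                      */
    long    oid;          /* object the blind write targets             */
    size_t  off, len;     /* byte range within the object               */
    char    bytes[];      /* new value of that range                    */
} blind_write_entry;

void redo_blind_write(const blind_write_entry *e,
                      int64_t conservative_page_lsn,
                      bool object_exists_on_page,
                      char *object_on_page) {
    if (!object_exists_on_page)
        return;           /* page was written back after a committed
                             deletion; nothing to redo                  */
    if (e->lsn <= conservative_page_lsn)
        return;           /* definitely already on the page             */
    memcpy(object_on_page + e->off, e->bytes, e->len);   /* idempotent  */
}
\end{verbatim}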
The third \yad plugin to \oasys incorporates the buffer
manager optimizations described above. In addition, it only writes the changed portions of
objects to the log. Because of \yads support for custom log entry
formats, this optimization is straightforward.
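
For illustration, here is one way the changed byte range of a serialized object could be found before emitting a custom log entry; the helper is an assumption, not \yads log API.

\begin{verbatim}
/* Find the smallest byte range [*off, *off + *len) that differs
 * between the old and new serialized images of an object.             */
#include <stddef.h>

void changed_range(const char *old_img, const char *new_img, size_t n,
                   size_t *off, size_t *len) {
    size_t lo = 0, hi = n;
    while (lo < n && old_img[lo] == new_img[lo]) lo++;
    while (hi > lo && old_img[hi - 1] == new_img[hi - 1]) hi--;
    *off = lo;
    *len = hi - lo;
}

/* A custom log entry then records (oid, off, len, bytes); its REDO
 * copies those bytes back into the object's record.                   */
\end{verbatim}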
@@ -1272,18 +1270,18 @@ reordering is inexpensive.}
\end{figure}

Database optimizers operate over relational algebra expressions that
correspond to logical operations over streams of data at runtime. \yad
does not provide query languages, relational algebra, or other such query processing primitives.
However, it does include an extensible logging infrastructure, and many
operations that make use of physiological logging implicitly
implement UNDO (and often REDO) functions that interpret logical
requests.
Logical operations often have several useful properties that this section
will exploit. Because they can be invoked at arbitrary times in the
future, they tend to be independent of the database's physical state.
Often, they correspond to operations that programmers understand.
Because of this, application developers can easily determine whether
logical operations may be reordered, transformed, or even
@@ -1293,7 +1291,7 @@ If requests can be partitioned in a natural way, load
balancing can be implemented by splitting requests across many nodes.
Similarly, a node can easily service streams of requests from multiple
nodes by combining them into a single log, and processing the log
using operation implementations. For example, this type of optimization
is used by RVM's log-merging operations~\cite{rvm}.
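
As a sketch of the merge step, the fragment below combines several per-node request streams into one replayable log, assuming the nodes agree on a total order (a timestamp here). This is only an illustration, not RVM's or \yads implementation.

\begin{verbatim}
/* k-way merge of sorted request streams by timestamp (n > 0 assumed). */
#include <stddef.h>

typedef struct {
    long timestamp;       /* total order agreed on by the nodes         */
    long page;            /* target of the logical request              */
    int  op;              /* operation code interpreted at replay time  */
} request;

size_t merge_requests(request **streams, const size_t *lens, size_t n,
                      request *out, size_t out_cap) {
    size_t pos[n];                      /* read position per stream     */
    for (size_t i = 0; i < n; i++) pos[i] = 0;
    size_t written = 0;
    while (written < out_cap) {
        size_t best = n;                /* stream with the smallest head */
        for (size_t i = 0; i < n; i++)
            if (pos[i] < lens[i] &&
                (best == n || streams[i][pos[i]].timestamp <
                              streams[best][pos[best]].timestamp))
                best = i;
        if (best == n) break;           /* all streams exhausted        */
        out[written++] = streams[best][pos[best]++];
    }
    return written;
}
\end{verbatim}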
Furthermore, application-specific
@@ -1313,7 +1311,7 @@ during the traversal of a random graph. The graph traversal system
takes a sequence of (read) requests, and partitions them using some
function. It then processes each partition in isolation from the
others. We considered two partitioning functions. The first divides the page file
into equally sized contiguous regions, which increases locality. The second takes the hash
of the page's offset in the file, which enables load balancing.
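
The two partitioning functions can be sketched as follows; the region size, partition count, and hash constant are illustrative assumptions, not the values used in our experiments.

\begin{verbatim}
#include <stdint.h>

#define REGION_PAGES   4096   /* pages per contiguous region (assumed)  */
#define NUM_PARTITIONS 16

/* Contiguous regions: neighboring pages share a partition,
 * preserving locality within each partition.                           */
unsigned partition_by_region(uint64_t page_offset) {
    return (unsigned)((page_offset / REGION_PAGES) % NUM_PARTITIONS);
}

/* Hash of the page offset: spreads requests evenly across partitions,
 * enabling load balancing at the cost of locality.                     */
unsigned partition_by_hash(uint64_t page_offset) {
    uint64_t h = page_offset * 0x9E3779B97F4A7C15ULL; /* Fibonacci hash  */
    return (unsigned)(h >> 60) % NUM_PARTITIONS;
}
\end{verbatim}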
%% The second policy is interesting
%The first, partitions the
@@ -1322,10 +1320,8 @@ of the page's offset in the file, which enables load balancing.
%latency limited, as each node would stream large sequences of
%asynchronous requests to the other nodes.)

Our benchmarks partition requests by location. We chose the
partition size so that each partition can fit in \yads buffer pool.
We ran two experiments. Both stored a graph of fixed size objects in
the growable array implementation that is used as our linear
@@ -1333,7 +1329,7 @@ hashtable's bucket list.
The first experiment (Figure~\ref{fig:oo7})
is loosely based on the OO7 database benchmark~\cite{oo7}. We
hardcode the out-degree of each node, and use a directed graph. OO7
constructs graphs by first connecting nodes together into a ring.
It then randomly adds edges between the nodes until the desired
out-degree is obtained. This structure ensures graph connectivity.
If the nodes are laid out in ring order on disk, it also ensures that
@@ -1349,7 +1345,7 @@ instead of ring edges for this test. This does not ensure graph
connectivity, but we used the same random seeds for the two systems.

When the graph has good locality, a normal depth-first search
traversal and the prioritized traversal both perform well. The
prioritized traversal is slightly slower due to the overhead of extra
log manipulation. As locality decreases, the partitioned traversal
algorithm outperforms the naive traversal.
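
The partitioned traversal itself can be sketched as below, using an in-memory adjacency array and per-partition request queues; the sizes, the contiguous-region partitioning, and the omitted page reads are all assumptions for illustration.

\begin{verbatim}
/* Requests are deferred into per-partition queues; each partition is
 * then drained in isolation, so its pages stay hot in the buffer pool. */
#include <stddef.h>

enum { NUM_PARTS = 16, MAX_NODES = 65536, OUT_DEGREE = 3 };

static long   edges[MAX_NODES][OUT_DEGREE]; /* adjacency lists           */
static long   queue[NUM_PARTS][MAX_NODES];  /* deferred read requests    */
static size_t qlen[NUM_PARTS];
static char   queued[MAX_NODES];            /* enqueue each node once    */

static unsigned partition_of(long node) {   /* contiguous node regions   */
    return (unsigned)(node / (MAX_NODES / NUM_PARTS));
}

static void enqueue(long node) {
    if (!queued[node]) {
        queued[node] = 1;
        unsigned p = partition_of(node);
        queue[p][qlen[p]++] = node;
    }
}

void traverse(long root) {
    size_t head[NUM_PARTS] = {0};           /* next request to service   */
    enqueue(root);
    int progress = 1;
    while (progress) {                      /* until no queue has work   */
        progress = 0;
        for (unsigned p = 0; p < NUM_PARTS; p++) {
            while (head[p] < qlen[p]) {     /* drain partition p         */
                long n = queue[p][head[p]++];
                progress = 1;
                /* read n's page and visit n here ...                    */
                for (int e = 0; e < OUT_DEGREE; e++)
                    enqueue(edges[n][e]);   /* defer outgoing edges      */
            }
        }
    }
}
\end{verbatim}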
@@ -1357,20 +1353,21 @@ algorithm outperforms the naive traversal.
\subsection{LSN-Free pages}
\label{sec:zeroCopy}
In Section~\ref{sec:blindWrites}, we describe how operations can avoid recording
LSN's on the pages they modify. Essentially, operations that make use
of purely physical logging need not heed page boundaries, as
physiological operations must. Recall that purely physical logging
interacts poorly with concurrent transactions that modify the same
data structures or pages, so LSN-free pages are not applicable in all
situations.
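
To make the contrast concrete, the sketch below shows the two page layouts side by side; the header layout and page size are assumptions, not \yads actual page format.

\begin{verbatim}
/* LSN pages vs. LSN-free pages (illustrative layout). */
#include <stdint.h>
#include <string.h>

enum { PAGE_SIZE = 4096 };

typedef struct {              /* conventional page: LSN header on each page */
    int64_t lsn;
    char    data[PAGE_SIZE - sizeof(int64_t)];
} lsn_page;

typedef char lsn_free_page[PAGE_SIZE]; /* LSN-free: every byte holds data    */

/* With LSN pages, a page-spanning blob is not contiguous, so reading it
 * requires reassembling it around each header, one byte range at a time. */
void read_blob_lsn_pages(lsn_page *pages, size_t npages, char *out) {
    for (size_t i = 0; i < npages; i++)
        memcpy(out + i * sizeof pages->data, pages[i].data,
               sizeof pages->data);
}

/* With LSN-free pages, the blob's bytes are contiguous in the page file
 * and can be handed to the application (or DMAed) without reassembly.    */
\end{verbatim}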
Consider the retrieval of a large (page-spanning) object stored on
pages that contain LSN's. The object's data will not be contiguous.
Therefore, in order to retrieve the object, the transaction system must
load the pages from disk into memory, and perform a byte-by-byte copy of the
portions of the pages that contain the large object's data into a second buffer.
Compare
this approach to a modern filesystem, which allows applications to
perform a DMA copy of the data into memory, avoiding the expensive
byte-by-byte copy of the data, and allowing the CPU to be used for
@@ -1391,14 +1388,16 @@ portions of the log (the portion that stores the blob) in the
page file, or other addressable storage. In the worst case,
the blob would have to be relocated in order to defragment the
storage. Assuming the blob was relocated once, this would amount
to a total of three mostly sequential disk operations. (Two
writes and one read.)

A conventional blob system would need
to write the blob twice, but also may need to create complex
structures such as B-Trees, or may evict a large number of
unrelated pages from the buffer pool as the blob is being written
to disk.

Alternatively, we could use DMA to overwrite the blob in the page file
in a non-atomic fashion, providing filesystem-style semantics.
(Existing database servers often provide this mode based on the
observation that many blobs are static data that does not really need
@@ -1409,7 +1408,7 @@ objects~\cite{esm}.
Finally, RVM, recoverable virtual memory, made use of LSN-free pages
so that it could use mmap() to map portions of the page file into
application memory~\cite{rvm}. However, without support for logical log entries
and nested top actions, it would be difficult to implement a
concurrent, durable data structure using RVM. We plan to add RVM-style
transactional memory to \yad in a way that is compatible with
@@ -1423,95 +1422,84 @@ extensions, and explained why \yad can support them. This section
will describe existing ideas in the literature that we would like to
incorporate into \yad.

Different large object storage systems provide different API's.
Some allow arbitrary insertion and deletion of bytes~\cite{esm} or
pages~\cite{sqlserver} within the object, while typical filesystems
provide append-only storage allocation~\cite{ffs,ntfs}.
Record-oriented file systems are an older, but still-used
alternative~\cite{vmsFiles11,gfs}. Each of these API's addresses
different workloads.
While most filesystems attempt to lay out data in logically sequential
order, write-optimized filesystems lay files out in the order they
were written~\cite{lfs}. Schemes to improve locality between small
objects exist as well. Relational databases allow users to specify the order
in which tuples will be laid out, and often leave portions of pages
unallocated to reduce fragmentation as new records are allocated.
Memory allocation routines also address this problem. For example, the Hoard memory
allocator is a highly concurrent version of malloc that
makes use of thread context to allocate memory in a way that favors
cache locality~\cite{hoard}. Other work makes use of the caller's stack to infer
information about memory management~\cite{xxx}. \rcs{Eric, do you have
a reference for this?}

Finally, many systems take a hybrid approach to allocation. Examples include
databases with blob support~\cite{something}, and a number of
filesystems~\cite{reiserfs3,didFFSdoThis}.

We are interested in allowing applications to store records in
the transaction log. Assuming log fragmentation is kept to a
minimum, this is particularly attractive on a single disk system. We
plan to use ideas from LFS~\cite{lfs} and POSTGRES~\cite{postgres}
to implement this.
Starburst~\cite{starburst} provides a flexible approach to index
management and database trigger support, as well as hints for small
object layout.
The Boxwood system provides a networked, fault-tolerant transactional
B-Tree and ``Chunk Manager.'' We believe that \yad is an interesting
complement to such a system, especially given \yads focus on
intelligence and optimizations within a single node, and Boxwood's
focus on multi-node systems. In particular, it would be
interesting to explore extensions to the Boxwood approach that make
use of \yads customizable semantics (Section~\ref{wal}) and fully logical logging
mechanism (Section~\ref{logging}).
\section{Future Work}

Complexity problems may begin to arise as we attempt to implement more
extensions to \yad. However, \yads implementation is still fairly simple:
\begin{itemize}
\item The core of \yad is roughly 3000 lines
of code, and implements the buffer manager, IO, recovery, and other
systems.
\item Custom operations account for another 3000 lines of code.
\item Page layouts and logging implementations account for 1600 lines of code.
\end{itemize}
The complexity of the core of \yad is our primary concern, as it
contains hardcoded policies and assumptions. Over time, the core has
shrunk as functionality has been moved into extensions. We expect
this trend to continue as development progresses.

A resource manager
is a common pattern in system software design, and manages
dependencies and ordering constraints between sets of components.
Over time, we hope to shrink \yads core to the point where it is
simply a resource manager and implementations of a few unavoidable
algorithms related to write-ahead logging. For instance,
we suspect that support for appropriate callbacks will
allow us to hardcode a generic recovery algorithm into the
system. Similarly, code that manages bookkeeping information, such as
LSN's, seems to be general enough to be hardcoded.

Of course, we also plan to provide \yads current functionality, including the algorithms
mentioned above, as modular, well-tested extensions.
Highly specialized \yad extensions and other systems would be built
by reusing \yads default extensions and implementing new ones.
\section{Conclusion}

@@ -1525,18 +1513,18 @@ limitations of existing systems, breaking guarantees regarding data
integrity, or reimplementing the entire storage infrastructure from
scratch.

We have demonstrated that \yad provides fully
concurrent, high-performance transactions, and explained how it can
support a number of systems that currently make use of suboptimal or
ad-hoc storage approaches. Finally, we have described how \yad can be
extended in the future to support a larger range of systems.

\section{Acknowledgements}

The idea behind the \oasys buffer manager optimization is from Mike
Demmer. He and Bowei Du implemented \oasys. Gilad Arnold and Amir Kamil implemented
pobj. Jim Blomo, Jason Bayer, and Jimmy
Kittiyachavalit worked on an early version of \yad.

Thanks to C. Mohan for pointing out the need for tombstones with
per-object LSN's. Jim Gray provided feedback on an earlier version of