Eric Brewer 2006-08-15 01:00:55 +00:00
parent 8bf2cb65ef
commit 9e4cb7d7c4


@ -141,7 +141,7 @@ management~\cite{perl}, with mixed success~\cite{excel}.
Our hypothesis is that 1) each of these areas has a distinct top-down
conceptual model (which may not map well to the relational model); and
2) there exists a bottom-up layered framework that can better support all of these
models and others.
Just within databases, relational, object-oriented, XML, and streaming
@ -311,7 +311,7 @@ all of these systems. We look at these in more detail in
Section~\ref{related=work}.
In some sense, our hypothesis is trivially true in that there exists a
bottom-up framework called the ``operating system'' that can implement
all of the models. A famous database paper argues that it does so
poorly (Stonebraker 1980~\cite{Stonebraker80}). Our task is really to
simplify the implementation of transactional systems through more
@ -328,7 +328,7 @@ databases~\cite{libtp}. At its core, it provides the physical database model
%most relational database systems~\cite{libtp}.
In particular,
it provides fully transactional (ACID) operations over B-Trees,
hash tables, and other access methods. It provides flags that
let its users tweak various aspects of the performance of these
primitives, and selectively disable the features it provides.
@ -437,7 +437,7 @@ it into the operation implementation.
In this portion of the discussion, operations are limited
to a single page, and provide an undo function. Operations that
affect multiple pages or do not provide inverses will be
discussed later.
Operations are limited to a single page because their results must be
@ -452,8 +452,8 @@ pages and failed sectors, this does not
require any sort of logging, but is quite inefficient in practice, as
it forces the disk to perform a potentially random write each time the
page file is updated. The rest of this section describes how recovery
can be extended, first to support multiple operations per
transaction efficiently, and then to allow more than one transaction to modify the
same data before committing.
\subsubsection{\yads Recovery Algorithm}
@ -461,12 +461,11 @@ same data before committing.
Recovery relies upon the fact that each log entry is assigned a {\em
Log Sequence Number (LSN)}. The LSN is monotonically increasing and
unique. The LSN of the log entry that was most recently applied to
each page is stored with the page, which allows recovery to replay log entries selectively. This only works if log entries change exactly one
page and if they are applied to the page atomically.
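To make this invariant concrete, the sketch below shows the per-page LSN check that selective replay relies on; the types and helper names are illustrative placeholders rather than \yads actual data structures.
\begin{verbatim}
/* Sketch: selective replay using a per-page LSN.  Types and names are
 * placeholders for exposition, not the system's real structures.      */
#include <stdint.h>

typedef struct {
    uint64_t lsn;                  /* LSN of the last entry applied here */
    char     data[4096 - sizeof(uint64_t)];
} page_t;

typedef struct {
    uint64_t lsn;                  /* monotonically increasing, unique   */
    uint32_t page_id;
    /* ... operation type and arguments ... */
} log_entry_t;

/* Assumed helper: applies the entry's update to exactly one page,
 * atomically with respect to the page's LSN field.                     */
void apply_update(page_t *p, const log_entry_t *e);

void replay_if_needed(page_t *p, const log_entry_t *e) {
    if (e->lsn > p->lsn) {         /* page predates this entry: replay  */
        apply_update(p, e);
        p->lsn = e->lsn;           /* page now reflects the entry       */
    }                              /* otherwise the page is newer: skip */
}
\end{verbatim}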
Recovery occurs in three phases: Analysis, Redo, and Undo.
``Analysis'' is beyond the scope of this paper, but essentially determines the commit/abort status of every transaction. ``Redo'' plays the
log forward in time, applying any updates that did not make it to disk
before the system crashed. ``Undo'' runs the log backwards in time,
only applying portions that correspond to aborted transactions. This
@ -475,7 +474,7 @@ the distinction between physical and logical undo.
A summary of the stages of recovery and the invariants
they establish is presented in Figure~\ref{fig:conventional-recovery}.
Redo is the only phase that makes use of LSNs stored on pages.
It simply compares the page LSN to the LSN of each log entry. If the
log entry's LSN is higher than the page LSN, then the log entry is
applied. Otherwise, the log entry is skipped. Redo does not write
@ -556,12 +555,11 @@ increases concurrency. However, it means that follow-on transactions that use
that data may need to abort if a current transaction aborts ({\em
cascading aborts}). %Related issues are studied in great detail in terms of optimistic concurrency control~\cite{optimisticConcurrencyControl, optimisticConcurrencyPerformance}.
Unfortunately, the long locks held by total isolation cause
bottlenecks when applied to key data structures. Nested top actions
are essentially mini-transactions that can commit even if their
containing transaction aborts; thus follow-on transactions can use the
data structure without fear of cascading aborts.
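As a preview of the recipe spelled out below (every function name here is invented for exposition and is not \yads actual API), a concurrent hash-table insert built around a nested top action follows this shape:
\begin{verbatim}
/* Sketch only: all names are hypothetical, invented for illustration. */
typedef struct hashtable hashtable_t;

void lock_bucket(hashtable_t *ht, const char *key);
void unlock_bucket(hashtable_t *ht, const char *key);
void begin_nested_top_action(int xid);
/* Commits the nested top action and logs a logical undo ("remove key"). */
void end_nested_top_action(int xid, const char *undo_key);
void physical_insert(int xid, hashtable_t *ht, const char *key,
                     const char *val);

void hash_insert(int xid, hashtable_t *ht, const char *key, const char *val) {
    lock_bucket(ht, key);               /* short-lived latch, not a lock */
    begin_nested_top_action(xid);       /* group the physical updates    */
    physical_insert(xid, ht, key, val); /* may touch several pages       */
    end_nested_top_action(xid, key);    /* commits even if xid aborts;   */
                                        /* aborting xid runs the logical */
                                        /* undo (remove) instead         */
    unlock_bucket(ht, key);             /* later xacts may use the data  */
}
\end{verbatim}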
The key idea is to distinguish between the {\em logical operations} of a
data structure, such as inserting a key, and the {\em physical operations}
@ -593,7 +591,7 @@ concurrent operations:
to use finer-grained latches in a \yad operation, but it is rarely necessary.
\item Define a {\em logical} UNDO for each operation (rather than just
using a set of page-level UNDO's). For example, this is easy for a
hash table: the UNDO for {\em insert} is {\em remove}. This logical
undo function should arrange to acquire the mutex when invoked by
abort or recovery.
\item Add a ``begin nested top action'' right after the mutex
@ -626,7 +624,7 @@ not able to safely combine them to create concurrent transactions.
Note that the transactions described above only provide the
``Atomicity'' and ``Durability'' properties of ACID.\endnote{The ``A'' in ACID really means atomic persistence
of data, rather than atomic in-memory updates, as the term is normally
used in systems work~\cite{GR97};
the latter is covered by ``C'' and
``I''.} ``Isolation'' is
typically provided by locking, which is a higher-level but
@ -679,22 +677,22 @@ We make no assumptions regarding lock managers being used by higher-level code i
\section{LSN-free pages}
\label{sec:lsn-free}
The recovery algorithm described above uses LSNs to determine the
version number of each page during recovery. This is a common
technique. As far as we know, it is used by all database systems that
update data in place. Unfortunately, this makes it difficult to map
large objects onto pages, as the LSNs break up the object. It
is tempting to store the LSNs elsewhere, but then they would not be
written atomically with their page, which defeats their purpose.
This section explains how we can avoid storing LSNs on pages in \yad
without giving up durable transactional updates. The techniques here
are similar to those used by RVM~\cite{lrvm}, a system that supports
transactional updates to virtual memory. However, \yad generalizes
the concept, allowing it to co-exist with traditional pages and fully
support concurrent transactions.
In the process of removing LSNs from pages, we
are able to relax the atomicity assumptions that we make regarding
writes to disk. These relaxed assumptions allow recovery to repair
torn pages without performing media recovery, and allow arbitrary
@ -707,7 +705,7 @@ protocol for atomically and durably applying updates to the page file.
This will require the addition of a new page type (\yad currently has
3 such types, not including a few minor variants). The new page type
will need to communicate with the logger and recovery modules in order
to estimate page LSNs, which will need to make use of callbacks in
those modules. Of course, upon providing support for LSN-free pages,
we will want to add operations to \yad that make use of them. We plan
to eventually support the coexistence of LSN-free pages, traditional
@ -715,7 +713,7 @@ pages, and similar third-party modules within the same page file, log,
transactions, and even logical operations.
\subsection{Blind writes}
Recall that LSNs were introduced to prevent recovery from applying
updates more than once, and to prevent recovery from applying old
updates to newer versions of pages. This was necessary because some
operations that manipulate pages are not idempotent, or simply make
@ -769,14 +767,14 @@ practical problem.
The rest of this section describes how concurrent, LSN-free pages
allow standard file system and database optimizations to be easily
combined, and shows that the removal of LSNs from pages actually
simplifies some aspects of recovery.
\subsection{Zero-copy I/O}
We originally developed LSN-free pages as an efficient method for
transactionally storing and updating large (multi-page) objects. If a
large object is stored in pages that contain LSNs, then in order to
read that large object the system must read each page individually,
and then use the CPU to perform a byte-by-byte copy of the portions of
the page that contain object data into a second buffer.
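The copy is forced by the page format itself: because each page begins with a header that holds its LSN, the object's bytes are never contiguous on disk. A minimal sketch follows (the page layout is invented for illustration):
\begin{verbatim}
/* Why per-page LSNs force an extra copy: each page = header + payload,
 * so a large object must be reassembled with the CPU.  The layout is
 * illustrative, not the system's actual page format.                  */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096
#define HEADER_SIZE  sizeof(uint64_t)           /* per-page LSN          */
#define PAYLOAD_SIZE (PAGE_SIZE - HEADER_SIZE)

void read_large_object(char *out, const char pages[][PAGE_SIZE], int npages) {
    for (int i = 0; i < npages; i++)
        memcpy(out + (size_t)i * PAYLOAD_SIZE,  /* contiguous destination */
               pages[i] + HEADER_SIZE,          /* skip the LSN header    */
               PAYLOAD_SIZE);
}
\end{verbatim}
With LSN-free pages the object can be stored contiguously, so it can be handed to the application (or to the disk or network controller) without this loop.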
@ -819,14 +817,14 @@ objects~\cite{esm}.
Our LSN-free pages are somewhat similar to the recovery scheme used by
RVM, recoverable virtual memory. \rcs{, and camelot, argus(?)} That system used purely physical
logging and LSN-free pages so that it could use mmap() to map portions
of the page file into application memory~\cite{lrvm}. However, without
support for logical log entries and nested top actions, it would be
difficult to implement a concurrent, durable data structure using RVM.
In contrast, our LSN-free pages allow for logical undo, enabling the
use of nested top actions and concurrent transactions.
We plan to add RVM-style transactional memory to \yad in a way that is
compatible with fully concurrent collections such as hash tables and
tree structures. Of course, since \yad will support coexistence of
conventional and LSN-free pages, applications would be free to use the
@ -835,7 +833,7 @@ conventional and LSN-free pages, applications would be free to use the
\subsection{Page-independent transactions}
\label{sec:torn-page}
\rcs{I don't like this section heading...} Recovery schemes that make
use of per-page LSNs assume that each page is written to disk
atomically even though that is generally not the case. Such schemes
deal with this problem by using page formats that allow partially
written pages to be detected. Media recovery allows them to recover
@ -944,7 +942,7 @@ around typical problems with existing transactional storage systems.
system. Many of the customizations described below can be implemented
using custom log operations. In this section, we describe how to implement an
``ARIES-style'' concurrent, steal/no-force operation using
\diff{physical redo, logical undo} and per-page LSNs.
Such operations are typical of high-performance commercial database
engines.
@ -973,7 +971,7 @@ with. UNDO works analogously, but is invoked when an operation must
be undone (usually due to an aborted transaction, or during recovery).
This pattern applies in many cases. In
order to implement a ``typical'' operation, the operation's
implementation must obey a few more invariants:
\begin{itemize}
@ -983,22 +981,27 @@ implementation must obey a few more invariants:
during REDO, then the wrapper should use a latch to protect against
concurrent attempts to update the sensitive data (and against
concurrent attempts to allocate log entries that update the data).
\item Nested top actions (and logical undo) or ``big locks'' (total isolation but lower concurrency) should be used to manage concurrency (Section~\ref{sec:nta}); a sketch of the resulting pattern follows this list.
\end{itemize}
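To ground these invariants, consider an invented ``increment a counter slot'' operation; the wrapper, latching, and registration machinery are elided, and none of the names below are \yads real API. The REDO function is physical and confined to one page, while the UNDO is the logical inverse:
\begin{verbatim}
#include <stdint.h>

typedef struct {
    uint64_t lsn;               /* recovery checks this before REDO      */
    int64_t  counters[16];
} page_t;

typedef struct { int slot; } incr_arg_t;   /* recorded in the log entry  */

/* REDO: physical, single-page.  The wrapper applies it at runtime; at
 * recovery it is re-applied only when the entry's LSN is newer than the
 * page's LSN, so the page-LSN check supplies idempotence.              */
void op_incr_redo(page_t *p, const incr_arg_t *a) { p->counters[a->slot] += 1; }

/* UNDO: the logical inverse (decrement).  It remains correct even after
 * other transactions have modified the page, which is what lets latches
 * be dropped before the enclosing transaction commits.                 */
void op_incr_undo(page_t *p, const incr_arg_t *a) { p->counters[a->slot] -= 1; }
\end{verbatim}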
\section{Experiments}
\label{experiments}
\eab{add transition that explains where we are going}
\subsection{Experimental setup}
\label{sec:experimental_setup}
We chose Berkeley DB in the following experiments because, among
commonly used systems, it provides transactional storage primitives
that are most similar to \yad. Also, Berkeley DB is
supported commercially and is designed to provide high performance and high
concurrency. For all tests, the two libraries provide the same
transactional semantics unless explicitly noted.
All benchmarks were run on an Intel Xeon 2.8 GHz with 1GB of RAM and a
10K RPM SCSI drive formatted with ReiserFS~\cite{reiserfs}.\endnote{We found that the
@ -1039,15 +1042,17 @@ multiple machines and file systems.
\subsection{Linear hash table}
\label{sec:lht}
\begin{figure}[t]
\includegraphics[%
width=1\columnwidth]{figs/bulk-load.pdf}
%\includegraphics[%
% width=1\columnwidth]{bulk-load-raw.pdf}
%\vspace{-30pt}
\caption{\sf\label{fig:BULK_LOAD} Performance of \yad and Berkeley DB hash table implementations. The
test is run as a single transaction, minimizing overheads due to synchronous log writes.}
\end{figure}
\begin{figure}[t]
%\hspace*{18pt}
%\includegraphics[%
@ -1055,35 +1060,37 @@ test is run as a single transaction, minimizing overheads due to synchronous log
\includegraphics[%
width=1\columnwidth]{figs/tps-extended.pdf}
%\vspace{-36pt}
\caption{\sf\label{fig:TPS} High concurrency hash table performance of Berkeley DB and \yad. We were unable to get Berkeley DB to work correctly with more than 50 threads (see text).
}
\end{figure}
Although the beginning of this paper describes the limitations of
physical database models and relational storage systems in great
detail, these systems are the basis of most common transactional
storage routines. Therefore, we implement a key-based access method
in this section. We argue that obtaining reasonable performance in
such a system under \yad is straightforward. We then compare our
simple, straightforward implementation to our hand-tuned version and
Berkeley DB's implementation.
The simple hash table uses nested top actions to update its internal
structure atomically. It uses a {\em linear} hash
function~\cite{lht}, allowing it to increase capacity
incrementally. It is based on a number of modular subcomponents.
Notably, its ``table'' is a growable array of fixed-length entries (a
linkset, in the terms of the physical database model) and the user's
choice of two different linked-list implementations. \eab{still
unclear}
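For reference, the address calculation in a linear hash table grows the table one bucket at a time by splitting bucket {\tt next\_split}; this is the textbook algorithm~\cite{lht}, not a transcription of \yads implementation:
\begin{verbatim}
#include <stdint.h>

typedef struct {
    uint64_t i;           /* level: the base table size is 2^i           */
    uint64_t next_split;  /* next bucket to split, 0 <= next_split < 2^i */
} linear_hash_t;

uint64_t bucket_of(const linear_hash_t *lh, uint64_t hash) {
    uint64_t b = hash & ((1ULL << lh->i) - 1);      /* hash mod 2^i       */
    if (b < lh->next_split)                         /* bucket already     */
        b = hash & ((1ULL << (lh->i + 1)) - 1);     /* split: mod 2^(i+1) */
    return b;
}
\end{verbatim}
Growing the table splits one bucket and advances {\tt next\_split}, so each insert triggers only a small, bounded amount of rehashing.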
The hand-tuned hash table is also built on \yad and also uses a linear hash
function. However, it is monolithic and uses carefully ordered writes to
reduce runtime overheads such as log bandwidth. Berkeley DB's
hash table is a popular, commonly deployed implementation, and serves
as a baseline for our experiments.
Both of our hash tables outperform Berkeley DB on a workload that bulk
loads the tables by repeatedly inserting (key, value) pairs
(Figure~\ref{fig:BULK_LOAD}).
%although we do not wish to imply this is always the case.
%We do not claim that our partial implementation of \yad
%generally outperforms, or is a robust alternative
@ -1122,13 +1129,12 @@ a single synchronous I/O.\endnote{The multi-threaded benchmarks
\yad scaled quite well, delivering over 6000 transactions per
second,\endnote{The concurrency test was run without lock managers, and the
transactions obeyed the A, C, and D properties. Since each
transaction performed exactly one hash table write and no reads, they also
obeyed I (isolation) in a trivial sense.} and provided roughly
double Berkeley DB's throughput (up to 50 threads). Although not
shown here, we found that the latencies of Berkeley DB and \yad were
similar, which confirms that \yad is not simply trading latency for
throughput during the concurrency benchmark.
\begin{figure*}
@ -1140,10 +1146,12 @@ not simply trading latency for throughput during the concurrency benchmark.
The effect of \yad object serialization optimizations under low and high memory pressure.}
\end{figure*}
\subsection{Object persistence}
\label{sec:oasys}
Numerous schemes are used for object serialization. Support for two
different styles of object serialization has been implemented in
\yad. We could have just as easily implemented a persistence
mechanism for a statically typed functional programming language, a
dynamically typed scripting language, or a particular application,
@ -1160,17 +1168,21 @@ serialization library, \oasys. \oasys makes use of pluggable storage
modules that implement persistent storage, and includes plugins
for Berkeley DB and MySQL.
This section will describe how the \yad \oasys plugin reduces the
amount of data written to the log, while using half as much system memory
as the other two systems.
We present three variants of the \yad plugin here. The first treats
\yad like Berkeley DB. The second, the ``update/flush'' variant
customizes the behavior of the buffer manager, and the third,
``delta'', extends the second with support for logging only the deltas
between versions.
The update/flush variant avoids maintaining an up-to-date
version of each object in the buffer manager or page file: it allows
the buffer manager's view of live application objects to become stale.
This is safe since the system is always able to reconstruct the
appropriate page entry from the live copy of the object.
By allowing the buffer manager to contain stale data, we reduce the
number of times the \yad \oasys plugin must update serialized objects in the buffer manager.
@ -1186,41 +1198,45 @@ updates the page file.
The reason it would be difficult to do this with Berkeley DB is that
we still need to generate log entries as the object is being updated.
This would cause Berkeley DB to write data back to the page file,
increasing the working set of the program, and increasing disk
activity.
Furthermore, objects may be written to disk in an
order that differs from the order in which they were updated,
violating one of the write-ahead logging invariants. One way to
deal with this is to maintain multiple LSNs per page. This means we would need to register a
callback with the recovery routine to process the LSNs (a similar
callback will be needed in Section~\ref{sec:zeroCopy}), and
extend \yads page format to contain per-record LSNs.
Also, we must prevent \yads storage allocation routine from overwriting the per-object
LSNs of deleted objects that may still be addressed during abort or recovery.\eab{tombstones discussion here?}
\eab{we should at least implement this callback if we have not already}
Alternatively, we could arrange for the object pool to cooperate
further with the buffer pool by atomically updating the buffer
manager's copy of all objects that share a given page, removing the
need for multiple LSNs per page, and simplifying storage allocation.
However, the simplest solution, and the one we take here, is based on
the observation that updates (not allocations or deletions) of
fixed-length objects are blind writes. This allows us to do away with
per-object LSNs entirely. Allocation and deletion can then be
handled as updates to normal LSN-containing pages. At recovery time,
object updates are executed based on the existence of the object on
the page and a conservative estimate of its LSN. (If the page doesn't
contain the object during REDO then it must have been written back to
disk after the object was deleted. Therefore, we do not need to apply
the REDO.) This means that the system can ``forget'' about objects
that were freed by committed transactions, simplifying space reuse
tremendously. (Because LSN-free pages and recovery are not yet
implemented, this benchmark mimics their behavior at runtime, but does
not support recovery.)
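A sketch of that recovery rule follows; the helper functions and record layout are assumptions for illustration, not \yads interface:
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

typedef struct page page_t;                 /* opaque page handle          */
typedef struct {
    uint64_t    oid;                        /* which object                */
    size_t      len;                        /* fixed object length         */
    const void *img;                        /* new image (a blind write)   */
} obj_update_t;

int  page_contains_object(const page_t *p, uint64_t oid);   /* assumed    */
void write_object(page_t *p, uint64_t oid, const void *img, size_t len);

void redo_object_update(page_t *p, const obj_update_t *u) {
    if (!page_contains_object(p, u->oid))
        return;   /* page was written back after the object was freed: skip */
    /* The update is a blind write of the whole fixed-length object, so
     * re-applying it unconditionally is safe (idempotent).                 */
    write_object(p, u->oid, u->img, u->len);
}
\end{verbatim}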
The third plugin variant, ``delta'', incorporates the update/flush
optimizations, but only writes the changed portions of
objects to the log. Because of \yads support for custom log-entry
formats, this optimization is straightforward.
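For illustration, the delta for a fixed-length object can be as small as the changed byte range between its old and new images; this sketch shows the idea, not \oasys's actual log-entry format:
\begin{verbatim}
#include <stddef.h>

typedef struct { size_t off; size_t len; } delta_t;

/* Return the smallest [off, off+len) range in which old and new differ;
 * only newimg[off..off+len) needs to be written to the log.            */
delta_t object_delta(const char *oldimg, const char *newimg, size_t len) {
    size_t first = 0, last = len;
    while (first < len && oldimg[first] == newimg[first]) first++;
    if (first == len) return (delta_t){0, 0};           /* unchanged     */
    while (last > first && oldimg[last - 1] == newimg[last - 1]) last--;
    return (delta_t){ first, last - first };
}
\end{verbatim}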
%In addition to the buffer-pool optimizations, \yad provides several
@ -1264,8 +1280,8 @@ close, but does not quite provide the correct durability semantics.)
The operations required for these two optimizations comprise
150 lines of C code, including whitespace, comments, and boilerplate
function registrations.\endnote{These figures do not include the
simple LSN-free object logic required for recovery, as \yad does not
yet support LSN-free operations.} Although the reasoning required
to ensure the correctness of this code is complex, the simplicity of
the implementation is encouraging.
@ -1289,6 +1305,9 @@ we see that update/flush indeed improves memory utilization.
\subsection{Manipulation of logical log entries}
\eab{this section unclear, including title}
\label{sec:logging}
\begin{figure}
\includegraphics[width=1\columnwidth]{figs/graph-traversal.pdf}
@ -1345,7 +1364,7 @@ is used by RVM's log-merging operations~\cite{lrvm}.
Furthermore, application-specific
procedures that are analogous to standard relational algebra methods
(join, project and select) could be used to efficiently transform the data
while it is still laid out sequentially
in non-transactional memory.
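As a hedged example of such a ``select'', the sketch below filters an in-memory array of logical insert records with a predicate before they are replayed; the record layout is invented for illustration:
\begin{verbatim}
#include <stddef.h>

typedef struct { long key; long value; } insert_rec_t;   /* hypothetical */
typedef int (*pred_t)(const insert_rec_t *);

/* Keep only the records matching `keep', compacting the array in place
 * while the data is still laid out sequentially in ordinary memory.    */
size_t select_inserts(insert_rec_t *recs, size_t n, pred_t keep) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (keep(&recs[i]))
            recs[out++] = recs[i];
    return out;                /* number of surviving records to replay  */
}
\end{verbatim}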
%Note that read-only operations do not necessarily generate log
@ -1371,9 +1390,9 @@ position size so that each partition can fit in \yads buffer pool.
We ran two experiments. Both stored a graph of fixed-size objects in
the growable array implementation that is used as our linear
hash table's bucket list.
The first experiment (Figure~\ref{fig:oo7})
is loosely based on the OO7 database benchmark~\cite{oo7}. We
hard-code the out-degree of each node, and use a directed graph. OO7
constructs graphs by first connecting nodes together into a ring.
It then randomly adds edges between the nodes until the desired
@ -1583,7 +1602,7 @@ databases~\cite{libtp}. At its core, it provides the physical database model
%most relational database systems~\cite{libtp}.
In particular,
it provides fully transactional (ACID) operations over B-Trees,
hash tables, and other access methods. It provides flags that
let its users tweak various aspects of the performance of these
primitives, and selectively disable the features it provides.
@ -1642,14 +1661,16 @@ Although most file systems attempt to lay out data in logically sequential
order, write-optimized file systems lay files out in the order they
were written~\cite{lfs}. Schemes to improve locality between small
objects exist as well. Relational databases allow users to specify the order
in which tuples will be laid out, and often leave portions of pages
unallocated to reduce fragmentation as new records are allocated.
\rcs{The new allocator is written + working, so this should be reworded. We have one that is based on hoard; support for other possibilities would be nice.}
Memory allocation routines also address this problem. For example, the Hoard memory
allocator is a highly concurrent version of malloc that
makes use of thread context to allocate memory in a way that favors
cache locality~\cite{hoard}.
%Other work makes use of the caller's stack to infer
%information about memory management.~\cite{xxx} \rcs{Eric, do you have
% a reference for this?}
@ -1664,7 +1685,7 @@ plan to use ideas from LFS~\cite{lfs} and POSTGRES~\cite{postgres}
to implement this.
Starburst~\cite{starburst} provides a flexible approach to index
management and database trigger support, as well as hints for small
object layout.
The Boxwood system provides a networked, fault-tolerant transactional
@ -1673,8 +1694,8 @@ complement to such a system, especially given \yads focus on
intelligence and optimizations within a single node, and Boxwood's
focus on multiple node systems. In particular, it would be
interesting to explore extensions to the Boxwood approach that make
use of \yads customizable semantics (Section~\ref{sec:wal}) and fully logical logging
mechanisms (Section~\ref{sec:logging}).
@ -1706,7 +1727,7 @@ algorithms related to write-ahead logging. For instance,
we suspect that support for appropriate callbacks will
allow us to hard-code a generic recovery algorithm into the
system. Similarly, any code that manages book-keeping information, such as
LSNs, may be general enough to be hard-coded.
Of course, we also plan to provide \yads current functionality, including the algorithms
mentioned above, as modular, well-tested extensions.
@ -1733,13 +1754,15 @@ extended in the future to support a larger range of systems.
\section{Acknowledgements}
Thanks to shepherd Bill Weihl for helping us present these ideas well,
or at least better. The idea behind the \oasys buffer manager
optimization is from Mike Demmer. He and Bowei Du implemented \oasys.
Gilad Arnold and Amir Kamil implemented
pobj. Jim Blomo, Jason Bayer, and Jimmy
Kittiyachavalit worked on an early version of \yad.
Thanks to C. Mohan for pointing out the need for tombstones with
per-object LSNs. Jim Gray provided feedback on an earlier version of
this paper, and suggested we use a resource manager to manage
dependencies within \yads API. Joe Hellerstein and Mike Franklin
provided us with invaluable feedback.