diff --git a/doc/paper3/LLADD.tex b/doc/paper3/LLADD.tex index 20851db..52b5f75 100644 --- a/doc/paper3/LLADD.tex +++ b/doc/paper3/LLADD.tex @@ -141,7 +141,7 @@ management~\cite{perl}, with mixed success~\cite{excel}. Our hypothesis is that 1) each of these areas has a distinct top-down conceptual model (which may not map well to the relational model); and -2) there exists a bottom-up layering that can better support all of these +2) there exists a bottom-up layered framework that can better support all of these models and others. Just within databases, relational, object-oriented, XML, and streaming @@ -311,7 +311,7 @@ all of these systems. We look at these in more detail in Section~\ref{related=work}. In some sense, our hypothesis is trivially true in that there exists a -bottom-up layering called the ``operating system'' that can implement +bottom-up framework called the ``operating system'' that can implement all of the models. A famous database paper argues that it does so poorly (Stonebraker 1980~\cite{Stonebraker80}). Our task is really to simplify the implementation of transactional systems through more @@ -328,7 +328,7 @@ databases~\cite{libtp}. At its core, it provides the physical database model %most relational database systems~\cite{libtp}. In particular, it provides fully transactional (ACID) operations over B-Trees, -hashtables, and other access methods. It provides flags that +hash tables, and other access methods. It provides flags that let its users tweak various aspects of the performance of these primitives, and selectively disable the features it provides. @@ -437,7 +437,7 @@ it into the operation implementation. In this portion of the discussion, operations are limited to a single page, and provide an undo function. Operations that -affect multiple pages and that do not provide inverses will be +affect multiple pages or do not provide inverses will be discussed later. Operations are limited to a single page because their results must be @@ -452,8 +452,8 @@ pages and failed sectors, this does not require any sort of logging, but is quite inefficient in practice, as it forces the disk to perform a potentially random write each time the page file is updated. The rest of this section describes how recovery -can be extended, first to efficiently support multiple operations per -transaction, and then to allow more than one transaction to modify the +can be extended, first to support multiple operations per +transaction efficiently, and then to allow more than one transaction to modify the same data before committing. \subsubsection{\yads Recovery Algorithm} @@ -461,12 +461,11 @@ same data before committing. Recovery relies upon the fact that each log entry is assigned a {\em Log Sequence Number (LSN)}. The LSN is monitonically increasing and unique. The LSN of the log entry that was most recently applied to -each page is stored with the page, which allows recovery to selectively -replay log entries. This only works if log entries change exactly one +each page is stored with the page, which allows recovery to replay log entries selectively. This only works if log entries change exactly one page and if they are applied to the page atomically. Recovery occurs in three phases, Analysis, Redo and Undo. -``Analysis'' is beyond the scope of this paper. ``Redo'' plays the +``Analysis'' is beyond the scope of this paper, but essentially determines the commit/abort status of every transaction. 
``Redo'' plays the log forward in time, applying any updates that did not make it to disk before the system crashed. ``Undo'' runs the log backwards in time, only applying portions that correspond to aborted transactions. This @@ -475,7 +474,7 @@ the distinction between physical and logical undo. A summary of the stages of recovery and the invariants they establish is presented in Figure~\ref{fig:conventional-recovery}. -Redo is the only phase that makes use of LSN's stored on pages. +Redo is the only phase that makes use of LSNs stored on pages. It simply compares the page LSN to the LSN of each log entry. If the log entry's LSN is higher than the page LSN, then the log entry is applied. Otherwise, the log entry is skipped. Redo does not write @@ -556,12 +555,11 @@ increases concurrency. However, it means that follow-on transactions that use that data may need to abort if a current transaction aborts ({\em cascading aborts}). %Related issues are studied in great detail in terms of optimistic concurrency control~\cite{optimisticConcurrencyControl, optimisticConcurrencyPerformance}. -Unfortunately, the long locks held by total isolation cause bottlenecks when applied to key -data structures. -Nested top actions are essentially mini-transactions that can -commit even if their containing transaction aborts; thus follow-on -transactions can use the data structure without fear of cascading -aborts. +Unfortunately, the long locks held by total isolation cause +bottlenecks when applied to key data structures. Nested top actions +are essentially mini-transactions that can commit even if their +containing transaction aborts; thus follow-on transactions can use the +data structure without fear of cascading aborts. The key idea is to distinguish between the {\em logical operations} of a data structure, such as inserting a key, and the {\em physical operations} @@ -593,7 +591,7 @@ concurrent operations: to use finer-grained latches in a \yad operation, but it is rarely necessary. \item Define a {\em logical} UNDO for each operation (rather than just using a set of page-level UNDO's). For example, this is easy for a - hashtable: the UNDO for {\em insert} is {\em remove}. This logical + hash table: the UNDO for {\em insert} is {\em remove}. This logical undo function should arrange to acquire the mutex when invoked by abort or recovery. \item Add a ``begin nested top action'' right after the mutex @@ -626,7 +624,7 @@ not able to safely combine them to create concurrent transactions. Note that the transactions described above only provide the ``Atomicity'' and ``Durability'' properties of ACID.\endnote{The ``A'' in ACID really means atomic persistence of data, rather than atomic in-memory updates, as the term is normally -used in systems work; %~\cite{GR97}; +used in systems work~\cite{GR97}; the latter is covered by ``C'' and ``I''.} ``Isolation'' is typically provided by locking, which is a higher-level but @@ -679,22 +677,22 @@ We make no assumptions regarding lock managers being used by higher-level code i \section{LSN-free pages.} \label{sec:lsn-free} -The recovery algorithm described above uses LSN's to determine the +The recovery algorithm described above uses LSNs to determine the version number of each page during recovery. This is a common technique. As far as we know, is used by all database systems that update data in place. Unfortunately, this makes it difficult to map -large objects onto pages, as the LSN's break up the object. 
It -is tempting to store the LSN's elsewhere, but then they would not be +large objects onto pages, as the LSNs break up the object. It +is tempting to store the LSNs elsewhere, but then they would not be written atomically with their page, which defeats their purpose. -This section explains how we can avoid storing LSN's on pages in \yad +This section explains how we can avoid storing LSNs on pages in \yad without giving up durable transactional updates. The techniques here are similar to those used by RVM~\cite{lrvm}, a system that supports transactional updates to virtual memory. However, \yad generalizes the concept, allowing it to co-exist with traditional pages and fully support concurrent transactions. -In the process of removing LSN's from pages, we +In the process of removing LSNs from pages, we are able to relax the atomicity assumptions that we make regarding writes to disk. These relaxed assumptions allow recovery to repair torn pages without performing media recovery, and allow arbitrary @@ -707,7 +705,7 @@ protocol for atomically and durably applying updates to the page file. This will require the addition of a new page type (\yad currently has 3 such types, not including a few minor variants). The new page type will need to communicate with the logger and recovery modules in order -to estimate page LSN's, which will need to make use of callbacks in +to estimate page LSNs, which will need to make use of callbacks in those modules. Of course, upon providing support for LSN free pages, we will want to add operations to \yad that make use of them. We plan to eventually support the coexistance of LSN-free pages, traditional @@ -715,7 +713,7 @@ pages, and similar third-party modules within the same page file, log, transactions, and even logical operations. \subsection{Blind writes} -Recall that LSN's were introduced to prevent recovery from applying +Recall that LSNs were introduced to prevent recovery from applying updates more than once, and to prevent recovery from applying old updates to newer versions of pages. This was necessary because some operations that manipulate pages are not idempotent, or simply make @@ -769,14 +767,14 @@ practical problem. The rest of this section describes how concurrent, LSN-free pages allow standard file system and database optimizations to be easily -combined, and shows that the removal of LSN's from pages actually +combined, and shows that the removal of LSNs from pages actually simplifies some aspects of recovery. \subsection{Zero-copy I/O} We originally developed LSN-free pages as an efficient method for transactionally storing and updating large (multi-page) objects. If a -large object is stored in pages that contain LSN's, then in order to +large object is stored in pages that contain LSNs, then in order to read that large object the system must read each page individually, and then use the CPU to perform a byte-by-byte copy of the portions of the page that contain object data into a second buffer. @@ -819,14 +817,14 @@ objects~\cite{esm}. Our LSN-free pages are somewhat similar to the recovery scheme used by RVM, recoverable virtual memory. \rcs{, and camelot, argus(?)} That system used purely physical logging and LSN-free pages so that it could use mmap() to map portions -of the page file into application memory\cite{lrvm}. However, without +of the page file into application memory~\cite{lrvm}. 
However, without support for logical log entries and nested top actions, it would be difficult to implement a concurrent, durable data structure using RVM. In contrast, LSN-free pages allow for logical undo, allowing for the use of nested top actions and concurrent transactions. -We plan to add RVM style transactional memory to \yad in a way that is +We plan to add RVM-style transactional memory to \yad in a way that is compatible with fully concurrent collections such as hash tables and tree structures. Of course, since \yad will support coexistance of conventional and LSN-free pages, applications would be free to use the @@ -835,7 +833,7 @@ conventional and LSN-free pages, applications would be free to use the \subsection{Page-independent transactions} \label{sec:torn-page} \rcs{I don't like this section heading...} Recovery schemes that make -use of per-page LSN's assume that each page is written to disk +use of per-page LSNs assume that each page is written to disk atomically even though that is generally not the case. Such schemes deal with this problem by using page formats that allow partially written pages to be detected. Media recovery allows them to recover @@ -944,7 +942,7 @@ around typical problems with existing transactional storage systems. system. Many of the customizations described below can be implemented using custom log operations. In this section, we describe how to implement an ``ARIES style'' concurrent, steal/no-force operation using -\diff{physical redo, logical undo} and per-page LSN's. +\diff{physical redo, logical undo} and per-page LSNs. Such operations are typical of high-performance commercial database engines. @@ -973,7 +971,7 @@ with. UNDO works analogously, but is invoked when an operation must be undone (usually due to an aborted transaction, or during recovery). This pattern applies in many cases. In -order to implement a ``typical'' operation, the operations +order to implement a ``typical'' operation, the operation's implementation must obey a few more invariants: \begin{itemize} @@ -983,22 +981,27 @@ implementation must obey a few more invariants: during REDO, then the wrapper should use a latch to protect against concurrent attempts to update the sensitive data (and against concurrent attempts to allocate log entries that update the data). -\item Nested top actions (and logical undo), or ``big locks'' (total isolation but lower concurrency) should be used to implement multi-page updates. (Section~\ref{sec:nta}) +\item Nested top actions (and logical undo) or ``big locks'' (total isolation but lower concurrency) should be used to manage concurrency (Section~\ref{sec:nta}). \end{itemize} + + + + \section{Experiments} +\label{experiments} + +\eab{add transition that explains where we are going} + \subsection{Experimental setup} - - - \label{sec:experimental_setup} We chose Berkeley DB in the following experiments because, among commonly used systems, it provides transactional storage primitives -that are most similar to \yad. Also, Berkeley DB is commercially -supported and is designed to provide high performance and high +that are most similar to \yad. Also, Berkeley DB is +supported commercially and is designed to provide high performance and high concurrency. For all tests, the two libraries provide the same -transactional semantics, unless explicitly noted. +transactional semantics unless explicitly noted. 
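+
+Before turning to the benchmarks, the following sketch shows what the
+wrapper/REDO/UNDO pattern of the previous section might look like in
+C.  It is illustrative only: every identifier below is a hypothetical
+stand-in rather than \yads actual interface.
+\begin{verbatim}
+#include <string.h>
+
+enum { OP_INCREMENT = 1 };           /* hypothetical operation id    */
+typedef struct { long page; int offset; int amount; } inc_args;
+
+/* Hypothetical library hooks; not \yad's real API. */
+extern void  log_update(int op, int xid, const void *arg, size_t len);
+extern void *latch_page(long page);  /* pins and latches the page    */
+extern void  unlatch_page(long page);
+
+/* Physical REDO: applied from the log at runtime and again during
+   recovery.  It touches exactly one page, keeping the page-LSN
+   test valid. */
+void inc_redo(void *page_bytes, const inc_args *a) {
+  int v;
+  memcpy(&v, (char *)page_bytes + a->offset, sizeof v);
+  v += a->amount;
+  memcpy((char *)page_bytes + a->offset, &v, sizeof v);
+}
+
+/* Logical UNDO: log the inverse operation rather than restoring the
+   original bytes, so concurrent transactions can interleave. */
+void inc_undo(int xid, const inc_args *a) {
+  inc_args inv = { a->page, a->offset, -a->amount };
+  log_update(OP_INCREMENT, xid, &inv, sizeof inv);
+}
+
+/* Wrapper called by the application.  The latch ensures the state
+   seen while the log entry is generated matches the state that REDO
+   will see. */
+void Tincrement(int xid, long page, int offset, int amount) {
+  inc_args a = { page, offset, amount };
+  void *p = latch_page(page);
+  log_update(OP_INCREMENT, xid, &a, sizeof a);  /* log first ...     */
+  inc_redo(p, &a);                              /* ... then update   */
+  unlatch_page(page);
+}
+\end{verbatim}
+The same skeleton extends to structural updates; there the UNDO is a
+logical inverse such as {\em remove} for {\em insert}, as in the
+nested-top-action recipe above.
+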
All benchmarks were run on an Intel Xeon 2.8 GHz with 1GB of RAM and a 10K RPM SCSI drive formatted using with ReiserFS~\cite{reiserfs}.\endnote{We found that the @@ -1039,15 +1042,17 @@ multiple machines and file systems. \subsection{Linear hash table} \label{sec:lht} + \begin{figure}[t] \includegraphics[% width=1\columnwidth]{figs/bulk-load.pdf} %\includegraphics[% % width=1\columnwidth]{bulk-load-raw.pdf} %\vspace{-30pt} -\caption{\sf\label{fig:BULK_LOAD} Performance of \yad and Berkeley DB hashtable implementations. The +\caption{\sf\label{fig:BULK_LOAD} Performance of \yad and Berkeley DB hash table implementations. The test is run as a single transaction, minimizing overheads due to synchronous log writes.} \end{figure} + \begin{figure}[t] %\hspace*{18pt} %\includegraphics[% @@ -1055,35 +1060,37 @@ test is run as a single transaction, minimizing overheads due to synchronous log \includegraphics[% width=1\columnwidth]{figs/tps-extended.pdf} %\vspace{-36pt} -\caption{\sf\label{fig:TPS} High concurrency performance of Berkeley DB and \yad. We were unable to get Berkeley DB to work correctly with more than 50 threads. (See text) +\caption{\sf\label{fig:TPS} High concurrency hash table performance of Berkeley DB and \yad. We were unable to get Berkeley DB to work correctly with more than 50 threads (see text). } \end{figure} Although the beginning of this paper describes the limitations of physical database models and relational storage systems in great detail, these systems are the basis of most common transactional -storage routines. Therefore, we implement a key-based access -method in this section. We argue that -obtaining reasonable performance in such a system under \yad is -straightforward. We then compare our simple, straightforward -implementation to our hand-tuned version and Berkeley DB's implementation. +storage routines. Therefore, we implement a key-based access method +in this section. We argue that obtaining reasonable performance in +such a system under \yad is straightforward. We then compare our +simple, straightforward implementation to our hand-tuned version and +Berkeley DB's implementation. -The simple hash table uses nested top actions to update its -internal structure atomically. It uses a {\em linear} hash function~\cite{lht}, allowing -it to incrementally grow its buffer list. It is based on a number of -modular subcomponents. Notably, its bucket list is a growable array -of fixed length entries (a linkset, in the terms of the physical -database model) and the user's choice of two different linked list -implementations. +The simple hash table uses nested top actions to update its internal +structure atomically. It uses a {\em linear} hash +function~\cite{lht}, allowing it to increase capacity + incrementally. It is based on a number of modular subcomponents. +Notably, its ``table'' is a growable array of fixed-length entries (a +linkset, in the terms of the physical database model) and the user's +choice of two different linked-list implementations. \eab{still +unclear} -The hand-tuned hashtable also uses a linear hash +The hand-tuned hash table is also built on \yad and also uses a linear hash function. However, it is monolithic and uses carefully ordered writes to reduce runtime overheads such as log bandwidth. Berkeley DB's -hashtable is a popular, commonly deployed implementation, and serves +hash table is a popular, commonly deployed implementation, and serves as a baseline for our experiments. 
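+
+For readers unfamiliar with linear hashing, the following
+self-contained fragment sketches the technique the simple hash table
+relies on: buckets are addressed with $i$ or $i+1$ bits of the hash,
+and capacity grows one bucket at a time.  It illustrates the general
+algorithm~\cite{lht}; it is not \yads implementation, and the names
+are ours.
+\begin{verbatim}
+#include <stdint.h>
+
+/* Linear hashing address calculation (generic illustration). */
+typedef struct {
+  uint64_t i;      /* number of completed table doublings        */
+  uint64_t split;  /* next bucket to split; 0 <= split < 2^i     */
+} lh_state;
+
+/* Buckets below the split pointer have already been split, so
+   they are addressed with one extra bit of the hash. */
+uint64_t lh_bucket(const lh_state *s, uint64_t h) {
+  uint64_t b = h & ((1ULL << s->i) - 1);
+  if (b < s->split)
+    b = h & ((1ULL << (s->i + 1)) - 1);
+  return b;
+}
+
+/* Grow by one bucket: rehash bucket `split' into buckets `split'
+   and `split + 2^i' (rehash loop omitted), then advance the split
+   pointer, starting a new round once every bucket of the current
+   round has been split. */
+void lh_expand(lh_state *s) {
+  /* ... move entries of bucket s->split as needed ... */
+  s->split++;
+  if (s->split == (1ULL << s->i)) {
+    s->i++;
+    s->split = 0;
+  }
+}
+\end{verbatim}
+In \yad, the address calculation is the easy part; the point of the
+nested top actions mentioned above is to make the bucket splits
+themselves atomic with respect to abort and crash.
+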
-Both of our hashtables outperform Berkeley DB on a workload that -bulk loads the tables by repeatedly inserting (key, value) pairs. +Both of our hash tables outperform Berkeley DB on a workload that bulk +loads the tables by repeatedly inserting (key, value) pairs +(Figure~\ref{fig:BULK_LOAD}). %although we do not wish to imply this is always the case. %We do not claim that our partial implementation of \yad %generally outperforms, or is a robust alternative @@ -1122,13 +1129,12 @@ a single synchronous I/O.\endnote{The multi-threaded benchmarks \yad scaled quite well, delivering over 6000 transactions per second,\endnote{The concurrency test was run without lock managers, and the transactions obeyed the A, C, and D properties. Since each - transaction performed exactly one hashtable write and no reads, they also + transaction performed exactly one hash table write and no reads, they also obeyed I (isolation) in a trivial sense.} and provided roughly -double Berkeley DB's throughput (up to 50 threads). We do not report -the data here, but we implemented a simple load generator that makes -use of a fixed pool of threads with a fixed think time. We found that -the latencies of Berkeley DB and \yad were similar, showing that \yad is -not simply trading latency for throughput during the concurrency benchmark. +double Berkeley DB's throughput (up to 50 threads). Although not +shown here, we found that the latencies of Berkeley DB and \yad were +similar, which confirms that \yad is not simply trading latency for +throughput during the concurrency benchmark. \begin{figure*} @@ -1140,10 +1146,12 @@ not simply trading latency for throughput during the concurrency benchmark. The effect of \yad object serialization optimizations under low and high memory pressure.} \end{figure*} + \subsection{Object persistence} \label{sec:oasys} + Numerous schemes are used for object serialization. Support for two -different styles of object serialization have been implemented in +different styles of object serialization has been implemented in \yad. We could have just as easily implemented a persistence mechanism for a statically typed functional programming language, a dynamically typed scripting language, or a particular application, @@ -1160,17 +1168,21 @@ serialization library, \oasys. \oasys makes use of pluggable storage modules that implement persistent storage, and includes plugins for Berkeley DB and MySQL. -This section will describe how the \yad -\oasys plugin reduces amount of data written to log, while using half as much system -memory as the other two systems. +This section will describe how the \yad \oasys plugin reduces the +amount of data written to the log, while using half as much system memory +as the other two systems. -We present three variants of the \yad plugin here. The first treats \yad like -Berkeley DB. The second, ``update/flush'' customizes the behavior of the buffer -manager. Instead of maintaining an up-to-date version of each object -in the buffer manager or page file, it allows the buffer manager's -view of live application objects to become stale. This is safe since -the system is always able to reconstruct the appropriate page entry -from the live copy of the object. +We present three variants of the \yad plugin here. The first treats +\yad like Berkeley DB. The second, the ``update/flush'' variant, +customizes the behavior of the buffer manager, and the third, +``delta'', extends the second with support for logging only the deltas +between versions.
+ +The update/flush variant avoids maintaining an up-to-date +version of each object in the buffer manager or page file: it allows +the buffer manager's view of live application objects to become stale. +This is safe since the system is always able to reconstruct the +appropriate page entry from the live copy of the object. By allowing the buffer manager to contain stale data, we reduce the number of times the \yad \oasys plugin must update serialized objects in the buffer manager. @@ -1186,41 +1198,45 @@ updates the page file. The reason it would be difficult to do this with Berkeley DB is that we still need to generate log entries as the object is being updated. - This would cause Berkeley DB to write data back to the -page file, increasing the working set of the program, and increasing -disk activity. + This would cause Berkeley DB to write data back to the page file, +increasing the working set of the program, and increasing disk +activity. Furthermore, objects may be written to disk in an order that differs from the order in which they were updated, violating one of the write-ahead logging invariants. One way to -deal with this is to maintain multiple LSN's per page. This means we would need to register a -callback with the recovery routine to process the LSN's (a similar +deal with this is to maintain multiple LSNs per page. This means we would need to register a +callback with the recovery routine to process the LSNs (a similar callback will be needed in Section~\ref{sec:zeroCopy}), and -extend \yads page format to contain per-record LSN's. +extend \yads page format to contain per-record LSNs. Also, we must prevent \yads storage allocation routine from overwriting the per-object -LSN's of deleted objects that may still be addressed during abort or recovery. +LSNs of deleted objects that may still be addressed during abort or recovery.\eab{tombstones discussion here?} + +\eab{we should at least implement this callback if we have not already} Alternatively, we could arrange for the object pool to cooperate further with the buffer pool by atomically updating the buffer manager's copy of all objects that share a given page, removing the -need for multiple LSN's per page, and simplifying storage allocation. +need for multiple LSNs per page, and simplifying storage allocation. -However, the simplest solution, and the one we take here, is based on the observation that -updates (not allocations or deletions) of fixed length objects are blind writes. -This allows us to do away with per-object LSN's entirely. Allocation and deletion can then be handled -as updates to normal LSN containing pages. At recovery time, object -updates are executed based on the existence of the object on the page -and a conservative estimate of its LSN. (If the page doesn't contain -the object during REDO then it must have been written back to disk -after the object was deleted. Therefore, we do not need to apply the -REDO.) This means that the system can ``forget'' about objects that -were freed by committed transactions, simplifying space reuse -tremendously. (Because LSN-free pages and recovery are not yet implemented, -this benchmark mimics their behavior at runtime, but does not support recovery.) +However, the simplest solution, and the one we take here, is based on +the observation that updates (not allocations or deletions) of +fixed-length objects are blind writes. This allows us to do away with +per-object LSNs entirely. 
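+The fragment below illustrates why such an update is a blind write:
+the log entry carries the complete new image of the fixed-length
+object, and REDO never reads the old value.  The structure and helper
+names are hypothetical, chosen only to make the idea concrete.
+\begin{verbatim}
+#include <string.h>
+
+#define OBJ_SIZE 64   /* example fixed object size (hypothetical) */
+
+/* Hypothetical page-layout helpers. */
+extern int   slot_exists(const void *page_bytes, int slot);
+extern void *slot_addr(void *page_bytes, int slot);
+
+/* Log-entry payload: the whole new serialization travels in the log. */
+typedef struct {
+  long page;
+  int  slot;
+  char image[OBJ_SIZE];
+} obj_update;
+
+/* Blind-write REDO: overwrite the slot with the logged image.
+   Replaying the same entry once or many times yields the same page
+   contents, so no per-object LSN is needed to suppress
+   re-application; ordering across entries is handled by the
+   conservative LSN estimate described below. */
+void obj_update_redo(void *page_bytes, const obj_update *u) {
+  if (!slot_exists(page_bytes, u->slot))
+    return;   /* object already freed; skip, as explained below */
+  memcpy(slot_addr(page_bytes, u->slot), u->image, OBJ_SIZE);
+}
+\end{verbatim}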
Allocation and deletion can then be +handled as updates to normal LSN containing pages. At recovery time, +object updates are executed based on the existence of the object on +the page and a conservative estimate of its LSN. (If the page doesn't +contain the object during REDO then it must have been written back to +disk after the object was deleted. Therefore, we do not need to apply +the REDO.) This means that the system can ``forget'' about objects +that were freed by committed transactions, simplifying space reuse +tremendously. (Because LSN-free pages and recovery are not yet +implemented, this benchmark mimics their behavior at runtime, but does +not support recovery.) -The third \yad plugin, ``delta'' incorporates the buffer -manager optimizations. However, it only writes the changed portions of -objects to the log. Because of \yads support for custom log entry +The third plugin variant, ``delta'', incorporates the update/flush +optimizations, but only writes the changed portions of +objects to the log. Because of \yads support for custom log-entry formats, this optimization is straightforward. %In addition to the buffer-pool optimizations, \yad provides several @@ -1264,8 +1280,8 @@ close, but does not quite provide the correct durability semantics.) The operations required for these two optimizations required 150 lines of C code, including whitespace, comments and boilerplate function registrations.\endnote{These figures do not include the - simple LSN free object logic required for recovery, as \yad does not - yet support LSN free operations.} Although the reasoning required + simple LSN-free object logic required for recovery, as \yad does not + yet support LSN-free operations.} Although the reasoning required to ensure the correctness of this code is complex, the simplicity of the implementation is encouraging. @@ -1289,6 +1305,9 @@ we see that update/flush indeed improves memory utilization. \subsection{Manipulation of logical log entries} + +\eab{this section unclear, including title} + \label{sec:logging} \begin{figure} \includegraphics[width=1\columnwidth]{figs/graph-traversal.pdf} @@ -1345,7 +1364,7 @@ is used by RVM's log-merging operations~\cite{lrvm}. Furthermore, application-specific procedures that are analogous to standard relational algebra methods (join, project and select) could be used to efficiently transform the data -while it is still layed out sequentially +while it is still laid out sequentially in non-transactional memory. %Note that read-only operations do not necessarily generate log @@ -1371,9 +1390,9 @@ position size so that each partition can fit in \yads buffer pool. We ran two experiments. Both stored a graph of fixed size objects in the growable array implementation that is used as our linear -hashtable's bucket list. +hash table's bucket list. The first experiment (Figure~\ref{fig:oo7}) -is loosely based on the OO7 database benchmark.~\cite{oo7}. We +is loosely based on the OO7 database benchmark~\cite{oo7}. We hard-code the out-degree of each node, and use a directed graph. OO7 constructs graphs by first connecting nodes together into a ring. It then randomly adds edges between the nodes until the desired @@ -1583,7 +1602,7 @@ databases~\cite{libtp}. At its core, it provides the physical database model %most relational database systems~\cite{libtp}. In particular, it provides fully transactional (ACID) operations over B-Trees, -hashtables, and other access methods. It provides flags that +hash tables, and other access methods. 
It provides flags that let its users tweak various aspects of the performance of these primitives, and selectively disable the features it provides. @@ -1642,14 +1661,16 @@ Although most file systems attempt to lay out data in logically sequential order, write-optimized file systems lay files out in the order they were written~\cite{lfs}. Schemes to improve locality between small objects exist as well. Relational databases allow users to specify the order -in which tuples will be layed out, and often leave portions of pages +in which tuples will be laid out, and often leave portions of pages unallocated to reduce fragmentation as new records are allocated. \rcs{The new allocator is written + working, so this should be reworded. We have one that is based on hoard; support for other possibilities would be nice.} Memory allocation routines also address this problem. For example, the Hoard memory allocator is a highly concurrent version of malloc that makes use of thread context to allocate memory in a way that favors -cache locality~\cite{hoard}. %Other work makes use of the caller's stack to infer +cache locality~\cite{hoard}. + +%Other work makes use of the caller's stack to infer %information about memory management.~\cite{xxx} \rcs{Eric, do you have % a reference for this?} @@ -1664,7 +1685,7 @@ plan to use ideas from LFS~\cite{lfs} and POSTGRES~\cite{postgres} to implement this. Starburst~\cite{starburst} provides a flexible approach to index -management, and database trigger support, as well as hints for small +management and database trigger support, as well as hints for small object layout. The Boxwood system provides a networked, fault-tolerant transactional @@ -1673,8 +1694,8 @@ complement to such a system, especially given \yads focus on intelligence and optimizations within a single node, and Boxwood's focus on multiple node systems. In particular, it would be interesting to explore extensions to the Boxwood approach that make -use of \yads customizable semantics (Section~\ref{sec:wal}), and fully logical logging -mechanism. (Section~\ref{sec:logging}) +use of \yads customizable semantics (Section~\ref{sec:wal}) and fully logical logging +mechanisms (Section~\ref{sec:logging}). @@ -1706,7 +1727,7 @@ algorithms related to write-ahead logging. For instance, we suspect that support for appropriate callbacks will allow us to hard-code a generic recovery algorithm into the system. Similarly, any code that manages book-keeping information, such as -LSN's may be general enough to be hard-coded. +LSNs may be general enough to be hard-coded. Of course, we also plan to provide \yads current functionality, including the algorithms mentioned above as modular, well-tested extensions. @@ -1733,13 +1754,15 @@ extended in the future to support a larger range of systems. \section{Acknowledgements} -The idea behind the \oasys buffer manager optimization is from Mike -Demmer. He and Bowei Du implemented \oasys. Gilad Arnold and Amir Kamil implemented +Thanks to shepherd Bill Weihl for helping us present these ideas well, +or at least better. The idea behind the \oasys buffer manager +optimization is from Mike Demmer. He and Bowei Du implemented \oasys. +Gilad Arnold and Amir Kamil implemented pobj. Jim Blomo, Jason Bayer, and Jimmy Kittiyachavalit worked on an early version of \yad. Thanks to C. Mohan for pointing out the need for tombstones with -per-object LSN's. Jim Gray provided feedback on an earlier version of +per-object LSNs. 
Jim Gray provided feedback on an earlier version of this paper, and suggested we use a resource manager to manage dependencies within \yads API. Joe Hellerstein and Mike Franklin provided us with invaluable feedback.