fixed up 4.1-4.3

Sears Russell 2005-03-26 01:38:52 +00:00
parent 033cf78870
commit 6b4cc22215


@ -84,7 +84,7 @@ optimizations and enhanced usability for application developers.}
\section{Introduction} \section{Introduction}
Transactions are at the core of databases and thus form the basis of many Transactions are at the core of databases and thus form the basis of many
important systems. However, the mechanisms for transactions are important systems. However, the mechanisms that provide transactions are
typically hidden within monolithic database implementations (DBMSs) that make typically hidden within monolithic database implementations (DBMSs) that make
it hard to benefit from transactions without inheriting the rest of it hard to benefit from transactions without inheriting the rest of
the database machinery and design decisions, including the use of a the database machinery and design decisions, including the use of a
@ -102,8 +102,9 @@ model provided by a DBMS and that required by these applications. This is
not an accident: the purpose of the relational model is exactly to not an accident: the purpose of the relational model is exactly to
move to a higher-level set-based data model that avoids the kind of move to a higher-level set-based data model that avoids the kind of
``navigational'' interactions required by these lower-level systems. ``navigational'' interactions required by these lower-level systems.
Thus in some sense, we are arguing for the return of navigational Thus in some sense, we are arguing for the development of modern
transaction systems to compliment not replace relational systems. navigational transaction systems that can complement relational systems
and that naturally support current system designs and development methodologies.
The most obvious example of this mismatch is in the support for The most obvious example of this mismatch is in the support for
persistent objects in Java, called {\em Enterprise Java Beans} persistent objects in Java, called {\em Enterprise Java Beans}
@ -712,37 +713,52 @@ various primitives that \yad provides to application developers.
\subsection{Lock Manager} \subsection{Lock Manager}
\label{lock-manager} \label{lock-manager}
\eab{present the API?} %\eab{present the API?}
\yad provides a default page-level lock manager that performs deadlock \yad provides a default page-level lock manager that performs deadlock
detection, although we expect many applications to make use of detection, although we expect many applications to make use of
deadlock-avoidance schemes, which are already prevalent in deadlock-avoidance schemes, which are already prevalent in
multithreaded application development. The Lock Manager is flexible multithreaded application development. The Lock Manager is flexible
enough to also provide index locks for hashtable implementations and enough to also provide index locks for hashtable implementations and
more complex locking protocols. more complex locking protocols such as hierarchical two-phase
locking~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample}.
The lock manager API is divided into callback functions that are made
during normal operation and recovery, and into generic lock manager
implementations that may be used with \yad and its index implementations.
For example, it would be relatively easy to build a strict two-phase %For example, it would be relatively easy to build a strict two-phase
locking hierarchical lock %locking hierarchical lock
manager~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample} on %manager
top of \yad. Such a lock manager would provide isolation guarantees % on
for all applications that make use of it. However, applications that %top of \yad. Such a lock manager would provide isolation guarantees
make use of such a lock manager must handle deadlocked transactions %for all applications that make use of it.
However, applications that
make use of a lock manager must handle deadlocked transactions
that have been aborted by the lock manager. This is easy if all of that have been aborted by the lock manager. This is easy if all of
the state is managed by \yad, but other state such as thread stacks the state is managed by \yad, but other state such as thread stacks
must be handled by the application, much like exception handling. must be handled by the application, much like exception handling.
\yad currently uses a custom wrapper around the pthread cancellation
mechanism to provide partial stack unwinding and thread
cancellation. Applications may use this error handling
technique, or write simple wrappers to handle errors with the
error handling scheme of their choice.
Conversely, many applications do not require such a general scheme. Conversely, many applications do not require such a general scheme.
For instance, an IMAP server can employ a simple lock-per-folder If deadlock avoidance (``normal'' thread synchronization) can be used,
approach and use lock-ordering techniques to avoid deadlock. This the application does not have to abort partial transactions, repeat
avoids the complexity of dealing with transactions that abort due work, or deal with the corner cases that aborted transactions create.
to deadlock, and also removes the runtime cost of restarting %For instance, an IMAP server can employ a simple lock-per-folder
transactions. %approach and use lock-ordering techniques to avoid deadlock. This
%avoids the complexity of dealing with transactions that abort due
%to deadlock, and also removes the runtime cost of restarting
%transactions.
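As a minimal sketch of deadlock avoidance through lock ordering, the following C fragment acquires two mutexes in a fixed global order (by address), so no cycle of waiting threads can form and no transaction needs to be aborted and restarted. All type and function names are hypothetical and chosen for illustration; none of them are part of \yad's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical application-level resource, e.g. one mail folder. */
typedef struct {
    pthread_mutex_t lock;
    int message_count;
} folder_t;

void folder_init(folder_t *f, int count) {
    pthread_mutex_init(&f->lock, NULL);
    f->message_count = count;
}

/* Acquire both locks in a fixed global order (lowest address first).
 * Every thread uses the same order, so circular waits cannot occur. */
static void lock_pair(folder_t *a, folder_t *b) {
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        folder_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void unlock_pair(folder_t *a, folder_t *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Move n messages from src to dst.
 * Returns 0 on success, -1 if src holds fewer than n messages. */
int move_messages(folder_t *src, folder_t *dst, int n) {
    int rc = -1;
    lock_pair(src, dst);
    if (src->message_count >= n) {
        src->message_count -= n;
        dst->message_count += n;
        rc = 0;
    }
    unlock_pair(src, dst);
    return rc;
}
```

Two threads that call the move function with the same pair of folders in opposite roles still lock them in the same order, so neither can block the other indefinitely.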
\yad provides a lock manager API that allows all three variations %\yad provides a lock manager API that allows all three variations
(among others). In particular, it provides upcalls on commit/abort so %(among others). In particular, it provides upcalls on commit/abort so
that the lock manager can release locks at the right time. We will %that the lock manager can release locks at the right time. We will
revisit this point in more detail when we describe some of the example %revisit this point in more detail when we describe some of the example
operations. %operations.
%% @todo where does this text go?? %% @todo where does this text go??
@ -865,34 +881,43 @@ operations.
\label{flex-logging} \label{flex-logging}
\label{page-layouts} \label{page-layouts}
The overview discussion avoided the use of some common terminology \yad supports three types of logging, and allows applications to create
that should be presented here. {\em Physical logging } {\em custom log entries} of each type.
%The overview discussion avoided the use of some common terminology
%that should be presented here.
{\em Physical logging }
is the practice of logging physical (byte-level) updates is the practice of logging physical (byte-level) updates
and the physical (page-number) addresses to which they are applied. and the physical (page-number) addresses to which they are applied.
\rcs{Do we really need to differentiate between types of diffs applied to pages? The concept of physical REDO/logical UNDO is probably more important...} {\em Physiological logging } extends this idea, and is generally used
for \yad's REDO entries. The physical address (page number) is
stored, along with the arguments of an arbitrary function that
is associated with the log entry.
{\em Physiological logging } is what \yad recommends for its REDO This is used to implement many primitives, including {\em slotted pages}, which use
records~\cite{physiological}. The physical address (page number) is
stored, but the byte offset and the actual delta are stored implicitly
in the parameters of the REDO or UNDO function. These parameters allow
the function to update the page in a way that preserves application
semantics. One common use for this is {\em slotted pages}, which use
an on-page level of indirection to allow records to be rearranged an on-page level of indirection to allow records to be rearranged
within the page; instead of using the page offset, REDO operations use within the page; instead of using the page offset, REDO operations use
the index to locate the data within the page. This allows data within a single the index to locate the data within the page. This allows data within a single
page to be re-arranged at runtime to produce contiguous regions of page to be re-arranged easily, producing contiguous regions of
free space. \yad generalizes this model; for example, the parameters free space. Since the log entry is associated with an arbitrary function,
passed to the function may utilize application-specific properties in more sophisticated log entries can be implemented. In turn, this can improve
order to be significantly smaller than the physical change made to the performance by conserving log space, or be used to match recovery to application
page. semantics.
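The slotted-page indirection can be sketched in C as follows. This is an in-memory illustration under assumed sizes and field names, not \yad's actual page format: records are addressed by slot number, so compaction may move bytes within the page without invalidating any slot number that a log entry refers to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SP_PAGE_SIZE 4096
#define SP_MAX_SLOTS 64

/* Hypothetical layout: a slot table plus a data area. */
typedef struct {
    uint16_t offset[SP_MAX_SLOTS]; /* slot -> byte offset in data[] */
    uint16_t length[SP_MAX_SLOTS]; /* slot -> record length; 0 = free */
    uint16_t free_start;           /* first unused byte in data[] */
    uint8_t  data[SP_PAGE_SIZE];
} slotted_page;

void page_init(slotted_page *p) { memset(p, 0, sizeof *p); }

/* Append a record; returns its slot number, or -1 if it does not fit. */
int page_insert(slotted_page *p, const void *rec, uint16_t len) {
    if (len == 0 || (size_t)p->free_start + len > SP_PAGE_SIZE) return -1;
    for (int s = 0; s < SP_MAX_SLOTS; s++) {
        if (p->length[s] == 0) {
            p->offset[s] = p->free_start;
            p->length[s] = len;
            memcpy(p->data + p->free_start, rec, len);
            p->free_start += len;
            return s;
        }
    }
    return -1; /* no free slot */
}

/* Free a slot; its bytes are reclaimed by the next compaction. */
void page_delete(slotted_page *p, int slot) { p->length[slot] = 0; }

/* Locate a record by slot number rather than by byte offset. */
const void *page_read(const slotted_page *p, int slot, uint16_t *len) {
    if (slot < 0 || slot >= SP_MAX_SLOTS || p->length[slot] == 0) return NULL;
    *len = p->length[slot];
    return p->data + p->offset[slot];
}

/* Slide live records together to produce one contiguous free region.
 * Offsets change; slot numbers do not. */
void page_compact(slotted_page *p) {
    uint8_t tmp[SP_PAGE_SIZE];
    uint16_t next = 0;
    for (int s = 0; s < SP_MAX_SLOTS; s++) {
        if (p->length[s] == 0) continue;
        memcpy(tmp + next, p->data + p->offset[s], p->length[s]);
        p->offset[s] = next;
        next += p->length[s];
    }
    memcpy(p->data, tmp, next);
    p->free_start = next;
}
```

A REDO entry that names a page and a slot therefore remains valid after compaction, whereas one that named a byte offset would not.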
%\yad generalizes this model, allowing the parameters of a
%custom log entry to invoke arbitrary application-specific code.
%In
%Section~\ref{OASYS} this is used to significantly improve performance by
%storing difference records in an application-specific format.
This forms the basis of \yad's flexible page layouts. We current %In addition to supporting custom log entries, this mechanism
support four layouts: a raw page, which is just an array of %is the basis of \yad's {\em flexible page layouts}.
bytes, a record-oriented page with fixed-size records, \yad also uses this mechanism to support four {\em page layouts}:
a slotted-page that support variable-sized records, and a page of records with version numbers (Section~\ref{version-pages}). {\em raw-page}, which is just an array of
Data structures can pick the layout that is most convenient or implement bytes, {\em fixed-page}, a record-oriented page with fixed-length records,
new layouts. {\em slotted-page}, which supports variable-sized records, and
{\em versioned-page}, a slotted-page with a separate version number for
each record (Section~\ref{version-pages}).
{\em Logical logging} uses a higher-level key to specify the {\em Logical logging} uses a higher-level key to specify the
UNDO/REDO. Since these higher-level keys may affect multiple pages, UNDO/REDO. Since these higher-level keys may affect multiple pages,
@ -911,25 +936,29 @@ span multiple pages, as shown in the next section.
%% logical log entries, the page file must be physically consistent, %% logical log entries, the page file must be physically consistent,
%% ruling out use of logical logging for redo operations. %% ruling out use of logical logging for redo operations.
\yad supports all three types of logging, and allows developers to %\yad supports all three types of logging, and allows developers to
register new operations, which we cover below. %register new operations, which we cover below.
\subsection{Nested Top Actions} \subsection{Nested Top Actions}
\label{nested-top-actions} \label{nested-top-actions}
The operations presented so far work fine for a single page, since The operations presented so far work fine for a single page, since
each update is atomic. For updates that span multiple pages there are two basic options: full isolation or nested top actions. each update is atomic. For updates that span multiple pages there
are two basic options: full isolation or nested top actions.
By full isolation, we mean that no other transactions see the By full isolation, we mean that no other transactions see the
in-progress updates, which can be trivially achieved with a big lock in-progress updates, which can be trivially achieved with a big lock
around the whole structure. Given isolation, \yad needs nothing else to around the whole structure. Usually the application must enforce
such a locking policy or decide to use a lock manager and deal with
deadlock. Given isolation, \yad needs nothing else to
make multi-page updates transactional: although many pages might be make multi-page updates transactional: although many pages might be
modified they will commit or abort as a group and be recovered modified they will commit or abort as a group and be recovered
accordingly. accordingly.
However, this level of isolation reduces concurrency within a data However, this level of isolation disallows all concurrency between
structure. ARIES introduced the notion of nested top actions to transactions that use the same data structure. ARIES introduced the
notion of nested top actions to
address this problem. For example, consider what would happen if one address this problem. For example, consider what would happen if one
transaction, $A$, rearranged the layout of a data structure, a second transaction, $A$, rearranged the layout of a data structure, a second
transaction, $B$, added a value to the rearranged structure, and then transaction, $B$, added a value to the rearranged structure, and then
@ -937,7 +966,7 @@ the first transaction aborted. (Note that the structure is not
isolated.) While applying physical undo information to the altered isolated.) While applying physical undo information to the altered
data structure, $A$ would UNDO its writes data structure, $A$ would UNDO its writes
without considering the modifications made by without considering the modifications made by
$B$, which is likely to cause corruption. At this point, $B$ would $B$, which is likely to cause corruption. Therefore, $B$ would
have to be aborted as well ({\em cascading aborts}). have to be aborted as well ({\em cascading aborts}).
With nested top actions, ARIES defines the structural changes as a With nested top actions, ARIES defines the structural changes as a
@ -956,9 +985,8 @@ In particular, we have found a simple recipe for converting a
non-concurrent data structure into a concurrent one, which involves non-concurrent data structure into a concurrent one, which involves
three steps: three steps:
\begin{enumerate} \begin{enumerate}
\item Wrap a mutex around each operation. If full transactional isolation \item Wrap a mutex around each operation. If this is done with care,
with deadlock detection is required, this can be done with the lock it may be possible to use finer-grained mutexes.
manager. Alternatively, this can be done using mutexes for fine-grain isolation.
\item Define a logical UNDO for each operation (rather than just using \item Define a logical UNDO for each operation (rather than just using
a lower-level physical UNDO). For example, this is easy for a a lower-level physical UNDO). For example, this is easy for a
hashtable; e.g. the UNDO for an {\em insert} is {\em remove}. hashtable; e.g. the UNDO for an {\em insert} is {\em remove}.
@ -969,10 +997,12 @@ three steps:
This recipe ensures that operations that might span multiple pages This recipe ensures that operations that might span multiple pages
atomically apply and commit any structural changes and thus avoids atomically apply and commit any structural changes and thus avoids
cascading aborts. If the transaction that encloses the operations cascading aborts. If the transaction that encloses the operations
aborts, the logical UNDO will {\em compensate} for aborts, the logical undo will {\em compensate} for
its effects, but leave its structural changes intact. Note that by releasing the mutex before we commit, we are its effects, but leave its structural changes intact. Because this
violating strict two-phase locking in exchange for better performance recipe does not ensure transactional consistency and is largely
and support for deadlock avoidance. orthogonal to the use of a lock manager, we call this class of
concurrency control {\em latching} throughout this paper.
We have found the recipe to be easy to follow and very effective, and We have found the recipe to be easy to follow and very effective, and
we use it everywhere our concurrent data structures may make structural we use it everywhere our concurrent data structures may make structural
changes, such as growing a hash table or array. changes, such as growing a hash table or array.
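The mutex and logical UNDO parts of the recipe can be sketched in C as follows. This is a toy in-memory map standing in for a hashtable, with a per-transaction undo list; every name is hypothetical, and logging, recovery, and the nested top action itself are omitted. It only illustrates that the latch is released at the end of each operation and that abort applies the inverse operation rather than restoring bytes.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAP_SLOTS 16
#define MAX_UNDO  32

/* Toy map standing in for a hashtable (hypothetical, for illustration). */
typedef struct {
    pthread_mutex_t latch;  /* one mutex wrapped around each operation */
    int used[MAP_SLOTS], keys[MAP_SLOTS], vals[MAP_SLOTS];
} map_t;

/* A logical UNDO record names the inverse operation, not the bytes changed. */
typedef enum { UNDO_BY_REMOVE, UNDO_BY_INSERT } undo_kind;
typedef struct { undo_kind kind; int key, val; } undo_entry;
typedef struct { undo_entry log[MAX_UNDO]; int n; } txn_t;

void map_init(map_t *m) {
    memset(m, 0, sizeof *m);
    pthread_mutex_init(&m->latch, NULL);
}
void txn_begin(txn_t *x) { x->n = 0; }

static int slot_of(const map_t *m, int key) {
    for (int i = 0; i < MAP_SLOTS; i++)
        if (m->used[i] && m->keys[i] == key) return i;
    return -1;
}
static int raw_insert(map_t *m, int key, int val) {
    if (slot_of(m, key) >= 0) return -1;        /* duplicate key */
    for (int i = 0; i < MAP_SLOTS; i++)
        if (!m->used[i]) {
            m->used[i] = 1; m->keys[i] = key; m->vals[i] = val;
            return 0;
        }
    return -1;                                  /* map is full */
}
static int raw_remove(map_t *m, int key, int *old) {
    int i = slot_of(m, key);
    if (i < 0) return -1;
    *old = m->vals[i];
    m->used[i] = 0;
    return 0;
}

/* Insert: latch, update, record the logical UNDO (a remove), unlatch. */
int txn_insert(txn_t *x, map_t *m, int key, int val) {
    assert(x->n < MAX_UNDO);
    pthread_mutex_lock(&m->latch);
    int rc = raw_insert(m, key, val);
    if (rc == 0) x->log[x->n++] = (undo_entry){ UNDO_BY_REMOVE, key, val };
    pthread_mutex_unlock(&m->latch);  /* released before the transaction ends */
    return rc;
}
/* Remove: the logical UNDO is an insert of the old value. */
int txn_remove(txn_t *x, map_t *m, int key) {
    int old = 0;
    assert(x->n < MAX_UNDO);
    pthread_mutex_lock(&m->latch);
    int rc = raw_remove(m, key, &old);
    if (rc == 0) x->log[x->n++] = (undo_entry){ UNDO_BY_INSERT, key, old };
    pthread_mutex_unlock(&m->latch);
    return rc;
}
/* Abort: apply the inverse operations in reverse order. */
void txn_abort(txn_t *x, map_t *m) {
    while (x->n > 0) {
        undo_entry e = x->log[--x->n];
        int ignored;
        pthread_mutex_lock(&m->latch);
        if (e.kind == UNDO_BY_REMOVE) raw_remove(m, e.key, &ignored);
        else raw_insert(m, e.key, e.val);
        pthread_mutex_unlock(&m->latch);
    }
}
int map_get(map_t *m, int key, int *val) {
    pthread_mutex_lock(&m->latch);
    int i = slot_of(m, key);
    if (i >= 0) *val = m->vals[i];
    pthread_mutex_unlock(&m->latch);
    return i >= 0 ? 0 : -1;
}
```

If one transaction inserts a key and aborts after another transaction has inserted a different key, the abort removes only the first key by name; it does not depend on where either entry is stored.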
@ -1404,7 +1434,7 @@ need a map from bucket number to bucket contents (lists), and we need to handle
\subsection{The Bucket Map} \subsection{The Bucket Map}
The simplest bucket map would simply use a fixed-size transactional The simplest bucket map would simply use a fixed-length transactional
array. However, since we want the size of the table to grow, we should array. However, since we want the size of the table to grow, we should
not assume that it fits in a contiguous range of pages. Instead, we build not assume that it fits in a contiguous range of pages. Instead, we build
on top of \yad's transactional ArrayList data structure (inspired by on top of \yad's transactional ArrayList data structure (inspired by
@ -1417,7 +1447,7 @@ per enlargement typically), this leads to an efficient map. We use a
single ``header'' page to store the list of intervals and their sizes. single ``header'' page to store the list of intervals and their sizes.
For space efficiency, the array elements themselves are stored using For space efficiency, the array elements themselves are stored using
the fixed-size record page layout. Thus, we use the header page to the fixed-length record page layout. Thus, we use the header page to
find the right interval, and then index into it to get the $(page, find the right interval, and then index into it to get the $(page,
slot)$ address. Once we have this address, the REDO/UNDO entries are slot)$ address. Once we have this address, the REDO/UNDO entries are
trivial: they simply log the before and after image of the that trivial: they simply log the before and after image of the that
@ -2081,10 +2111,10 @@ requests by reordering invocations of wrapper functions.
\subsection {Data Representation} \subsection {Data Representation}
For simplicity, we represent graph nodes as For simplicity, we represent graph nodes as
fixed length records. The Array List from our linear hash table fixed-length records. The Array List from our linear hash table
implementation (Section~\ref{sub:Linear-Hash-Table}) provides access to an implementation (Section~\ref{sub:Linear-Hash-Table}) provides access to an
array of such records with performance that is competitive with native array of such records with performance that is competitive with native
recordid accesses, so we use an Array List to store the records. We recordid accesses, so we use an ArrayList to store the records. We
could have opted for a slightly more efficient representation by could have opted for a slightly more efficient representation by
implementing a fixed-length array structure, but doing so seems to be implementing a fixed-length array structure, but doing so seems to be
overkill for our purposes. The nodes themselves are stored as an overkill for our purposes. The nodes themselves are stored as an