fixed up 4.1-4.3
parent
033cf78870
commit
6b4cc22215
1 changed files with 90 additions and 60 deletions
|
@ -84,7 +84,7 @@ optimizations and enhanced usability for application developers.}
|
||||||
\section{Introduction}
|
\section{Introduction}
|
||||||
|
|
||||||
Transactions are at the core of databases and thus form the basis of many
|
Transactions are at the core of databases and thus form the basis of many
|
||||||
important systems. However, the mechanisms for transactions are
|
important systems. However, the mechanisms that provide transactions are
|
||||||
typically hidden within monolithic database implementations (DBMSs) that make
|
typically hidden within monolithic database implementations (DBMSs) that make
|
||||||
it hard to benefit from transactions without inheriting the rest of
|
it hard to benefit from transactions without inheriting the rest of
|
||||||
the database machinery and design decisions, including the use of a
|
the database machinery and design decisions, including the use of a
|
||||||
|
@ -102,8 +102,9 @@ model provided by a DBMS and that required by these applications. This is
|
||||||
not an accident: the purpose of the relational model is exactly to
|
not an accident: the purpose of the relational model is exactly to
|
||||||
move to a higher-level set-based data model that avoids the kind of
|
move to a higher-level set-based data model that avoids the kind of
|
||||||
``navigational'' interactions required by these lower-level systems.
|
``navigational'' interactions required by these lower-level systems.
|
||||||
Thus in some sense, we are arguing for the return of navigational
|
Thus in some sense, we are arguing for the development of modern
|
||||||
transaction systems to compliment not replace relational systems.
|
navigational transaction systems that can complement relational systems
|
||||||
|
and that naturally support current system designs and development methodologies.
|
||||||
|
|
||||||
The most obvious example of this mismatch is in the support for
|
The most obvious example of this mismatch is in the support for
|
||||||
persistent objects in Java, called {\em Enterprise Java Beans}
|
persistent objects in Java, called {\em Enterprise Java Beans}
|
||||||
|
@ -712,37 +713,52 @@ various primitives that \yad provides to application developers.
|
||||||
|
|
||||||
\subsection{Lock Manager}
|
\subsection{Lock Manager}
|
||||||
\label{lock-manager}
|
\label{lock-manager}
|
||||||
\eab{present the API?}
|
%\eab{present the API?}
|
||||||
|
|
||||||
\yad provides a default page-level lock manager that performs deadlock
|
\yad provides a default page-level lock manager that performs deadlock
|
||||||
detection, although we expect many applications to make use of
|
detection, although we expect many applications to make use of
|
||||||
deadlock-avoidance schemes, which are already prevalent in
|
deadlock-avoidance schemes, which are already prevalent in
|
||||||
multithreaded application development. The Lock Manager is flexible
|
multithreaded application development. The Lock Manager is flexible
|
||||||
enough to also provide index locks for hashtable implementations and
|
enough to also provide index locks for hashtable implementations and
|
||||||
more complex locking protocols.
|
more complex locking protocols such as hierarchical two-phase
|
||||||
|
locking~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample}.
|
||||||
|
The lock manager API is divided into callback functions that are made
|
||||||
|
during normal operation and recovery, and into generic lock manager
|
||||||
|
implementations that may be used with \yad and its index implementations.
|
||||||
|
|
||||||
For example, it would be relatively easy to build a strict two-phase
|
%For example, it would be relatively easy to build a strict two-phase
|
||||||
locking hierarchical lock
|
%locking hierarchical lock
|
||||||
manager~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample} on
|
%manager
|
||||||
top of \yad. Such a lock manager would provide isolation guarantees
|
% on
|
||||||
for all applications that make use of it. However, applications that
|
%top of \yad. Such a lock manager would provide isolation guarantees
|
||||||
make use of such a lock manager must handle deadlocked transactions
|
%for all applications that make use of it.
|
||||||
|
|
||||||
|
However, applications that
|
||||||
|
make use of a lock manager must handle deadlocked transactions
|
||||||
that have been aborted by the lock manager. This is easy if all of
|
that have been aborted by the lock manager. This is easy if all of
|
||||||
the state is managed by \yad, but other state such as thread stacks
|
the state is managed by \yad, but other state such as thread stacks
|
||||||
must be handled by the application, much like exception handling.
|
must be handled by the application, much like exception handling.
|
||||||
|
\yad currently uses a custom wrapper around the pthread cancellation
|
||||||
|
mechanism to provide partial stack unwinding and pthread's thread
|
||||||
|
cancellation mechanism. Applications may use this error handling
|
||||||
|
technique, or write simple wrappers to handle errors with the
|
||||||
|
error handling scheme of their choice.
|
||||||
|
|
||||||
Conversely, many applications do not require such a general scheme.
|
Conversely, many applications do not require such a general scheme.
|
||||||
For instance, an IMAP server can employ a simple lock-per-folder
|
If deadlock avoidance (``normal'' thread synchronization) can be used,
|
||||||
approach and use lock-ordering techniques to avoid deadlock. This
|
the application does not have to abort partial transactions, repeat
|
||||||
avoids the complexity of dealing with transactions that abort due
|
work, or deal with the corner cases that aborted transactions create.
|
||||||
to deadlock, and also removes the runtime cost of restarting
|
%For instance, an IMAP server can employ a simple lock-per-folder
|
||||||
transactions.
|
%approach and use lock-ordering techniques to avoid deadlock. This
|
||||||
|
%avoids the complexity of dealing with transactions that abort due
|
||||||
|
%to deadlock, and also removes the runtime cost of restarting
|
||||||
|
%transactions.
|
||||||
|
|
||||||
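The deadlock-avoidance alternative described above can be illustrated with the lock-per-folder idea from the IMAP example. The following is a minimal sketch, not part of \yad: `FOLDER_LOCKS`, `move_message`, and the folder names are hypothetical, and the locks are ordinary Python thread locks rather than anything provided by a lock manager. Because every caller acquires folder locks in the same sorted order, no cycle of waiting threads can form, so there is never a deadlocked transaction to abort and retry.

```python
import threading
from contextlib import ExitStack

# Hypothetical per-folder locks; the folder names are illustrative only.
FOLDER_LOCKS = {name: threading.Lock() for name in ("Drafts", "INBOX", "Sent")}


def move_message(mailboxes, src, dst, msg):
    """Move msg from folder src to folder dst while holding both folder locks."""
    with ExitStack() as held:
        # Always acquire in sorted name order, whatever the direction of the
        # move, so two concurrent moves can never wait on each other in a cycle.
        for name in sorted({src, dst}):
            held.enter_context(FOLDER_LOCKS[name])
        mailboxes[src].remove(msg)
        mailboxes[dst].append(msg)
```

Two threads that move messages in opposite directions between the same pair of folders both lock `INBOX` before `Sent`, so neither can hold one lock while waiting for the other.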
\yad provides a lock manager API that allows all three variations
|
%\yad provides a lock manager API that allows all three variations
|
||||||
(among others). In particular, it provides upcalls on commit/abort so
|
%(among others). In particular, it provides upcalls on commit/abort so
|
||||||
that the lock manager can release locks at the right time. We will
|
%that the lock manager can release locks at the right time. We will
|
||||||
revisit this point in more detail when we describe some of the example
|
%revisit this point in more detail when we describe some of the example
|
||||||
operations.
|
%operations.
|
||||||
|
|
||||||
|
|
||||||
%% @todo where does this text go??
|
%% @todo where does this text go??
|
||||||
|
@ -865,34 +881,43 @@ operations.
|
||||||
\label{flex-logging}
|
\label{flex-logging}
|
||||||
\label{page-layouts}
|
\label{page-layouts}
|
||||||
|
|
||||||
The overview discussion avoided the use of some common terminology
|
\yad supports three types of logging, and allows applications to create
|
||||||
that should be presented here. {\em Physical logging }
|
{\em custom log entries} of each type.
|
||||||
|
|
||||||
|
%The overview discussion avoided the use of some common terminology
|
||||||
|
%that should be presented here.
|
||||||
|
{\em Physical logging }
|
||||||
is the practice of logging physical (byte-level) updates
|
is the practice of logging physical (byte-level) updates
|
||||||
and the physical (page-number) addresses to which they are applied.
|
and the physical (page-number) addresses to which they are applied.
|
||||||
|
|
||||||
\rcs{Do we really need to differentiate between types of diffs applied to pages? The concept of physical REDO/logical UNDO is probably more important...}
|
{\em Physiological logging } extends this idea, and is generally used
|
||||||
|
for \yad's REDO entries. The physical address (page number) is
|
||||||
|
stored, along with the arguments of an arbitrary function that
|
||||||
|
is associated with the log entry.
|
||||||
|
|
||||||
{\em Physiological logging } is what \yad recommends for its REDO
|
This is used to implement many primitives, including {\em slotted pages}, which use
|
||||||
records~\cite{physiological}. The physical address (page number) is
|
|
||||||
stored, but the byte offset and the actual delta are stored implicitly
|
|
||||||
in the parameters of the REDO or UNDO function. These parameters allow
|
|
||||||
the function to update the page in a way that preserves application
|
|
||||||
semantics. One common use for this is {\em slotted pages}, which use
|
|
||||||
an on-page level of indirection to allow records to be rearranged
|
an on-page level of indirection to allow records to be rearranged
|
||||||
within the page; instead of using the page offset, REDO operations use
|
within the page; instead of using the page offset, REDO operations use
|
||||||
the index to locate the data within the page. This allows data within a single
|
the index to locate the data within the page. This allows data within a single
|
||||||
page to be re-arranged at runtime to produce contiguous regions of
|
page to be re-arranged easily, producing contiguous regions of
|
||||||
free space. \yad generalizes this model; for example, the parameters
|
free space. Since the log entry is associated with an arbitrary function,
|
||||||
passed to the function may utilize application-specific properties in
|
more sophisticated log entries can be implemented. In turn, this can improve
|
||||||
order to be significantly smaller than the physical change made to the
|
performance by conserving log space, or be used to match recovery to application
|
||||||
page.
|
semantics.
|
||||||
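The slotted-page indirection described above can be sketched as follows. This is an illustrative toy, not \yad's page format: the class `SlottedPage`, the helper `redo_set`, and the in-memory slot table are all assumptions made for the example. The point it shows is that a REDO addressed by slot index still finds its record after the page has been compacted, whereas a REDO addressed by byte offset would not.

```python
class SlottedPage:
    """Toy slotted page: records are addressed by slot index, not byte offset."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []        # slot index -> (offset, length), or None if freed
        self.free_start = 0    # first unused byte in self.data

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            self.compact()     # try to reclaim space held by freed slots
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def free(self, slot):
        self.slots[slot] = None

    def compact(self):
        """Slide live records together; slot indexes stay valid, offsets change."""
        packed = bytearray(len(self.data))
        pos = 0
        for index, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            packed[pos:pos + length] = self.data[offset:offset + length]
            self.slots[index] = (pos, length)
            pos += length
        self.data = packed
        self.free_start = pos


def redo_set(page, slot, value):
    """REDO for a same-length overwrite, located through the slot table."""
    offset, length = page.slots[slot]
    if len(value) != length:
        raise ValueError("this sketch only handles same-length updates")
    page.data[offset:offset + length] = value
```

A log entry for `redo_set` would carry the page number, the slot index, and the new value; it never needs to name a byte offset, which is what leaves the page free to rearrange its contents.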
|
%\yad generalizes this model, allowing the parameters of a
|
||||||
|
%custom log entry to invoke arbitrary application-specific code.
|
||||||
|
%In
|
||||||
|
%Section~\ref{OASYS} this is used to significantly improve performance by
|
||||||
|
%storing difference records in an application specfic format.
|
||||||
|
|
||||||
This forms the basis of \yad's flexible page layouts. We current
|
%In addition to supporting custom log entries, this mechanism
|
||||||
support four layouts: a raw page, which is just an array of
|
%is the basis of \yad's {\em flexible page layouts}.
|
||||||
bytes, a record-oriented page with fixed-size records,
|
\yad also uses this mechanism to support four {\em page layouts}:
|
||||||
a slotted-page that support variable-sized records, and a page of records with version numbers (Section~\ref{version-pages}).
|
{\em raw-page}, which is just an array of
|
||||||
Data structures can pick the layout that is most convenient or implement
|
bytes, {\em fixed-page}, a record-oriented page with fixed-length records,
|
||||||
new layouts.
|
{\em slotted-page}, which supports variable-sized records, and
|
||||||
|
{\em versioned-page}, a slotted-page with a separate version number for
|
||||||
|
each record (Section~\ref{version-pages}).
|
||||||
|
|
||||||
{\em Logical logging} uses a higher-level key to specify the
|
{\em Logical logging} uses a higher-level key to specify the
|
||||||
UNDO/REDO. Since these higher-level keys may affect multiple pages,
|
UNDO/REDO. Since these higher-level keys may affect multiple pages,
|
||||||
|
@ -911,25 +936,29 @@ span multiple pages, as shown in the next section.
|
||||||
%% logical log entries, the page file must be physically consistent,
|
%% logical log entries, the page file must be physically consistent,
|
||||||
%% ruling out use of logical logging for redo operations.
|
%% ruling out use of logical logging for redo operations.
|
||||||
|
|
||||||
\yad supports all three types of logging, and allows developers to
|
%\yad supports all three types of logging, and allows developers to
|
||||||
register new operations, which we cover below.
|
%register new operations, which we cover below.
|
||||||
|
|
||||||
|
|
||||||
\subsection{Nested Top Actions}
|
\subsection{Nested Top Actions}
|
||||||
\label{nested-top-actions}
|
\label{nested-top-actions}
|
||||||
|
|
||||||
The operations presented so far work fine for a single page, since
|
The operations presented so far work fine for a single page, since
|
||||||
each update is atomic. For updates that span multiple pages there are two basic options: full isolation or nested top actions.
|
each update is atomic. For updates that span multiple pages there
|
||||||
|
are two basic options: full isolation or nested top actions.
|
||||||
|
|
||||||
By full isolation, we mean that no other transactions see the
|
By full isolation, we mean that no other transactions see the
|
||||||
in-progress updates, which can be trivially achieved with a big lock
|
in-progress updates, which can be trivially achieved with a big lock
|
||||||
around the whole structure. Given isolation, \yad needs nothing else to
|
around the whole structure. Usually the application must enforce
|
||||||
|
such a locking policy or decide to use a lock manager and deal with
|
||||||
|
deadlock. Given isolation, \yad needs nothing else to
|
||||||
make multi-page updates transactional: although many pages might be
|
make multi-page updates transactional: although many pages might be
|
||||||
modified they will commit or abort as a group and be recovered
|
modified they will commit or abort as a group and be recovered
|
||||||
accordingly.
|
accordingly.
|
||||||
|
|
||||||
However, this level of isolation reduces concurrency within a data
|
However, this level of isolation disallows all concurrency between
|
||||||
structure. ARIES introduced the notion of nested top actions to
|
transactions that use the same data structure. ARIES introduced the
|
||||||
|
notion of nested top actions to
|
||||||
address this problem. For example, consider what would happen if one
|
address this problem. For example, consider what would happen if one
|
||||||
transaction, $A$, rearranged the layout of a data structure, a second
|
transaction, $A$, rearranged the layout of a data structure, a second
|
||||||
transaction, $B$, added a value to the rearranged structure, and then
|
transaction, $B$, added a value to the rearranged structure, and then
|
||||||
|
@ -937,7 +966,7 @@ the first transaction aborted. (Note that the structure is not
|
||||||
isolated.) While applying physical undo information to the altered
|
isolated.) While applying physical undo information to the altered
|
||||||
data structure, $A$ would UNDO its writes
|
data structure, $A$ would UNDO its writes
|
||||||
without considering the modifications made by
|
without considering the modifications made by
|
||||||
$B$, which is likely to cause corruption. At this point, $B$ would
|
$B$, which is likely to cause corruption. Therefore, $B$ would
|
||||||
have to be aborted as well ({\em cascading aborts}).
|
have to be aborted as well ({\em cascading aborts}).
|
||||||
|
|
||||||
With nested top actions, ARIES defines the structural changes as a
|
With nested top actions, ARIES defines the structural changes as a
|
||||||
|
@ -956,9 +985,8 @@ In particular, we have found a simple recipe for converting a
|
||||||
non-concurrent data structure into a concurrent one, which involves
|
non-concurrent data structure into a concurrent one, which involves
|
||||||
three steps:
|
three steps:
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item Wrap a mutex around each operation. If full transactional isolation
|
\item Wrap a mutex around each operation. If this is done with care,
|
||||||
with deadlock detection is required, this can be done with the lock
|
it may be possible to use finer-grained mutexes.
|
||||||
manager. Alternatively, this can be done using mutexes for fine-grain isolation.
|
|
||||||
\item Define a logical UNDO for each operation (rather than just using
|
\item Define a logical UNDO for each operation (rather than just using
|
||||||
a lower-level physical UNDO). For example, this is easy for a
|
a lower-level physical UNDO). For example, this is easy for a
|
||||||
hashtable; e.g. the UNDO for an {\em insert} is {\em remove}.
|
hashtable; e.g. the UNDO for an {\em insert} is {\em remove}.
|
||||||
|
@ -969,10 +997,12 @@ three steps:
|
||||||
This recipe ensures that operations that might span multiple pages
|
This recipe ensures that operations that might span multiple pages
|
||||||
atomically apply and commit any structural changes and thus avoids
|
atomically apply and commit any structural changes and thus avoids
|
||||||
cascading aborts. If the transaction that encloses the operations
|
cascading aborts. If the transaction that encloses the operations
|
||||||
aborts, the logical UNDO will {\em compensate} for
|
aborts, the logical undo will {\em compensate} for
|
||||||
its effects, but leave its structural changes intact. Note that by releasing the mutex before we commit, we are
|
its effects, but leave its structural changes intact. Because this
|
||||||
violating strict two-phase locking in exchange for better performance
|
recipe does not ensure transactional consistency and is largely
|
||||||
and support for deadlock avoidance.
|
orthogonal to the use of a lock manager, we call this class of
|
||||||
|
concurrency control {\em latching} throughout this paper.
|
||||||
|
|
||||||
We have found the recipe to be easy to follow and very effective, and
|
We have found the recipe to be easy to follow and very effective, and
|
||||||
we use it everywhere our concurrent data structures may make structural
|
we use it everywhere our concurrent data structures may make structural
|
||||||
changes, such as growing a hash table or array.
|
changes, such as growing a hash table or array.
|
||||||
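The mutex and logical-undo steps of the recipe can be sketched with a toy in-memory hash table. This is a hedged illustration under several assumptions: `LatchedTable` and its methods are hypothetical names, nothing is logged or recovered, and nested top actions themselves are not modeled. It only shows why a logical undo (remove the key) is safe where a physical undo (restore old bytes) would not be once another transaction has caused the structure to be rearranged.

```python
import threading


class LatchedTable:
    """Toy hash table: one latch per operation, logical undo per transaction."""

    def __init__(self, buckets=1):
        self._latch = threading.Lock()
        self._buckets = [[] for _ in range(buckets)]
        self._count = 0
        self._undo = {}    # transaction id -> keys inserted by that transaction

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def _grow(self):
        # Structural change: rehash every entry into twice as many buckets.
        entries = [kv for bucket in self._buckets for kv in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in entries:
            self._bucket(key).append((key, value))

    def bucket_count(self):
        with self._latch:
            return len(self._buckets)

    def get(self, key):
        with self._latch:
            for k, v in self._bucket(key):
                if k == key:
                    return v
            return None

    def insert(self, xid, key, value):
        with self._latch:                      # mutex held for this operation only
            if self._count >= len(self._buckets):
                self._grow()
            self._bucket(key).append((key, value))
            self._count += 1
            # Logical undo record: "remove this key", not "restore these bytes".
            self._undo.setdefault(xid, []).append(key)

    def commit(self, xid):
        with self._latch:
            self._undo.pop(xid, None)

    def abort(self, xid):
        with self._latch:
            keys = self._undo.pop(xid, [])
        for key in reversed(keys):             # compensate, newest first
            with self._latch:
                bucket = self._bucket(key)
                kept = [(k, v) for k, v in bucket if k != key]
                self._count -= len(bucket) - len(kept)
                bucket[:] = kept
```

If transaction `A` inserts enough keys to trigger `_grow`, transaction `B` then inserts a key, and `A` aborts, `A`'s keys are removed while `B`'s key and the enlarged bucket array both survive.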
|
@ -1404,7 +1434,7 @@ need a map from bucket number to bucket contents (lists), and we need to handle
|
||||||
|
|
||||||
\subsection{The Bucket Map}
|
\subsection{The Bucket Map}
|
||||||
|
|
||||||
The simplest bucket map would simply use a fixed-size transactional
|
The simplest bucket map would simply use a fixed-length transactional
|
||||||
array. However, since we want the size of the table to grow, we should
|
array. However, since we want the size of the table to grow, we should
|
||||||
not assume that it fits in a contiguous range of pages. Instead, we build
|
not assume that it fits in a contiguous range of pages. Instead, we build
|
||||||
on top of \yad's transactional ArrayList data structure (inspired by
|
on top of \yad's transactional ArrayList data structure (inspired by
|
||||||
|
@ -1417,7 +1447,7 @@ per enlargement typically), this leads to an efficient map. We use a
|
||||||
single ``header'' page to store the list of intervals and their sizes.
|
single ``header'' page to store the list of intervals and their sizes.
|
||||||
|
|
||||||
For space efficiency, the array elements themselves are stored using
|
For space efficiency, the array elements themselves are stored using
|
||||||
the fixed-size record page layout. Thus, we use the header page to
|
the fixed-length record page layout. Thus, we use the header page to
|
||||||
find the right interval, and then index into it to get the $(page,
|
find the right interval, and then index into it to get the $(page,
|
||||||
slot)$ address. Once we have this address, the REDO/UNDO entries are
|
slot)$ address. Once we have this address, the REDO/UNDO entries are
|
||||||
trivial: they simply log the before and after image of the that
|
trivial: they simply log the before and after image of the that
|
||||||
|
@ -2081,10 +2111,10 @@ requests by reordering invocations of wrapper functions.
|
||||||
\subsection {Data Representation}
|
\subsection {Data Representation}
|
||||||
|
|
||||||
For simplicity, we represent graph nodes as
|
For simplicity, we represent graph nodes as
|
||||||
fixed length records. The Array List from our linear hash table
|
fixed-length records. The Array List from our linear hash table
|
||||||
implementation (Section~\ref{sub:Linear-Hash-Table}) provides access to an
|
implementation (Section~\ref{sub:Linear-Hash-Table}) provides access to an
|
||||||
array of such records with performance that is competitive with native
|
array of such records with performance that is competitive with native
|
||||||
recordid accesses, so we use an Array List to store the records. We
|
recordid accesses, so we use an ArrayList to store the records. We
|
||||||
could have opted for a slightly more efficient representation by
|
could have opted for a slightly more efficient representation by
|
||||||
implementing a fixed length array structure, but doing so seems to be
|
implementing a fixed length array structure, but doing so seems to be
|
||||||
overkill for our purposes. The nodes themselves are stored as an
|
overkill for our purposes. The nodes themselves are stored as an
|
||||||
|
|