update 3 and 4
This commit is contained in:
parent
9c0c394518
commit
bdf70353cc
1 changed file with 362 additions and 351 deletions
@@ -51,7 +51,7 @@ to hierarchical or semi-structured data types such as XML or
scientific data. This work proposes a novel set of abstractions for
transactional storage systems and generalizes an existing
transactional storage algorithm to provide an implementation of these
primitives. Due to the extensibility of our architecture, the
implementation is competitive with existing systems on conventional
workloads and outperforms existing systems on specialized
workloads. Finally, we discuss characteristics of this new
@@ -175,20 +175,20 @@ to improve performance.

These features are enabled by several mechanisms:
\begin{description}
\item[Flexible page layouts] provide low-level control over
      transactional data representations (Section~\ref{page-layouts}).
\item[Extensible log formats] provide high-level control over
      transaction data structures (Section~\ref{op-def}).
\item[High- and low-level control over the log] such as calls to ``log this
      operation'' or ``write a compensation record'' (Section~\ref{log-manager}).
\item[In-memory logical logging] provides a data-store-independent
      record of application requests, allowing ``in flight'' log
      reordering, manipulation and durability primitives to be
      developed (Section~\ref{graph-traversal}).
\item[Extensible locking API] provides registration of custom lock managers
      and a generic lock manager implementation (Section~\ref{lock-manager}).
\item[Custom durability operations] such as two-phase commit's
      prepare call, and savepoints (Section~\ref{OASYS}).
\item[\eab{2PC?}]
\end{description}

@@ -207,7 +207,7 @@ application. \yad also includes a cluster hash table
built upon two-phase commit, which will not be described in detail
in this paper. Similarly, we did not have space to discuss \yad's
blob implementation, which demonstrates how \yad can
add transactional primitives to data stored in the file system.

%To validate these claims, we developed a number of applications such
%as an efficient persistent object layer, {\em @todo locality preserving
@@ -255,21 +255,6 @@ add transactional primatives to data stored in the file system.
% narrow interfaces, since transactional storage algorithms'
% interdependencies and requirements are notoriously complicated.}
%
%\end{enumerate}

@@ -326,28 +311,24 @@ set of monolithic storage engines.\eab{need to discuss other flaws! clusters? wh

The Postgres storage system~\cite{postgres} provides conventional
database functionality, but can be extended with new index and object
types. A brief outline of the interfaces necessary to implement
data-type extensions was presented by Stonebraker et
al.~\cite{newTypes}. Although some of the proposed methods are
similar to ones presented here, \yad also implements a lower-level
interface that can coexist with these methods. Without these
low-level APIs, Postgres suffers from many of the limitations inherent
to the database systems mentioned above. This is because Postgres was
designed to provide these extensions within the context of the
relational model. Therefore, these extensions focused upon improving
query language and indexing support. Instead of focusing upon this,
\yad is more interested in lower-level systems. Therefore, although we
believe that many of the high-level Postgres interfaces could be built
on top of \yad, we have not yet tried to implement them.

% seems to provide
%equivalents to most of the calls proposed in~\cite{newTypes} except
%for those that deal with write ordering, (\yad automatically orders
%writes correctly) and those that refer to relations or application
%data types, since \yad does not have a built-in concept of a relation.

However, \yad does provide an iterator interface that we hope to
extend to support relational algebra and common
programming paradigms.

@@ -451,16 +432,9 @@ However, in each case it is relatively easy to see how they would map
onto \yad.

\eab{DB Toolkit from Wisconsin?}

\section{Write-ahead Logging Overview}

@@ -480,7 +454,7 @@ The write-ahead logging algorithm we use is based upon ARIES, but
modified for extensibility and flexibility. Because comprehensive
discussions of write-ahead logging protocols and ARIES are available
elsewhere~\cite{haerder, aries}, we focus on those details that are
most important for flexibility, which we discuss in Section~\ref{flexibility}.

\subsection{Operations}

@@ -523,6 +497,51 @@ application-level policy (Section~\ref{TransClos}).

\subsection{Isolation}
\label{Isolation}

We allow transactions to be interleaved, allowing concurrent access to
application data and exploiting opportunities for hardware
parallelism. Therefore, each action must assume that the
physical data upon which it relies may contain uncommitted
information and that this information may have been produced by a
transaction that will be aborted by a crash or by the application.
%(The latter is actually harder, since there is no ``fate sharing''.)

% Furthermore, aborting
%and committing transactions may be interleaved, and \yad does not
%allow cascading aborts,%
%\footnote{That is, by aborting, one transaction may not cause other transactions
%to abort. To understand why operation implementors must worry about
%this, imagine that transaction A split a node in a tree, transaction
%B added some data to the node that A just created, and then A aborted.
%When A was undone, what would become of the data that B inserted?%
%} so

Thus, in order to implement an operation, we must also implement
synchronization mechanisms that isolate the effects of transactions
from each other. We use the term {\em latching} to refer to
synchronization mechanisms that protect the physical consistency of
\yad's internal data structures and the data store. We say {\em
locking} when we refer to mechanisms that provide some level of
isolation among transactions.

\yad operations that allow concurrent requests must provide a latching
(but not locking) implementation that is guaranteed not to deadlock.
These implementations need not ensure consistency of application data.
Instead, they must maintain the consistency of any underlying data
structures. Generally, latches do not persist across calls performed
by high-level code, as that could lead to deadlock.
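
The sketch below illustrates this latching discipline using POSIX
read-write locks; the {\tt ExamplePage} type, its fields, and
{\tt PAGE\_SIZE} are hypothetical names for illustration only, not part
of \yad's API.
\begin{small}
\begin{verbatim}
// Sketch only (hypothetical names); requires <pthread.h>.
// A per-page latch protects physical consistency while a
// single low-level operation updates the page.
typedef struct {
  pthread_rwlock_t rwlatch;   // per-page latch
  char bytes[PAGE_SIZE];      // page image (PAGE_SIZE assumed)
} ExamplePage;

void exampleUpdate(ExamplePage *p, int off, char v) {
  pthread_rwlock_wrlock(&p->rwlatch);  // acquire latch
  p->bytes[off] = v;                   // physical update
  pthread_rwlock_unlock(&p->rwlatch);  // release before returning
}                                      // to high-level code
\end{verbatim}
\end{small}
A lock, by contrast, would typically be held until the enclosing
transaction completes.
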
For locking, due to the variety of locking protocols available, and
their interaction with application
workloads~\cite{multipleGenericLocking}, we leave it to the
application to decide what degree of isolation is
appropriate. Section~\ref{lock-manager} presents the Lock Manager API.

\subsection{The Log Manager}
\label{log-manager}

@@ -571,227 +590,7 @@ Because pages can be recovered independently from each other, there is
no need to stop transactions to make a snapshot for archiving: any
fuzzy snapshot is fine.

\subsection{Recovery}

@@ -844,7 +643,8 @@ during normal operation.

\section{Flexible, Extensible Transactions}
\label{flexibility}

As long as operation implementations obey the atomicity constraints
outlined above, and the algorithms they use correctly manipulate
@@ -855,31 +655,66 @@ application data that is stored in the system. This suggests a
natural partitioning of transactional storage mechanisms into two
parts.

The lower layer implements the write-ahead logging component,
including a buffer pool, logger, and (optionally) a lock manager.
The complexity of the write-ahead logging component lies in
determining exactly when the undo and redo operations should be
applied, when pages may be flushed to disk, log truncation, logging
optimizations, and a large number of other data-independent extensions
and optimizations. This layer is the core of \yad.

The upper layer, which can be authored by the application developer,
provides the actual data structure implementations, policies regarding
page layout (other than the location of the LSN field), and the
implementation of any application-specific operations. As long as
each layer provides well-defined interfaces, the application,
operation implementation, and write-ahead logging component can be
independently extended and improved.

We have implemented a number of simple, high-performance
and general-purpose data structures. These are used by our sample
applications, and as building blocks for new data structures. Example
data structures include two distinct linked-list implementations, and
a growable array. Surprisingly, even these simple operations have
important performance characteristics that are not available from
existing systems.

The remainder of this section is devoted to a description of the
various primitives that \yad provides to application developers.

\subsection{Lock Manager}
\label{lock-manager}
\eab{present the API?}

\yad provides a default page-level lock manager that performs deadlock
detection, although we expect many applications to make use of
deadlock-avoidance schemes, which are already prevalent in
multithreaded application development. The Lock Manager is flexible
enough to also provide index locks for hashtable implementations, and
more complex locking protocols.

For example, it would be relatively easy to build a strict two-phase
locking hierarchical lock
manager~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample} on
top of \yad. Such a lock manager would provide isolation guarantees
for all applications that make use of it. However, applications that
make use of such a lock manager must handle deadlocked transactions
that have been aborted by the lock manager. This is easy if all of
the state is managed by \yad, but other state such as thread stacks
must be handled by the application, much like exception handling.

Conversely, many applications do not require such a general scheme.
For instance, an IMAP server can employ a simple lock-per-folder
approach and use lock-ordering techniques to avoid deadlock. This
avoids the complexity of dealing with transactions that abort due
to deadlock, and also removes the runtime cost of restarting
transactions.

\yad provides a lock manager API that allows all three variations
(among others). In particular, it provides upcalls on commit/abort so
that the lock manager can release locks at the right time. We will
revisit this point in more detail when we describe some of the example
operations.
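
Since the API itself is not reproduced in this paper, the following is
only a sketch of what a pluggable lock manager registration might look
like; the type and function names here are hypothetical, not \yad's
actual declarations.
\begin{small}
\begin{verbatim}
// Hypothetical sketch of a pluggable lock manager; none of
// these names are \yad's actual API.
typedef struct {
  // called by wrappers/operations to acquire a lock
  int  (*acquire)(int xid, recordid rid, int mode);
  // upcalls issued at commit/abort so the lock manager can
  // release (or retain) locks at the right time
  void (*didCommit)(int xid);
  void (*didAbort)(int xid);
} example_lock_manager;

// Registered once at startup; applications may instead keep
// the default page-level lock manager.
void exampleRegisterLockManager(example_lock_manager *lm);
\end{verbatim}
\end{small}
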

%% @todo where does this text go??

|
||||||
%This allows the the application, the operation, and \yad itself to be
|
%This allows the the application, the operation, and \yad itself to be
|
||||||
%independently improved.
|
%independently improved.
|
||||||
|
|
||||||
\subsection{Operation Implementation}
|
|
||||||
|
\subsection{Flexible Logging and Page Layouts}
|
||||||
|
\label{flex-logging}
|
||||||
|
\label{page-layouts}
|
||||||
|
|
||||||
|
The overview discussion avoided the use of some common terminology
|
||||||
|
that should be presented here. {\em Physical logging }
|
||||||
|
is the practice of logging physical (byte-level) updates
|
||||||
|
and the physical (page-number) addresses to which they are applied.
|
||||||
|
|
||||||
|
{\em Physiological logging } is what \yad recommends for its redo
|
||||||
|
records~\cite{physiological}. The physical address (page number) is
|
||||||
|
stored, but the byte offset and the actual delta are stored implicitly
|
||||||
|
in the parameters of the redo or undo function. These parameters allow
|
||||||
|
the function to update the page in a way that preserves application
|
||||||
|
semantics. One common use for this is {\em slotted pages}, which use
|
||||||
|
an on-page level of indirection to allow records to be rearranged
|
||||||
|
within the page; instead of using the page offset, redo operations use
|
||||||
|
the index to locate the data within the page. This allows data within a single
|
||||||
|
page to be re-arranged at runtime to produce contiguous regions of
|
||||||
|
free space. \yad generalizes this model; for example, the parameters
|
||||||
|
passed to the function may utilize application-specific properties in
|
||||||
|
order to be significantly smaller than the physical change made to the
|
||||||
|
page.
|
||||||
|
|
||||||
|
This forms the basis of \yad's flexible page layouts. We current
|
||||||
|
support three layouts: a raw page (RawPage), which is just an array of
|
||||||
|
bytes, a record-oriented page with fixed-size records (FixedPage), and
|
||||||
|
a slotted-page that support variable-sized records (SlottedPage).
|
||||||
|
Data structures can pick the layout that is most convenient.
|
||||||
|
|
||||||
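
To make the physiological style concrete, here is a sketch of a redo
function for a slotted page; the argument struct and the
{\tt pageData}/{\tt slotOffset} helpers are hypothetical names, but the
shape follows the increment example in Section~\ref{op-def}.
\begin{small}
\begin{verbatim}
// Sketch (hypothetical names): a physiological redo logs a
// page number and a slot index rather than a byte offset.
typedef struct {
  int slot;      // resolved via the page's slot table at redo time
  int newValue;  // a small, semantic delta, not a raw byte image
} set_slot_arg;

int redoSetSlot(Page* p, const set_slot_arg* arg) {
  // slotOffset() consults the on-page indirection table, so the
  // record may have moved since this entry was logged.
  char* dest = pageData(p) + slotOffset(p, arg->slot);
  memcpy(dest, &arg->newValue, sizeof(arg->newValue));
  return 0;  // no error
}
\end{verbatim}
\end{small}
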
{\em Logical logging} uses a higher-level key to specify the
UNDO/REDO. Since these higher-level keys may affect multiple pages,
they are prohibited for REDO functions, since our REDO is specific to
a single page. However, logical logging does make sense for UNDO,
since we can assume that the pages are physically consistent when we
apply an UNDO. We thus use logical logging to undo operations that
span multiple pages, as shown in the next section.

%% can only be used for undo entries in \yad, and
%% stores a logical address (the key of a hash table, for instance)
%% instead of a physical address. As we will see later, these operations
%% may affect multiple pages. This allows the location of data in the
%% page file to change, even if outstanding transactions may have to roll
%% back changes made to that data. Clearly, for \yad to be able to apply
%% logical log entries, the page file must be physically consistent,
%% ruling out use of logical logging for redo operations.

\yad supports all three types of logging, and allows developers to
register new operations, which we cover below.

\subsection{Nested Top Actions}
\label{nested-top-actions}

The operations presented so far work fine for a single page, since
each update is atomic. For updates that span multiple pages, there are
two basic options: full isolation or nested top actions.

By full isolation, we mean that no other transactions see the
in-progress updates, which can be trivially achieved with a big lock
around the whole transaction. Given isolation, \yad needs nothing else to
make multi-page updates transactional: although many pages might be
modified, they will commit or abort as a group and be recovered
accordingly.

However, this level of isolation reduces concurrency within a data
structure. ARIES introduced the notion of nested top actions to
address this problem. For example, consider what would happen if one
transaction, $A$, rearranged the layout of a data structure, a second
transaction, $B$, added a value to the rearranged structure, and then
the first transaction aborted. (Note that the structure is not
isolated.) While applying physical undo information to the altered
data structure, $A$ would undo the writes that it performed
without considering the data values and structural changes introduced
by $B$, which is likely to cause corruption. At this point, $B$ would
have to be aborted as well ({\em cascading aborts}).

With nested top actions, ARIES defines the structural changes as their
own mini-transaction. This means that the structural change
``commits'' even if the containing transaction ($A$) aborts, which
ensures that $B$'s update remains valid.

\yad supports nested top actions as the preferred way to build
high-performance data structures. In particular, an operation that
spans pages can be made atomic by simply wrapping it in a nested top
action and obtaining appropriate latches at runtime. This approach
reduces development of atomic page-spanning operations to something
very similar to conventional multithreaded development that uses
mutexes for synchronization.

In particular, we have found a simple recipe for converting a
non-concurrent data structure into a concurrent one, which involves
three steps (sketched in code below):
\begin{enumerate}
\item Wrap a mutex around each operation; this can be done with the lock
manager, or just using pthread mutexes. This provides fine-grain isolation.
\item Define a logical UNDO for each operation (rather than just using
a lower-level physical undo). For example, this is easy for a
hashtable; e.g. the undo for an {\em insert} is {\em remove}.
\item For mutating operations (not read-only), add a ``begin nested
top action'' right after the mutex acquisition, and a ``commit
nested top action'' where we release the mutex.
\end{enumerate}
This recipe ensures that any operations that might span multiple pages
commit any structural changes and thus avoids cascading aborts. If
this transaction aborts, the logical undo will {\em compensate} for
its effects, but leave its structural changes intact (or augment
them). Note that by releasing the mutex before we commit, we are
violating strict two-phase locking in exchange for better performance.
We have found the recipe to be easy to follow and very effective, and
we use it everywhere we have structural changes, such as growing a
hash table or array.
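
The sketch below applies the recipe to a hypothetical hashtable insert;
{\tt TbeginNestedTopAction}, {\tt TendNestedTopAction},
{\tt OP\_HASH\_REMOVE}, and the data-structure helpers are stand-in
names, not \yad's exact calls.
\begin{small}
\begin{verbatim}
// Sketch of the three-step recipe (hypothetical names).
static pthread_mutex_t ht_mutex = PTHREAD_MUTEX_INITIALIZER;

int ThashInsert(int xid, hashtable* ht, int key, int val) {
  pthread_mutex_lock(&ht_mutex);       // step 1: mutex
  // step 3: begin a nested top action; step 2: its undo is
  // the logical inverse (remove), not a physical undo.
  void* h = TbeginNestedTopAction(xid, OP_HASH_REMOVE,
                                  &key, sizeof(key));
  if (bucketFull(ht, key)) {
    growBucketList(xid, ht);           // may span many pages
  }
  writeEntry(xid, ht, key, val);
  TendNestedTopAction(xid, h);         // structural change "commits"
  pthread_mutex_unlock(&ht_mutex);     // mutex released here
  return 0;
}
\end{verbatim}
\end{small}
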
%% \textcolor{red}{OLD TEXT:} Section~\ref{sub:OperationProperties} states that \yad does not allow
%% cascading aborts, implying that operation implementors must protect
%% transactions from any structural changes made to data structures by
%% uncommitted transactions, but \yad does not provide any mechanisms
%% designed for long-term locking. However, one of \yad's goals is to
%% make it easy to implement custom data structures for use within safe,
%% multi-threaded transactions. Clearly, an additional mechanism is
%% needed.

%% The solution is to allow portions of an operation to ``commit'' before
%% the operation returns.\footnote{We considered the use of nested top actions, which \yad could easily
%% support. However, we currently use the slightly simpler (and lighter-weight)
%% mechanism described here. If the need arises, we will add support
%% for nested top actions.}
%% An operation's wrapper is just a normal function, and therefore may
%% generate multiple log entries. First, it writes an undo-only entry
%% to the log. This entry will cause the \emph{logical} inverse of the
%% current operation to be performed at recovery or abort, must be idempotent,
%% and must fail gracefully if applied to a version of the database that
%% does not contain the results of the current operation. Also, it must
%% behave correctly even if an arbitrary number of intervening operations
%% are performed on the data structure.

%% Next, the operation writes one or more redo-only log entries that may
%% perform structural modifications to the data structure. These redo
%% entries have the constraint that any prefix of them must leave the
%% database in a consistent state, since only a prefix might execute
%% before a crash. This is not as hard as it sounds, and in fact the
%% $B^{LINK}$ tree~\cite{blink} is an example of a B-Tree implementation
%% that behaves in this way, while the linear hash table implementation
%% discussed in Section~\ref{sub:Linear-Hash-Table} is a scalable hash
%% table that meets these constraints.

%% %[EAB: I still think there must be a way to log all of the redoes
%% %before any of the actions take place, thus ensuring that you can redo
%% %the whole thing if needed. Alternatively, we could pin a page until
%% %the set completes, in which case we know that that all of the records
%% %are in the log before any page is stolen.]

\subsection{Adding Log Operations}
\label{op-def}

% \item {\bf ARIES provides {}``transactional pages'' }

Given this background, we now cover adding new operations. \yad is
designed to allow application developers to easily add new data
representations and data structures by defining new operations.

There are a number of invariants that these operations must obey:
\begin{enumerate}
\item Pages should only be updated inside of a redo or undo function.
\item An update to a page atomically updates the LSN by pinning the page.
\item If the data read by the wrapper function must match the state of
the page that the redo function sees, then the wrapper should latch
the relevant data.
\item Redo operations use page numbers and possibly record numbers,
while Undo operations use these or logical names/keys.
\item Acquire latches as needed (typically per page or record).
\item Use nested top actions or ``big locks'' for multi-page updates
(Section~\ref{nested-top-actions}).
\end{enumerate}

\subsubsection{Example: Increment/Decrement}

A common optimization for TPC benchmarks is to provide hand-built
operations that support adding/subtracting from an account. Such
operations improve concurrency since they can be reordered and can be
easily made into nested top actions (since the logical undo is
trivial). Here we show how increment/decrement map onto \yad operations.

First, we define the operation-specific part of the log record:
\begin{small}
|
||||||
return 0; // no error
|
return 0; // no error
|
||||||
}
|
}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
\noindent {\normalsize Here is the wrapper that uses the operation, which is indentified via {\small\tt OP\_INCREMENT}:}
|
\noindent{\normalsize Next, we register the operation:}
|
||||||
|
\begin{verbatim}
|
||||||
|
// first set up the normal case
|
||||||
|
ops[OP_INCREMENT].implementation= &operateIncrement;
|
||||||
|
ops[OP_INCREMENT].argumentSize = sizeof(inc_dec_t);
|
||||||
|
|
||||||
|
// set the REDO to be the same as normal operation
|
||||||
|
// Sometime is useful to have them differ.
|
||||||
|
ops[OP_INCREMENT].redoOperation = OP_INCREMENT;
|
||||||
|
|
||||||
|
// set UNDO to be the inverse
|
||||||
|
ops[OP_INCREMENT].undoOperation = OP_DECREMENT;
|
||||||
|
\end{verbatim}
|
||||||
|
\noindent {\normalsize Finally, here is the wrapper that uses the
|
||||||
|
operation, which is indentified via {\small\tt OP\_INCREMENT};
|
||||||
|
applications use the wrapper rather than the operation, as it tends to
|
||||||
|
be cleaner.}
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
int Tincrement(int xid, recordid rid, int amount) {
|
int Tincrement(int xid, recordid rid, int amount) {
|
||||||
// rec will be serialized to the log.
|
// rec will be serialized to the log.
|
||||||
|
@@ -1094,21 +1076,43 @@ int Tincrement(int xid, recordid rid, int amount) {
  return new_value;
}
\end{verbatim}
\end{small}
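
For completeness, a caller would invoke the wrapper inside an ordinary
transaction. The sketch below assumes {\tt Tbegin}, {\tt Tcommit}, and
{\tt Tabort} as the transaction-boundary calls and an already-allocated
record {\tt rid}; these names are assumptions, not shown elsewhere in
this excerpt.
\begin{small}
\begin{verbatim}
// Assumed usage sketch (Tbegin/Tcommit/Tabort and rid are
// assumptions; record allocation is not shown).
int xid = Tbegin();
int balance = Tincrement(xid, rid, 100); // logs OP_INCREMENT
if (balance < 0) {
  Tabort(xid);    // abort/recovery applies OP_DECREMENT
} else {
  Tcommit(xid);
}
\end{verbatim}
\end{small}
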
\subsubsection{Correctness}

With some examination it is possible to show that this example meets
the invariants. In addition, because the redo code is used for normal
operation, most bugs are easy to find with conventional testing
strategies. As future work, there is some hope of verifying these
invariants statically; for example, it is easy to verify that pages
are only modified by operations, and it is also possible to verify
latching for our two page layouts that support records.

%% Furthermore, we plan to develop a number of tools that will
%% automatically verify or test new operation implementations' behavior
%% with respect to these constraints, and behavior during recovery. For
%% example, whether or not nested top actions are used, randomized
%% testing or more advanced sampling techniques~\cite{OSDIFSModelChecker}
%% could be used to check operation behavior under various recovery
%% conditions and thread schedules.

However, as we will see in Section~\ref{OASYS}, even these invariants
can be stretched by sophisticated developers.

\subsection{Summary}

\eab{update}
Note that the ARIES algorithm is extremely complex, and we have left
out most of the details needed to understand how ARIES works, or to
implement it correctly. Yet, we believe we have covered everything
that a programmer needs to know in order to implement new
transactional data structures. This was possible due to the careful
encapsulation of portions of the ARIES algorithm, which is the feature
that most strongly differentiates \yad from other, similar libraries.

%We hope that this will increase the availability of transactional
%data primitives to application developers.

@@ -1241,6 +1245,13 @@ ops[OP_INCREMENT].undoOperation = OP_DECREMENT;

%\end{enumerate}

\section{Experimental setup}

The following sections describe the design and implementation of
@@ -1592,7 +1603,7 @@ mentioned above, and used Berkeley DB for comparison.
%developers that settle for ``slow'' straightforward implementations of
%specialized data structures should achieve better performance than would
%be possible by using existing systems that only provide general purpose
%primitives.

The first test (Figure~\ref{fig:BULK_LOAD}) measures the throughput of
a single long-running
@@ -1906,11 +1917,11 @@ This section uses:
\item{Reusability of operation implementations (borrows the hashtable's bucket list (the Array List) implementation to store objects)}
\item{Clean separation of logical and physiological operations provided by wrapper functions allows us to reorder requests}
\item{Addressability of data by page offset provides the information that is necessary to produce locality in workloads}
\item{The idea of the log as an application primitive, which can be generalized to other applications such as log entry merging, more advanced reordering primitives, network replication schemes, etc.}
\end{enumerate}
%\begin{enumerate}
%
% \item {\bf Comparison of transactional primitives (best case for each operator)}
%
% \item {\bf Serialization Benchmarks (Abstract log) }
%
@@ -1941,7 +1952,7 @@ This section uses:
\section{Future work}

We have described a new approach toward developing applications using
generic transactional storage primitives. This approach raises a
number of important questions that fall outside the scope of its
initial design and implementation.

@@ -1970,10 +1981,10 @@ of the issues that we will face in distributed domains. By adding
networking support to our logical log interface,
we should be able to multiplex and replicate log entries to sets of
nodes easily. Single-node optimizations such as the demand-based log
reordering primitive should be directly applicable to multi-node
systems.\footnote{For example, our (local, and non-redundant) log
multiplexer provides semantics similar to the
Map-Reduce~\cite{mapReduce} distributed programming primitive, but
exploits hard disk and buffer pool locality instead of the parallelism
inherent in large networks of computer systems.} Also, we believe
that logical, host-independent logs may be a good fit for applications
@@ -1990,15 +2001,15 @@ this functionality. We are unaware of any transactional system that
provides such a broad range of data structure implementations.

Also, we have noticed that the integration between transactional
storage primitives and in-memory data structures is often fairly
limited. (For example, JDBC does not reuse Java's iterator
interface.) We have been experimenting with the production of a
uniform interface to iterators, maps, and other structures that would
allow code to be simultaneously written for native in-memory storage
and for our transactional layer. We believe the fundamental reason
for the differing APIs of past systems is the heavyweight nature of
the primitives provided by transactional systems, and the highly
specialized, lightweight interfaces provided by typical in-memory
structures. Because \yad makes it easy to implement lightweight
transactional structures, it may be easy to integrate it further with
programming language constructs.