cleanup sec 3; remove sec 6
parent
3bc8b20920
commit
2fcb841ffe
1 changed files with 174 additions and 145 deletions
|
@ -228,9 +228,8 @@ customized to implement many existing (and some new) write-ahead
|
||||||
logging variants. We present implementations of some of these variants and
|
logging variants. We present implementations of some of these variants and
|
||||||
benchmark them against popular real-world systems. We
|
benchmark them against popular real-world systems. We
|
||||||
conclude with a survey of related and future work.
|
conclude with a survey of related and future work.
|
||||||
|
|
||||||
An (early) open-source implementation of
|
An (early) open-source implementation of
|
||||||
the ideas presented here is available at \eab{where?}.
|
the ideas presented here is available (see Section~\ref{sec:avail}).
|
||||||
|
|
||||||
\section{\yad is not a Database}
|
\section{\yad is not a Database}
|
||||||
\label{sec:notDB}
|
\label{sec:notDB}
|
||||||
|
@ -399,8 +398,11 @@ update disk pages atomically, we relax this restriction in Section~\cite{sec:lsn
|
||||||
|
|
||||||
\subsection{Single-Page Transactions}
|
\subsection{Single-Page Transactions}
|
||||||
|
|
||||||
Transactional pages provide the "A" and "D" properties
|
Transactional pages provide the ``A'' and ``D'' properties
|
||||||
of ACID transactions, but only within a single page. We cover
|
of ACID transactions, but only within a single page.\endnote{The ``A'' in ACID really means atomic persistence
|
||||||
|
of data, rather than atomic in-memory updates, as the term is normally
|
||||||
|
used in systems work~\cite{GR97}; the latter is covered by ``C'' and ``I''.}
|
||||||
|
We cover
|
||||||
multi-page transactions in the next section, and the rest of ACID in
|
multi-page transactions in the next section, and the rest of ACID in
|
||||||
Section~\ref{locking}. The insight behind transactional pages was
|
Section~\ref{locking}. The insight behind transactional pages was
|
||||||
that atomic page writes form a good foundation for full transactions;
|
that atomic page writes form a good foundation for full transactions;
|
||||||
|
@ -414,8 +416,8 @@ but no logging is required.
|
||||||
|
|
||||||
This approach performs poorly because we {\em force} the page to disk
|
This approach performs poorly because we {\em force} the page to disk
|
||||||
on commit, which leads to a large number of synchronous non-sequential
|
on commit, which leads to a large number of synchronous non-sequential
|
||||||
writes. By writing "redo" information to the log before committing
|
writes. By writing ``redo'' information to the log before committing
|
||||||
(write-ahead logging), we get "no force" transactions and better
|
(write-ahead logging), we get {\em no force} transactions and better
|
||||||
performance, since the synchronous writes to the log are sequential.
|
performance, since the synchronous writes to the log are sequential.
|
||||||
The pages themselves can be written out later asynchronously and often
|
The pages themselves can be written out later asynchronously and often
|
||||||
as part of a larger sequential write.
|
as part of a larger sequential write.
|
||||||
|
@ -457,7 +459,7 @@ The primary difference between \yad and ARIES for basic transactions
|
||||||
is that \yad allows user-defined operations. An {\em operation}
|
is that \yad allows user-defined operations. An {\em operation}
|
||||||
consists of both a redo and an undo function, both of which take one
|
consists of both a redo and an undo function, both of which take one
|
||||||
argument. An update is always the redo function applied to a page;
|
argument. An update is always the redo function applied to a page;
|
||||||
there is no "do" function, which ensures that updates behave the same
|
there is no ``do'' function, which ensures that updates behave the same
|
||||||
on recovery. The redo log entry consists of the LSN and the argument.
|
on recovery. The redo log entry consists of the LSN and the argument.
|
||||||
The undo entry is analogous. \yad ensures the correct ordering and
|
The undo entry is analogous. \yad ensures the correct ordering and
|
||||||
timing of all log entries and page writes. We describe operations in
|
timing of all log entries and page writes. We describe operations in
|
||||||
|
@ -580,39 +582,84 @@ default data structure implementations. This approach also works with the varia
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Extending \yad with new operations}
|
\subsection{User-Defined Operations}
|
||||||
|
|
||||||
|
The first kind of extensibility enabled by \yad is user-defined operations.
|
||||||
Figure~\ref{fig:structure} shows how operations interact with \yad. A
|
Figure~\ref{fig:structure} shows how operations interact with \yad. A
|
||||||
number of default operations come with \yad. These include operations
|
number of default operations come with \yad. These include operations
|
||||||
that allocate and manipulate records, operations that implement hash
|
that allocate and manipulate records, operations that implement hash
|
||||||
tables, and a number of methods that add functionality to recovery.
|
tables, and a number of methods that add functionality to recovery.
|
||||||
|
Many of the customizations described below are implemented using
|
||||||
|
custom operations.
|
||||||
|
|
||||||
If an operation does not need to be used by concurrent
|
In this portion of the discussion, operations are limited to a single
|
||||||
transactions, directly manipulating the page file is as simple as
|
page, as they must be applied atomically. We remove the single-page
|
||||||
ensuring that each update to the page file occurs inside of the
|
constraint in Section~\ref{sec:lsn-free}.
|
||||||
operation's implementation. Operation implementations must be invoked
|
|
||||||
by registering a callback with \yad at startup, and then calling {\em
|
|
||||||
Tupdate()} to invoke the operation at runtime.
|
|
||||||
|
|
||||||
Each operation should be deterministic, provide an inverse, and
|
Operations are invoked by registering a callback with \yad at
|
||||||
acquire all of its arguments from a struct that is passed via
|
startup, and then calling {\tt Tupdate()} to invoke the operation at
|
||||||
Tupdate() and from the page it updates. The callbacks that are used
|
runtime.
|
||||||
|
|
||||||
|
\yad ensures that operations follow the
|
||||||
|
write-ahead logging rules required for steal/no-force transactions by
|
||||||
|
controlling the timing and ordering of log and page writes. Each
|
||||||
|
operation should be deterministic, provide an inverse, and acquire all
|
||||||
|
of its arguments from a struct that is passed via {\tt Tupdate()} or from
|
||||||
|
the page it updates (or typically both). The callbacks used
|
||||||
during forward operation are also used during recovery. Therefore
|
during forward operation are also used during recovery. Therefore
|
||||||
operations provide a single redo function and a single undo function.
|
operations provide a single redo function and a single undo function.
|
||||||
(There is no ``do'' function.) This reduces the amount of
|
(There is no ``do'' function.) This reduces the amount of
|
||||||
recovery-specific code in the system. Tupdate() writes the struct
|
recovery-specific code in the system. {\tt Tupdate()} writes the struct
|
||||||
that is passed to it to the log before invoking the operation's
|
that is passed to it to the log before invoking the operation's
|
||||||
implementation. Recovery simply reads the struct from disk and invokes the operation.
|
implementation. Recovery simply reads the struct from disk and
|
||||||
|
invokes the operation at the appropriate time.
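
The logging discipline described above can be illustrated with a toy model. The sketch below is written in Python rather than against the library's C API; the names Store, Page, register, update, and recover are hypothetical and chosen only for illustration. It assumes an in-memory log and a single integer per page. It shows the argument being logged before the redo function runs, and recovery replaying the same redo function for log entries that are newer than the page's LSN.

```python
# Toy model of the redo/undo operation pattern (illustrative names, not the library's API).
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int = 0      # LSN of the last update applied to this page
    value: int = 0    # the page's contents, reduced to a single integer


@dataclass
class Store:
    pages: dict = field(default_factory=dict)  # page_id -> Page
    log: list = field(default_factory=list)    # entries: (lsn, op_name, page_id, arg)
    ops: dict = field(default_factory=dict)    # op_name -> (redo, undo)

    def register(self, name, redo, undo):
        """Register an operation's callbacks, as would happen at startup."""
        self.ops[name] = (redo, undo)

    def update(self, name, page_id, arg):
        """Log the argument first, then apply the operation's redo function."""
        lsn = len(self.log) + 1
        self.log.append((lsn, name, page_id, arg))
        self._redo(lsn, name, page_id, arg)

    def _redo(self, lsn, name, page_id, arg):
        page = self.pages.setdefault(page_id, Page())
        redo, _undo = self.ops[name]
        redo(page, arg)
        page.lsn = lsn

    def recover(self, pages_on_disk):
        """Replay log entries that are newer than each page's LSN."""
        self.pages = pages_on_disk
        for lsn, name, page_id, arg in self.log:
            page = self.pages.setdefault(page_id, Page())
            if page.lsn < lsn:
                self._redo(lsn, name, page_id, arg)


def increment_redo(page, arg):
    page.value += arg


def increment_undo(page, arg):
    page.value -= arg
```

Because forward operation and recovery share the same redo path in this sketch, there is no separate ``do'' code to keep in sync with it.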
|
||||||
|
|
||||||
In this portion of the discussion, operations are limited to a single
|
\begin{figure}
|
||||||
page, and provide an undo function. Operations that affect multiple
|
\includegraphics[%
|
||||||
pages or do not provide inverses will be discussed later. \eab{where?}
|
width=1\columnwidth]{figs/structure.pdf}
|
||||||
|
\caption{\sf\label{fig:structure} The portions of \yad that directly interact with new operations.}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
Operations are limited to a single page because their results must be
|
The first step in implementing a new operation is to decide upon an
|
||||||
applied to the page file atomically. Some operations use the data
|
external interface, which is typically cleaner than using the redo/undo
|
||||||
stored on the page to update the page. If this data were corrupted by
|
functions directly. The externally visible interface is implemented
|
||||||
a non-atomic disk write, then such operations would fail during recovery.
|
by wrapper functions and read-only access methods. The wrapper
|
||||||
|
function modifies the state of the page file by packaging the
|
||||||
|
information that will be needed for redo/undo into a data format
|
||||||
|
of its choosing. This data structure is passed into {\tt Tupdate()}, which then writes a log entry and invokes the redo function.
|
||||||
|
|
||||||
|
The redo function modifies the page file directly (or takes some other
|
||||||
|
action). It is essentially an interpreter for its log entries. Undo
|
||||||
|
works analogously, but is invoked when an operation must be undone
|
||||||
|
(due to an abort).
|
||||||
|
|
||||||
|
This pattern applies in many cases. In
|
||||||
|
order to implement a ``typical'' operation, the operation's
|
||||||
|
implementation must obey a few more invariants:
|
||||||
|
\begin{itemize}
|
||||||
|
\item Pages should only be updated inside redo/undo functions.
|
||||||
|
\item Page updates atomically update the page's LSN by pinning the page.
|
||||||
|
\item If the data seen by a wrapper function must match data seen
|
||||||
|
during REDO, then the wrapper should use a latch to protect against
|
||||||
|
concurrent attempts to update the sensitive data (and against
|
||||||
|
concurrent attempts to allocate log entries that update the data).
|
||||||
|
\item Nested top actions (and logical undo) or ``big locks'' (total isolation) should be used to manage concurrency (Section~\ref{sec:nta}).
|
||||||
|
\end{itemize}
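
The latching invariant in the list above can be sketched as follows. This is a toy Python illustration, not the library's C API: LatchedPage, t_set, set_redo, and set_undo are hypothetical names, and the log is a plain in-memory list. The wrapper reads the old value, allocates the log entry, and applies the redo function while holding one latch, so the value recorded for undo is the value that redo actually overwrote.

```python
# Toy illustration of the latching invariant (hypothetical names, not the library's API).
import threading


class LatchedPage:
    def __init__(self):
        self.value = 0
        self.lsn = 0
        self.latch = threading.Lock()  # short-term latch, not a transactional lock


log = []  # stand-in for the write-ahead log; each entry is one operation's argument


def set_redo(page, arg):
    page.value = arg["new"]


def set_undo(page, arg):
    page.value = arg["old"]


def t_set(page, new_value):
    """Wrapper: read the old value, log it, and apply redo under one latch."""
    with page.latch:
        arg = {"new": new_value, "old": page.value}  # data the undo will rely on
        log.append(arg)                               # allocate the log entry
        page.lsn = len(log)                           # LSN changes with the page update
        set_redo(page, arg)                           # the page changes only inside redo/undo
```

Without the latch, a concurrent writer could change the page between the read of the old value and the append to the log, and undoing the log in reverse order would no longer restore the original contents.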
|
||||||
|
|
||||||
|
Although these restrictions are not trivial, they are not a problem in
|
||||||
|
practice. Most read-modify-write actions can be implemented as
|
||||||
|
user-defined operations, including common DBMS optimizations such as
|
||||||
|
increment operations. The power of \yad is that by following these
|
||||||
|
local restrictions, we enable new operations that meet the global
|
||||||
|
|
||||||
|
Finally, for some applications, the overhead of logging information for redo or
|
||||||
|
undo may outweigh their benefits. Operations that wish to avoid undo
|
||||||
|
logging can call an API that pins the page until commit, and use an
|
||||||
|
empty undo function. Similarly we provide an API that causes a page
|
||||||
|
to be written out on commit, which avoids redo logging.
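
The two alternatives in the paragraph above can be pictured with a toy buffer manager. The Python sketch below is only an assumption about how such a policy might look: BufferManager, pin_until_commit, force_on_commit, and try_steal are hypothetical names, not the library's API, and the model tracks a single transaction. A page pinned until commit cannot be stolen, so no uncommitted data reaches disk and nothing needs to be undone; a page forced at commit is durable without replaying redo entries.

```python
# Toy buffer manager for a single transaction (hypothetical names, not the library's API).
class BufferManager:
    def __init__(self):
        self.cache = {}      # page_id -> in-memory contents
        self.disk = {}       # page_id -> on-disk contents
        self.pinned = set()  # pages that must not be stolen before commit
        self.forced = set()  # pages that must be written out at commit

    def write(self, page_id, data, pin_until_commit=False, force_on_commit=False):
        """Update the cached page and record the requested policy."""
        self.cache[page_id] = data
        if pin_until_commit:
            self.pinned.add(page_id)
        if force_on_commit:
            self.forced.add(page_id)

    def try_steal(self, page_id):
        """Write a dirty page to disk early; refused while the page is pinned."""
        if page_id in self.pinned:
            return False
        self.disk[page_id] = self.cache[page_id]
        return True

    def commit(self):
        """Flush forced pages, then release all pins."""
        for page_id in self.forced:
            self.disk[page_id] = self.cache[page_id]
        self.pinned.clear()
        self.forced.clear()
```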
|
||||||
|
|
||||||
|
|
||||||
|
\eat{
|
||||||
Note that we could implement a limited form of transactions by
|
Note that we could implement a limited form of transactions by
|
||||||
limiting each transaction to a single operation, and by forcing the
|
limiting each transaction to a single operation, and by forcing the
|
||||||
page that each operation updates to disk in order. If we ignore torn
|
page that each operation updates to disk in order. If we ignore torn
|
||||||
|
@ -624,7 +671,7 @@ The rest of this section describes how recovery can be extended,
|
||||||
first to support multiple operations per transaction efficiently, and
|
first to support multiple operations per transaction efficiently, and
|
||||||
then to allow more than one transaction to modify the same data before
|
then to allow more than one transaction to modify the same data before
|
||||||
committing.
|
committing.
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
\eat{
|
\eat{
|
||||||
|
@ -676,41 +723,19 @@ needs to be forced to disk once.
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
\subsection{Alternatives to Steal/no-Force}
|
|
||||||
|
|
||||||
Note that the redo logging allows \yad to avoid forcing
|
|
||||||
pages to disk, while undo logging allows pages to be stolen. For some
|
|
||||||
applications, the overhead of logging information for redo or undo may
|
|
||||||
outweigh their benefits. \yads logging discipline provides a simple
|
|
||||||
solution to this problem. If a special-purpose operation wants to
|
|
||||||
avoid writing either the Redo or the Undo information to the log then
|
|
||||||
it can have the buffer manager pin the page or flush it at commit, and
|
|
||||||
simply omit the pertinent information from the log entries it
|
|
||||||
generates.
|
|
||||||
|
|
||||||
\eab{poor paragraph}
|
|
||||||
Recovery's undo and redo phases both will process the log entry, but
|
|
||||||
one of them will have no effect. If an operation chooses not to
|
|
||||||
provide a redo implementation, then during undo the implementation will need
|
|
||||||
to determine whether or not the redo was applied. If it omits undo,
|
|
||||||
then redo must consult recovery to see if it is part of a transaction that
|
|
||||||
committed.
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Application-specific Locking}
|
\subsection{Application-specific Locking}
|
||||||
|
|
||||||
The transactions described above only provide the
|
The transactions described above only provide the
|
||||||
``Atomicity'' and ``Durability'' properties of ACID.\endnote{The ``A'' in ACID really means atomic persistence
|
``Atomicity'' and ``Durability'' properties of ACID.
|
||||||
of data, rather than atomic in-memory updates, as the term is normally
|
``Isolation'' is
|
||||||
used in systems work~\cite{GR97};
|
|
||||||
the latter is covered by ``C'' and
|
|
||||||
``I''.} ``Isolation'' is
|
|
||||||
typically provided by locking, which is a higher-level but
|
typically provided by locking, which is a higher-level but
|
||||||
compatible layer. ``Consistency'' is less well defined but comes in
|
compatible layer. ``Consistency'' is less well defined but comes in
|
||||||
part from low-level mutexes that avoid races, and in part from
|
part from low-level mutexes that avoid races, and in part from
|
||||||
higher-level constructs such as unique key requirements. \yad
|
higher-level constructs such as unique key requirements. \yad, as with DBMSs,
|
||||||
supports this by distinguishing between {\em latches} and {\em locks}.
|
supports this by distinguishing between {\em latches} and {\em locks}.
|
||||||
Latches are provided using operating system mutexes, and are held for
|
Latches are provided using OS mutexes, and are held for
|
||||||
short periods of time. \yads default data structures use latches in a
|
short periods of time. \yads default data structures use latches in a
|
||||||
way that avoids deadlock. This section describes \yads latching
|
way that avoids deadlock. This section describes \yads latching
|
||||||
protocols and describes two custom lock
|
protocols and describes two custom lock
|
||||||
|
@ -739,24 +764,26 @@ coalesce or reuse any storage associated with an active transaction.
|
||||||
In contrast, the record allocator is called frequently and must enable locality. Therefore, it associates a set of pages with
|
In contrast, the record allocator is called frequently and must enable locality. Therefore, it associates a set of pages with
|
||||||
each transaction, and keeps track of deallocation events, making sure
|
each transaction, and keeps track of deallocation events, making sure
|
||||||
that space on a page is never over reserved. Providing each
|
that space on a page is never over reserved. Providing each
|
||||||
transaction with a separate pool of freespace should increase
|
transaction with a separate pool of freespace increases
|
||||||
concurrency and locality. This allocation strategy was inspired by
|
concurrency and locality. This allocation strategy was inspired by
|
||||||
Hoard, a malloc implementation for SMP machines~\cite{hoard}.
|
Hoard, a malloc implementation for SMP machines~\cite{hoard}.
|
||||||
|
|
||||||
Note that both lock managers have implementations that are tied to the
|
Note that both lock managers have implementations that are tied to the
|
||||||
code they service, both implement deadlock avoidance, and both are
|
code they service, both implement deadlock avoidance, and both are
|
||||||
transparent to higher layers. General-purpose database lock managers
|
transparent to higher layers. General-purpose database lock managers
|
||||||
provide none of these features, supporting the idea that special
|
provide none of these features, supporting the idea that
|
||||||
purpose lock managers are a useful abstraction.\rcs{This would be a
|
special-purpose lock managers are a useful abstraction.\rcs{This would
|
||||||
good place to cite Bill and others on higher-level locking protocols}
|
be a good place to cite Bill and others on higher-level locking
|
||||||
|
protocols}
|
||||||
|
|
||||||
Locking is largely orthogonal to the concepts described in this paper.
|
Locking is largely orthogonal to the concepts described in this paper.
|
||||||
We make no assumptions regarding lock managers being used by higher-level code in the remainder of this discussion.
|
We make no assumptions regarding lock managers being used by higher-level code in the remainder of this discussion.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{LSN-free pages.}
|
\section{LSN-free Pages}
|
||||||
\label{sec:lsn-free}
|
\label{sec:lsn-free}
|
||||||
|
|
||||||
The recovery algorithm described above uses LSNs to determine the
|
The recovery algorithm described above uses LSNs to determine the
|
||||||
version number of each page during recovery. This is a common
|
version number of each page during recovery. This is a common
|
||||||
technique. As far as we know, it is used by all database systems that
|
technique. As far as we know, it is used by all database systems that
|
||||||
|
@ -974,93 +1001,6 @@ physical undo information. Such optimizations can be implemented
|
||||||
using conventional transactions, but they appear to be easier to
|
using conventional transactions, but they appear to be easier to
|
||||||
implement and reason about when applied to LSN-free pages.
|
implement and reason about when applied to LSN-free pages.
|
||||||
|
|
||||||
\section{Transactional Pages}
|
|
||||||
|
|
||||||
\subsection{Blind Writes}
|
|
||||||
\label{sec:blindWrites}
|
|
||||||
\rcs{Somewhere in the description of conventional transactions, emphasize existing transactional storage systems' tendency to hard-code recommended page formats, data structures, etc.}
|
|
||||||
|
|
||||||
\rcs{All the text in this section is orphaned, but should be worked in elsewhere.}
|
|
||||||
|
|
||||||
Regarding LSN-free pages:
|
|
||||||
|
|
||||||
Furthermore, efficient recovery and
|
|
||||||
log truncation require only minor modifications to our recovery
|
|
||||||
algorithm. In practice, this is implemented by providing a buffer manager callback
|
|
||||||
for LSN-free pages. The callback computes a
|
|
||||||
conservative estimate of the page's LSN whenever the page is read from disk.
|
|
||||||
For a less conservative estimate, it suffices to write a page's LSN to
|
|
||||||
the log shortly after the page itself is written out; on recovery the
|
|
||||||
log entry is thus a conservative but close estimate.
|
|
||||||
|
|
||||||
Section~\ref{sec:zeroCopy} explains how LSN-free pages led us to new
|
|
||||||
approaches for recoverable virtual memory and for large object storage.
|
|
||||||
Section~\ref{sec:oasys} uses blind writes to efficiently update records
|
|
||||||
on pages that are manipulated using more general operations.
|
|
||||||
|
|
||||||
\rcs{ (Why was this marked to be deleted? It needs to be moved somewhere else....)
|
|
||||||
Although the extensions that it proposes
|
|
||||||
require a fair amount of knowledge about transactional logging
|
|
||||||
schemes, our initial experience customizing the system for various
|
|
||||||
applications is positive. We believe that the time spent customizing
|
|
||||||
the library is less than the amount of time that it would take to work
|
|
||||||
around typical problems with existing transactional storage systems.
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{Extending \yad}
|
|
||||||
\subsection{Adding log operations}
|
|
||||||
\label{sec:wal}
|
|
||||||
|
|
||||||
\rcs{This section needs to be merged into the new text. For now, it's an orphan.}
|
|
||||||
|
|
||||||
\yad allows application developers to easily add new operations to the
|
|
||||||
system. Many of the customizations described below can be implemented
|
|
||||||
using custom log operations. In this section, we describe how to implement an
|
|
||||||
``ARIES style'' concurrent, steal/no-force operation using
|
|
||||||
\diff{physical redo, logical undo} and per-page LSNs.
|
|
||||||
Such operations are typical of high-performance commercial database
|
|
||||||
engines.
|
|
||||||
|
|
||||||
As we mentioned above, \yad operations must implement a number of
|
|
||||||
functions. Figure~\ref{fig:structure} describes the environment that
|
|
||||||
schedules and invokes these functions. The first step in implementing
|
|
||||||
a new set of log interfaces is to decide upon an interface that these log
|
|
||||||
interfaces will export to callers outside of \yad.
|
|
||||||
|
|
||||||
\begin{figure}
|
|
||||||
\includegraphics[%
|
|
||||||
width=1\columnwidth]{figs/structure.pdf}
|
|
||||||
\caption{\sf\label{fig:structure} The portions of \yad that directly interact with new operations.}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
The externally visible interface is implemented by wrapper functions
|
|
||||||
and read-only access methods. The wrapper function modifies the state
|
|
||||||
of the page file by packaging the information that will be needed for
|
|
||||||
undo and redo into a data format of its choosing. This data structure
|
|
||||||
is passed into Tupdate(). Tupdate() copies the data to the log, and
|
|
||||||
then passes the data into the operation's REDO function.
|
|
||||||
|
|
||||||
REDO modifies the page file directly (or takes some other action). It
|
|
||||||
is essentially an interpreter for the log entries it is associated
|
|
||||||
with. UNDO works analogously, but is invoked when an operation must
|
|
||||||
be undone (usually due to an aborted transaction, or during recovery).
|
|
||||||
|
|
||||||
This pattern applies in many cases. In
|
|
||||||
order to implement a ``typical'' operation, the operation's
|
|
||||||
implementation must obey a few more invariants:
|
|
||||||
|
|
||||||
\begin{itemize}
|
|
||||||
\item Pages should only be updated inside REDO and UNDO functions.
|
|
||||||
\item Page updates atomically update the page's LSN by pinning the page.
|
|
||||||
\item If the data seen by a wrapper function must match data seen
|
|
||||||
during REDO, then the wrapper should use a latch to protect against
|
|
||||||
concurrent attempts to update the sensitive data (and against
|
|
||||||
concurrent attempts to allocate log entries that update the data).
|
|
||||||
\item Nested top actions (and logical undo) or ``big locks'' (total isolation but lower concurrency) should be used to manage concurrency (Section~\ref{sec:nta}).
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -1947,6 +1887,7 @@ dependencies within \yads API. Joe Hellerstein and Mike Franklin
|
||||||
provided us with invaluable feedback.
|
provided us with invaluable feedback.
|
||||||
|
|
||||||
\section{Availability}
|
\section{Availability}
|
||||||
|
\label{sec:avail}
|
||||||
|
|
||||||
Additional information and \yads source code are available at:
|
Additional information and \yads source code are available at:
|
||||||
|
|
||||||
|
@ -1961,6 +1902,93 @@ Additional information, and \yads source code is available at:
|
||||||
\bibliography{LLADD}}
|
\bibliography{LLADD}}
|
||||||
|
|
||||||
\theendnotes
|
\theendnotes
|
||||||
|
\section{Orphaned Stuff}
|
||||||
|
|
||||||
|
\subsection{Blind Writes}
|
||||||
|
\label{sec:blindWrites}
|
||||||
|
\rcs{Somewhere in the description of conventional transactions, emphasize existing transactional storage systems' tendency to hard-code recommended page formats, data structures, etc.}
|
||||||
|
|
||||||
|
\rcs{All the text in this section is orphaned, but should be worked in elsewhere.}
|
||||||
|
|
||||||
|
Regarding LSN-free pages:
|
||||||
|
|
||||||
|
Furthermore, efficient recovery and
|
||||||
|
log truncation require only minor modifications to our recovery
|
||||||
|
algorithm. In practice, this is implemented by providing a buffer manager callback
|
||||||
|
for LSN-free pages. The callback computes a
|
||||||
|
conservative estimate of the page's LSN whenever the page is read from disk.
|
||||||
|
For a less conservative estimate, it suffices to write a page's LSN to
|
||||||
|
the log shortly after the page itself is written out; on recovery the
|
||||||
|
log entry is thus a conservative but close estimate.
|
||||||
|
|
||||||
|
Section~\ref{sec:zeroCopy} explains how LSN-free pages led us to new
|
||||||
|
approaches for recoverable virtual memory and for large object storage.
|
||||||
|
Section~\ref{sec:oasys} uses blind writes to efficiently update records
|
||||||
|
on pages that are manipulated using more general operations.
|
||||||
|
|
||||||
|
\rcs{ (Why was this marked to be deleted? It needs to be moved somewhere else....)
|
||||||
|
Although the extensions that it proposes
|
||||||
|
require a fair amount of knowledge about transactional logging
|
||||||
|
schemes, our initial experience customizing the system for various
|
||||||
|
applications is positive. We believe that the time spent customizing
|
||||||
|
the library is less than the amount of time that it would take to work
|
||||||
|
around typical problems with existing transactional storage systems.
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
\eat{
|
||||||
|
\section{Extending \yad}
|
||||||
|
\subsection{Adding log operations}
|
||||||
|
\label{sec:wal}
|
||||||
|
|
||||||
|
\rcs{This section needs to be merged into the new text. For now, it's an orphan.}
|
||||||
|
|
||||||
|
\yad allows application developers to easily add new operations to the
|
||||||
|
system. Many of the customizations described below can be implemented
|
||||||
|
using custom log operations. In this section, we describe how to implement an
|
||||||
|
``ARIES style'' concurrent, steal/no-force operation using
|
||||||
|
\diff{physical redo, logical undo} and per-page LSNs.
|
||||||
|
Such operations are typical of high-performance commercial database
|
||||||
|
engines.
|
||||||
|
|
||||||
|
As we mentioned above, \yad operations must implement a number of
|
||||||
|
functions. Figure~\ref{fig:structure} describes the environment that
|
||||||
|
schedules and invokes these functions. The first step in implementing
|
||||||
|
a new set of log interfaces is to decide upon an interface that these log
|
||||||
|
interfaces will export to callers outside of \yad.
|
||||||
|
|
||||||
|
\begin{figure}
|
||||||
|
\includegraphics[%
|
||||||
|
width=1\columnwidth]{figs/structure.pdf}
|
||||||
|
\caption{\sf\label{fig:structure} The portions of \yad that directly interact with new operations.}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
The externally visible interface is implemented by wrapper functions
|
||||||
|
and read-only access methods. The wrapper function modifies the state
|
||||||
|
of the page file by packaging the information that will be needed for
|
||||||
|
undo and redo into a data format of its choosing. This data structure
|
||||||
|
is passed into Tupdate(). Tupdate() copies the data to the log, and
|
||||||
|
then passes the data into the operation's REDO function.
|
||||||
|
|
||||||
|
REDO modifies the page file directly (or takes some other action). It
|
||||||
|
is essentially an interpreter for the log entries it is associated
|
||||||
|
with. UNDO works analogously, but is invoked when an operation must
|
||||||
|
be undone (usually due to an aborted transaction, or during recovery).
|
||||||
|
|
||||||
|
This pattern applies in many cases. In
|
||||||
|
order to implement a ``typical'' operation, the operation's
|
||||||
|
implementation must obey a few more invariants:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item Pages should only be updated inside REDO and UNDO functions.
|
||||||
|
\item Page updates atomically update the page's LSN by pinning the page.
|
||||||
|
\item If the data seen by a wrapper function must match data seen
|
||||||
|
during REDO, then the wrapper should use a latch to protect against
|
||||||
|
concurrent attempts to update the sensitive data (and against
|
||||||
|
concurrent attempts to allocate log entries that update the data).
|
||||||
|
\item Nested top actions (and logical undo) or ``big locks'' (total isolation but lower concurrency) should be used to manage concurrency (Section~\ref{sec:nta}).
|
||||||
|
\end{itemize}
|
||||||
|
}
|
||||||
|
|
||||||
\end{document}
|
\end{document}
|
||||||
|
|
||||||
|
@ -1970,3 +1998,4 @@ Additional information, and \yads source code is available at:
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|