Merged in some comments, added OLAP reference.

Sears Russell 2006-08-02 19:34:01 +00:00
parent b5ce838df0
commit 7e5825aa74
2 changed files with 102 additions and 57 deletions

@ -75,6 +75,28 @@
OPTannote = {} OPTannote = {}
} }
@InProceedings{molap,
author = {Yihong Zhao and Prasad M. Deshpande and Jeffrey F. Naughton},
title = {An Array-Based Algorithm for Simultaneous Multidimensional Aggregates},
OPTcrossref = {},
OPTkey = {},
booktitle = {Proceedings of SIGMOD},
pages = {159-170},
year = {1997},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
OPTaddress = {},
OPTmonth = {},
OPTorganization = {},
OPTpublisher = {},
OPTnote = {},
OPTannote = {}
}
@Misc{hibernate, @Misc{hibernate,
key = {hibernate}, key = {hibernate},
OPTauthor = {}, OPTauthor = {},

@ -276,9 +276,9 @@ translate a relation into a set of keyed tuples. If the database were
going to be used for short, write-intensive and high-concurrency going to be used for short, write-intensive and high-concurrency
transactions (OLTP), the physical model would probably translate sets transactions (OLTP), the physical model would probably translate sets
of tuples into an on-disk B-Tree. In contrast, if the database needed of tuples into an on-disk B-Tree. In contrast, if the database needed
to support long-running, read-only aggregation queries (OLAP), a to support long-running, read-only aggregation queries (OLAP) over
physical model tuned for such queries\rcs{be more concrete here} would high-dimensional data, a physical model that stores the data in a sparse array format would
be more appropriate. While both OLTP and OLAP databases are based be more appropriate~\cite{molap}. While both OLTP and OLAP databases are based
upon the relational model they make use of different physical models upon the relational model they make use of different physical models
in order to serve different classes of applications.} in order to serve different classes of applications.}
@ -481,8 +481,14 @@ may reorder writes on sector boundaries, causing an arbitrary subset
of a page's sectors to be updated during a crash. of a page's sectors to be updated during a crash.
{\em Torn page detection} can be used to detect this phenomenon. Torn {\em Torn page detection} can be used to detect this phenomenon. Torn
and corrupted pages may be recovered by restoring the page from and corrupted pages may be recovered by using {\em media recovery} to
backup. For simplicity, this section ignores mechanisms that detect restore the page from backup. Media recovery works by reinitializing
the page to zero, and playing back the REDO entries in the log that
modify the page. In practice, a system administrator would
periodically back up the page file, thus enabling log truncation and
shortening recovery time.
For simplicity, this section ignores mechanisms that detect
and restore torn pages, and assumes that page writes are atomic. and restore torn pages, and assumes that page writes are atomic.
While the techniques described in this section rely on the ability to While the techniques described in this section rely on the ability to
atomically update disk pages, this restriction is relaxed by other atomically update disk pages, this restriction is relaxed by other
@ -491,21 +497,47 @@ recovery mechanisms.
\subsubsection{Extending \yad with new operations} \subsubsection{Extending \yad with new operations}
Figure~\ref{fig:structure} shows how custom operations interact with Figure~\ref{fig:structure} shows how operations interact with \yad. A
\yad. If an application does not need to make use of concurrent number of default operations come with \yad. These include operations
that allocate and manipulate records, operations that implement hash
tables, and a number of methods that add functionality to recovery.
If an operation does not need to be used by concurrent
transactions, directly manipulating the page file is as simple as transactions, directly manipulating the page file is as simple as
ensuring that each update to the page file occurs inside of an ensuring that each update to the page file occurs inside of the
operation's implementation. Operation implementations must be invoked operation's implementation. Operation implementations must be invoked
by registering a callback with \yad at startup, and then calling {\em by registering a callback with \yad at startup, and then calling {\em
Tupdate()} to invoke the operation at runtime. Each operation should Tupdate()} to invoke the operation at runtime.
be deterministic, provide an inverse, and acquire all of its arguments
from a struct that is passed via Tupdate(). (Operations that affect Each operation should be deterministic, provide an inverse, and
more than one page, and ones that do not provide inverses will be acquire all of its arguments from a struct that is passed via
described later.) The same callbacks are used during forward operation Tupdate() and from the page it updates. The callbacks that are used
as during recovery. Therefore operations provide a single redo during forward operation are also used during recovery. Therefore
function and a single undo function. (There is no ``do'' operations provide a single redo function and a single undo function.
function.) This reduces the amount of recovery-specific code in the (There is no ``do'' function.) This reduces the amount of
system. recovery-specific code in the system. Tupdate() writes the struct
that is passed to it to the log before invoking the operation's
implementation. Recovery simply reads the struct from disk and passes
it into the operation implementation.
In this portion of the discussion, operations are limited
to a single page, and provide an undo function. Operations that
affect multiple pages and that do not provide inverses will be
discussed later.
Operations are limited to a single page because their results must be
applied to the page file atomically. Some operations use the data
stored on the page to update the page. If this data were corrupted by
a non-atomic disk write, then such operations would fail during recovery.
Note that we could implement a limited form of transactions by
limiting each transaction to a single operation, and by forcing the
page that each operation updates to disk in order. This would not
require any sort of logging, but is quite inefficient in practice.
The rest of this section describes how recovery can be extended, first
to efficiently support multiple operations per transaction, and then
to allow more than one transaction to modify the same data before
committing.
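
The logging discipline described above (register a callback, log the argument struct, then invoke the same redo callback in forward operation and in recovery) can be sketched as a toy model. This is only an illustration under stated assumptions: it is written in Python rather than in the system's own language, every name in it (Operation, Store, update, recover, set_redo, set_undo) is hypothetical, and it omits LSN's, so recovery simply replays every redo entry against zeroed pages.

```python
# Toy model of the logging discipline described above. Every name here
# (Operation, Store, update, recover, set_redo, set_undo) is hypothetical
# and is not part of the system's real interface.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

PAGE_SIZE = 8  # bytes per toy page; an arbitrary choice for this sketch


@dataclass
class Operation:
    redo: Callable[[bytearray, dict], None]  # used in forward operation and in recovery
    undo: Callable[[bytearray, dict], None]  # inverse of redo


@dataclass
class Store:
    ops: Dict[str, Operation] = field(default_factory=dict)
    pages: Dict[int, bytearray] = field(default_factory=dict)
    log: List[Tuple[str, int, dict]] = field(default_factory=list)

    def register(self, name: str, op: Operation) -> None:
        """Register an operation's callbacks (done once, at startup)."""
        self.ops[name] = op

    def page(self, page_id: int) -> bytearray:
        """Return the page, creating a zeroed one on first access."""
        return self.pages.setdefault(page_id, bytearray(PAGE_SIZE))

    def update(self, name: str, page_id: int, args: dict) -> None:
        """Log the argument struct first, then invoke the redo callback."""
        self.log.append((name, page_id, dict(args)))
        self.ops[name].redo(self.page(page_id), args)

    def recover(self) -> "Store":
        """Rebuild all pages from zero by replaying every redo entry."""
        rebuilt = Store(ops=self.ops, log=list(self.log))
        for name, page_id, args in rebuilt.log:
            rebuilt.ops[name].redo(rebuilt.page(page_id), args)
        return rebuilt


def set_redo(page: bytearray, args: dict) -> None:
    page[args["offset"]] = args["new"]


def set_undo(page: bytearray, args: dict) -> None:
    page[args["offset"]] = args["old"]


store = Store()
store.register("set", Operation(redo=set_redo, undo=set_undo))
store.update("set", 0, {"offset": 3, "old": 0, "new": 7})
recovered = store.recover()  # same page contents, rebuilt from the log alone
```

Because there is no separate ``do'' function in the sketch, the code path exercised at runtime is exactly the one exercised by replay, which is the property the text relies on to keep recovery-specific code small.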
\subsubsection{\yads Recovery Algorithm} \subsubsection{\yads Recovery Algorithm}
@ -522,8 +554,8 @@ log forward in time, applying any updates that did not make it to disk
before the system crashed. ``Undo'' runs the log backwards in time, before the system crashed. ``Undo'' runs the log backwards in time,
only applying portions that correspond to aborted transactions. This only applying portions that correspond to aborted transactions. This
section only considers physical undo. Section~\ref{sec:nta} describes section only considers physical undo. Section~\ref{sec:nta} describes
the distinction between physical and logical undo, and describes the distinction between physical and logical undo.
logical undo. A summary of the stages of recovery and the invariants A summary of the stages of recovery and the invariants
they establish is presented in Figure~\ref{fig:conventional-recovery}. they establish is presented in Figure~\ref{fig:conventional-recovery}.
Redo is the only phase that makes use of LSN's stored on pages. Redo is the only phase that makes use of LSN's stored on pages.
@ -575,7 +607,7 @@ committed.
\subsection{Concurrent Transactions} \subsection{Concurrent Transactions}
\diff{Two factors make it more difficult to write operations that may be Two factors make it more difficult to write operations that may be
used in concurrent transactions. The first is familiar to anyone that used in concurrent transactions. The first is familiar to anyone that
has written multi-threaded code: Accesses to shared data structures has written multi-threaded code: Accesses to shared data structures
must be protected by latches (mutexes). The second problem stems from must be protected by latches (mutexes). The second problem stems from
@ -583,20 +615,7 @@ the fact that concurrent transactions prevent abort from simply
rolling back the physical updates that a transaction made. rolling back the physical updates that a transaction made.
Fortunately, it is straightforward to reduce this second, Fortunately, it is straightforward to reduce this second,
transaction-specific, problem to the familiar problem of writing transaction-specific, problem to the familiar problem of writing
multi-threaded software.} multi-threaded software.
\rcs{This text needs to make the following two points: (1)Multi-page transactions break the
atomicity assumption because their results are not applied to disk
atomically. (2) Concurrent transactions break the assumption that a
series of physical undos is the inverse of a transaction. Nested top
actions restore these two broken invariants, but are orthogonal to the
mechanisms that apply the atomic updates.}
\rcs{Work this in too: Nested top actions work by
performing physical operations on a data structure, and then
registering a CLR. The CLR contains a logical undo entry for the
operation. When recovery and abort encounter a CLR they skip the
physical undo entries, and instead apply the logical undo.}
To understand the problems that arise with concurrent transactions, To understand the problems that arise with concurrent transactions,
consider what would happen if one transaction, A, rearranged the consider what would happen if one transaction, A, rearranged the
@ -631,15 +650,18 @@ operations do not need to be undone if the containing logical operation
(insert) aborts. \diff{We record such operations using {\em logical (insert) aborts. \diff{We record such operations using {\em logical
logging} and {\em physical logging}, respectively.} logging} and {\em physical logging}, respectively.}
\diff{Each nested top action performs a single logical operation by applying \diff{Each nested top action performs a single logical operation by
a number of physical operations to the page file. Physical REDO log applying a number of physical operations to the page file. Physical
entries are stored in the log so that recovery can repair any REDO and UNDO log entries are stored in the log so that recovery can
temporary inconsistency that the nested top action introduces. repair any temporary inconsistency that the nested top action
Logical UNDO entries are recorded so that the nested top action can be introduces. Once the nested top action has completed, a logical UNDO
rolled back even if concurrent transactions manipulate the data entry is recorded, and a CLR is used to tell recovery to ignore the
structure. Finally, physical UNDO entries are recorded so that physical UNDO entries. The logical UNDO can be safely applied even if
the nested top action may be rolled back if the system crashes before concurrent transactions manipulate the data structure, and physical
it completes.} UNDO can safely roll back incomplete attempts to manipulate the data
structure. Therefore, as long as the physical updates are protected
from other transactions, the nested top action can always be rolled
back.}
This leads to a mechanical approach that converts non-reentrant This leads to a mechanical approach that converts non-reentrant
operations that do not support concurrent transactions into reentrant, operations that do not support concurrent transactions into reentrant,
@ -650,12 +672,12 @@ concurrent operations:
to use finer-grained latches in a \yad operation, but it is rarely necessary. to use finer-grained latches in a \yad operation, but it is rarely necessary.
\item Define a {\em logical} UNDO for each operation (rather than just \item Define a {\em logical} UNDO for each operation (rather than just
using a set of page-level UNDO's). For example, this is easy for a using a set of page-level UNDO's). For example, this is easy for a
hashtable: the UNDO for {\em insert} is {\em remove}. \diff{This logical hashtable: the UNDO for {\em insert} is {\em remove}. This logical
undo function should arrange to acquire the mutex when invoked by undo function should arrange to acquire the mutex when invoked by
abort or recovery.} abort or recovery.
\item Add a ``begin nested \item Add a ``begin nested
top action'' right after the mutex acquisition, and an ``end top action'' right after the mutex acquisition, and an ``end
nested top action'' right before the mutex is released. \diff{\yad provides a default nested top action implementation as an extension.} nested top action'' right before the mutex is released. \yad provides operations to implement nested top actions.
\end{enumerate} \end{enumerate}
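
The three steps above can be illustrated with a small in-memory sketch. It is not the system's implementation: the names (Table, Transaction, insert, remove) are hypothetical, Python stands in for the system's own language, and discarding the physical undo entries at the end of the nested top action stands in for the CLR that tells recovery to ignore them. The sketch also assumes that insert is only called with keys that are not already present, so that remove is a valid logical undo.

```python
# Hypothetical sketch of the mutex / logical undo / nested top action recipe.
import threading


class Table:
    def __init__(self):
        self.slots = {}                 # stands in for the on-page data structure
        self.mutex = threading.Lock()   # step 1: one latch around each operation


class Transaction:
    def __init__(self):
        self.undo = []                  # undo entries, applied in reverse on abort

    def abort(self):
        while self.undo:
            self.undo.pop()()


def _physical_put(table, txn, key, value):
    """One physical update, logged together with its physical undo."""
    missing = object()
    old = table.slots.get(key, missing)

    def restore():
        if old is missing:
            table.slots.pop(key, None)
        else:
            table.slots[key] = old

    txn.undo.append(restore)
    table.slots[key] = value


def remove(table, key):
    """Step 2: the logical undo for insert; it acquires the mutex itself."""
    with table.mutex:
        table.slots.pop(key, None)


def insert(table, txn, key, value):
    with table.mutex:
        mark = len(txn.undo)            # step 3: begin nested top action
        _physical_put(table, txn, key, value)
        del txn.undo[mark:]             # end: physical undos are now ignored
        txn.undo.append(lambda: remove(table, key))  # logical undo entry


table = Table()
txn_a, txn_b = Transaction(), Transaction()
insert(table, txn_a, "x", 1)
insert(table, txn_b, "y", 2)
txn_a.abort()                           # removes "x" without disturbing "y"
```

If the physical undo entries were kept instead, aborting the first transaction could overwrite state that the second transaction has since changed; replacing them with a single logical entry is what makes the rollback safe under concurrency.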
If the transaction that encloses a nested top action aborts, the If the transaction that encloses a nested top action aborts, the
technique. As far as we know, it is used by all database systems that technique. As far as we know, it is used by all database systems that
update data in place. Unfortunately, this makes it difficult to map update data in place. Unfortunately, this makes it difficult to map
large objects onto pages, as the LSN's break up the object. It large objects onto pages, as the LSN's break up the object. It
is tempting to store the LSN's elsewhere, but then they would not be is tempting to store the LSN's elsewhere, but then they would not be
written atomically with their page, which defeats their purpose.~\eab{Fit in RVM?} written atomically with their page, which defeats their purpose.
This section explains how we can avoid storing LSN's on pages in \yad This section explains how we can avoid storing LSN's on pages in \yad
without giving up durable transactional updates. In the process, we without giving up durable transactional updates. The techniques here
are similar to those used by RVM~\cite{lrvm}, a system that supports
transactional updates to virtual memory. However, \yad generalizes
the concept, allowing it to co-exist with traditional pages and fully
support concurrent transactions.
In the process of removing LSN's from pages, we
are able to relax the atomicity assumptions that we make regarding are able to relax the atomicity assumptions that we make regarding
writes to disk. These relaxed assumptions allow recovery to repair writes to disk. These relaxed assumptions allow recovery to repair
torn pages without performing media recovery, and allow arbitrary torn pages without performing media recovery, and allow arbitrary
@ -884,11 +912,7 @@ use of per-page LSN's assume that each page is written to disk
atomically even though that is generally not the case. Such schemes atomically even though that is generally not the case. Such schemes
deal with this problem by using page formats that allow partially deal with this problem by using page formats that allow partially
written pages to be detected. Media recovery allows them to recover written pages to be detected. Media recovery allows them to recover
these pages. \rcs{This would be a good place to explain exactly how media recovery works. Old text: Like ARIES, \yad can recover lost pages in the page these pages.
file by reinitializing the page to zero, and playing back the entire
log. In practice, a system administrator would periodically back up
the page file, thus enabling log truncation and shortening recovery
time.}
The Redo phase of the LSN-free recovery algorithm actually creates a The Redo phase of the LSN-free recovery algorithm actually creates a
torn page each time it applies an old log entry to a new page. torn page each time it applies an old log entry to a new page.
@ -963,10 +987,9 @@ bottom-up approach yields unexpected flexibility.}
\rcs{All the text in this section is orphaned, but should be worked in elsewhere.} \rcs{All the text in this section is orphaned, but should be worked in elsewhere.}
We call such pages ``LSN-free'' pages. Although this technique is Regarding LSN-free pages:
novel for databases, it resembles the mechanism used by
RVM~\cite{lrvm}; \yad generalizes the concept and allows it to Furthermore, efficient recovery and
co-exist with traditional pages. Furthermore, efficient recovery and
log truncation require only minor modifications to our recovery log truncation require only minor modifications to our recovery
algorithm. In practice, this is implemented by providing a buffer manager callback algorithm. In practice, this is implemented by providing a buffer manager callback
for LSN-free pages. The callback computes a for LSN-free pages. The callback computes a