Merged in some comments, added OLAP reference.
parent
b5ce838df0
commit
7e5825aa74
2 changed files with 102 additions and 57 deletions
|
@ -75,6 +75,28 @@
|
||||||
OPTannote = {}
|
OPTannote = {}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@InProceedings{molap,
|
||||||
|
author = {Yihong Zhao and Prasad M. Deshpande and Jeffrey F. Naughton},
|
||||||
|
title = {An Array-Based Algorithm for Simultaneous Multidimensional Aggregates},
|
||||||
|
OPTcrossref = {},
|
||||||
|
OPTkey = {},
|
||||||
|
booktitle = {Proceedings of SIGMOD},
|
||||||
|
pages = {159-170},
|
||||||
|
year = {1997},
|
||||||
|
OPTeditor = {},
|
||||||
|
OPTvolume = {},
|
||||||
|
OPTnumber = {},
|
||||||
|
OPTseries = {},
|
||||||
|
OPTaddress = {},
|
||||||
|
OPTmonth = {},
|
||||||
|
OPTorganization = {},
|
||||||
|
OPTpublisher = {},
|
||||||
|
OPTnote = {},
|
||||||
|
OPTannote = {}
|
||||||
|
}
|
||||||
|
|
||||||
@Misc{hibernate,
|
@Misc{hibernate,
|
||||||
key = {hibernate},
|
key = {hibernate},
|
||||||
OPTauthor = {},
|
OPTauthor = {},
|
||||||
|
|
|
@ -276,9 +276,9 @@ translate a relation into a set of keyed tuples. If the database were
|
||||||
going to be used for short, write-intensive and high-concurrency
|
going to be used for short, write-intensive and high-concurrency
|
||||||
transactions (OLTP), the physical model would probably translate sets
|
transactions (OLTP), the physical model would probably translate sets
|
||||||
of tuples into an on-disk B-Tree. In contrast, if the database needed
|
of tuples into an on-disk B-Tree. In contrast, if the database needed
|
||||||
to support long-running, read only aggregation queries (OLAP), a
|
to support long-running, read only aggregation queries (OLAP) over high
|
||||||
physical model tuned for such queries\rcs{be more concrete here} would
|
dimensional data, a physical model that stores the data in sparse array format would
|
||||||
be more appropriate. While both OLTP and OLAP databases are based
|
be more appropriate~\cite{molap}. While both OLTP and OLAP databases are based
|
||||||
upon the relational model they make use of different physical models
|
upon the relational model they make use of different physical models
|
||||||
in order to serve different classes of applications.}
|
in order to serve different classes of applications.}
|
||||||
|
|
||||||
|
@ -481,8 +481,14 @@ may reorder writes on sector boundaries, causing an arbitrary subset
|
||||||
of a page's sectors to be updated during a crash.
|
of a page's sectors to be updated during a crash.
|
||||||
|
|
||||||
{\em Torn page detection} can be used to detect this phenomenon. Torn
|
{\em Torn page detection} can be used to detect this phenomenon. Torn
|
||||||
and corrupted pages may be recovered by restoring the page from
|
and corrupted pages may be recovered by using {\em media recovery} to
|
||||||
backup. For simplicity, this section ignores mechanisms that detect
|
restore the page from backup. Media recovery works by reinitializing
|
||||||
|
the page to zero, and playing back the REDO entries in the log that
|
||||||
|
modify the page. In practice, a system administrator would
|
||||||
|
periodically back up the page file, thus enabling log truncation and
|
||||||
|
shortening recovery time.
|
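The media recovery procedure described in the added text (reinitialize the page to zero, then play back the REDO entries that modify it) can be sketched as a toy model. This is only an illustration under stated assumptions: the log layout, `PAGE_SIZE`, and `media_recover` are hypothetical names invented here, and backups and log truncation are not modeled.

```python
# Toy media recovery for a single page. The log layout is an assumption
# made for illustration; it is not the system's real log format.

PAGE_SIZE = 4

# Each REDO entry: (page number, byte offset, new byte value).
redo_log = [
    (0, 1, 7),
    (1, 0, 9),
    (0, 3, 2),
    (0, 1, 8),
]


def media_recover(page_no, log):
    """Rebuild a lost page: start from zeros, replay its REDO entries in order."""
    page = [0] * PAGE_SIZE
    for entry_page, offset, value in log:
        if entry_page == page_no:      # skip entries that modify other pages
            page[offset] = value
    return page


rebuilt = media_recover(0, redo_log)
```

Because every entry for the page is replayed from an all-zero starting point, the cost grows with the length of the log, which is why periodic backups of the page file shorten recovery.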
||||||
|
|
||||||
|
For simplicity, this section ignores mechanisms that detect
|
||||||
and restore torn pages, and assumes that page writes are atomic.
|
and restore torn pages, and assumes that page writes are atomic.
|
||||||
While the techniques described in this section rely on the ability to
|
While the techniques described in this section rely on the ability to
|
||||||
atomically update disk pages, this restriction is relaxed by other
|
atomically update disk pages, this restriction is relaxed by other
|
||||||
|
@ -491,21 +497,47 @@ recovery mechanisms.
|
||||||
|
|
||||||
\subsubsection{Extending \yad with new operations}
|
\subsubsection{Extending \yad with new operations}
|
||||||
|
|
||||||
Figure~\ref{fig:structure} shows how custom operations interact with
|
Figure~\ref{fig:structure} shows how operations interact with \yad. A
|
||||||
\yad. If an application does not need to make use of concurrent
|
number of default operations come with \yad. These include operations
|
||||||
|
that allocate and manipulate records, operations that implement hash
|
||||||
|
tables, and a number of methods that add functionality to recovery.
|
||||||
|
|
||||||
|
If an operation does not need to be used by concurrent
|
||||||
transactions, directly manipulating the page file is as simple as
|
transactions, directly manipulating the page file is as simple as
|
||||||
ensuring that each update to the page file occurs inside of an
|
ensuring that each update to the page file occurs inside of the
|
||||||
operation's implementation. Operation implementations must be invoked
|
operation's implementation. Operation implementations must be invoked
|
||||||
by registering a callback with \yad at startup, and then calling {\em
|
by registering a callback with \yad at startup, and then calling {\em
|
||||||
Tupdate()} to invoke the operation at runtime. Each operation should
|
Tupdate()} to invoke the operation at runtime.
|
||||||
be deterministic, provide an inverse, and acquire all of its arguments
|
|
||||||
from a struct that is passed via Tupdate(). (Operations that affect
|
Each operation should be deterministic, provide an inverse, and
|
||||||
more than one page, and ones that do not provide inverses will be
|
acquire all of its arguments from a struct that is passed via
|
||||||
described later.)  The same callbacks are used during forward operation
|
Tupdate() and from the page it updates. The callbacks that are used
|
||||||
as during recovery. Therefore operations provide a single redo
|
during forward operation are also used during recovery.  Therefore
|
||||||
function and a single undo function. (There is no ``do''
|
operations provide a single redo function and a single undo function.
|
||||||
function.) This reduces the amount of recovery-specific code in the
|
(There is no ``do'' function.) This reduces the amount of
|
||||||
system.
|
recovery-specific code in the system. Tupdate() writes the struct
|
||||||
|
that is passed to it to the log before invoking the operation's
|
||||||
|
implementation. Recovery simply reads the struct from disk and passes
|
||||||
|
it into the operation implementation.
|
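The relationship between registration, Tupdate(), and recovery described above can be sketched with a small toy model. It is a sketch under assumptions, not the system's API: only the name `Tupdate` comes from the text, while `register_operation`, `recover`, the log tuple layout, and the in-memory `page_file` are hypothetical. The model also replays the whole log unconditionally and ignores LSNs.

```python
# Toy model of operation registration and Tupdate(). Every name except
# Tupdate is hypothetical and does not reflect the real C interface.

operations = {}   # op_id -> (redo, undo) callbacks, registered at startup
log = []          # stand-in for the write-ahead log
page_file = {}    # page number -> list of integer slots


def register_operation(op_id, redo, undo):
    operations[op_id] = (redo, undo)


def Tupdate(page_no, op_id, args):
    """Log the argument struct first, then invoke the redo callback."""
    log.append((page_no, op_id, dict(args)))
    redo, _undo = operations[op_id]
    redo(page_file[page_no], args)


def recover(pages):
    """Replay the log through the same redo callbacks used at runtime."""
    for page_no, op_id, args in log:
        redo, _undo = operations[op_id]
        redo(pages[page_no], args)


# A deterministic single-page operation with an inverse: add a delta to a slot.
def add_redo(page, args):
    page[args["slot"]] += args["delta"]


def add_undo(page, args):
    page[args["slot"]] -= args["delta"]


register_operation("add", add_redo, add_undo)
page_file[0] = [0, 0]
Tupdate(0, "add", {"slot": 1, "delta": 5})
Tupdate(0, "add", {"slot": 1, "delta": 2})

# Simulate a crash that lost every page write, then recover from the log.
recovered = {0: [0, 0]}
recover(recovered)
```

Note that there is no separate ``do'' path: forward operation and recovery both go through `add_redo`, and the operation reads its inputs only from the logged struct and the page it updates.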
||||||
|
|
||||||
|
In this portion of the discussion, operations are limited
|
||||||
|
to a single page, and provide an undo function. Operations that
|
||||||
|
affect multiple pages and that do not provide inverses will be
|
||||||
|
discussed later.
|
||||||
|
|
||||||
|
Operations are limited to a single page because their results must be
|
||||||
|
applied to the page file atomically. Some operations use the data
|
||||||
|
stored on the page to update the page. If this data were corrupted by
|
||||||
|
a non-atomic disk write, then such operations would fail during recovery.
|
||||||
|
|
||||||
|
Note that we could implement a limited form of transactions by
|
||||||
|
limiting each transaction to a single operation, and by forcing the
|
||||||
|
page that each operation updates to disk in order. This would not
|
||||||
|
require any sort of logging, but is quite inefficient in practice.
|
||||||
|
The rest of this section describes how recovery can be extended, first
|
||||||
|
to efficiently support multiple operations per transaction, and then
|
||||||
|
to allow more than one transaction to modify the same data before
|
||||||
|
committing.
|
||||||
|
|
||||||
\subsubsection{\yads Recovery Algorithm}
|
\subsubsection{\yads Recovery Algorithm}
|
||||||
|
|
||||||
|
@ -522,8 +554,8 @@ log forward in time, applying any updates that did not make it to disk
|
||||||
before the system crashed. ``Undo'' runs the log backwards in time,
|
before the system crashed. ``Undo'' runs the log backwards in time,
|
||||||
only applying portions that correspond to aborted transactions. This
|
only applying portions that correspond to aborted transactions. This
|
||||||
section only considers physical undo. Section~\ref{sec:nta} describes
|
section only considers physical undo. Section~\ref{sec:nta} describes
|
||||||
the distinction between physical and logical undo, and describes
|
the distinction between physical and logical undo.
|
||||||
logical undo. A summary of the stages of recovery and the invariants
|
A summary of the stages of recovery and the invariants
|
||||||
they establish is presented in Figure~\ref{fig:conventional-recovery}.
|
they establish is presented in Figure~\ref{fig:conventional-recovery}.
|
||||||
|
|
||||||
Redo is the only phase that makes use of LSN's stored on pages.
|
Redo is the only phase that makes use of LSN's stored on pages.
|
||||||
|
@ -575,7 +607,7 @@ committed.
|
||||||
|
|
||||||
\subsection{Concurrent Transactions}
|
\subsection{Concurrent Transactions}
|
||||||
|
|
||||||
\diff{Two factors make it more difficult to write operations that may be
|
Two factors make it more difficult to write operations that may be
|
||||||
used in concurrent transactions. The first is familiar to anyone that
|
used in concurrent transactions. The first is familiar to anyone that
|
||||||
has written multi-threaded code: Accesses to shared data structures
|
has written multi-threaded code: Accesses to shared data structures
|
||||||
must be protected by latches (mutexes). The second problem stems from
|
must be protected by latches (mutexes). The second problem stems from
|
||||||
|
@ -583,20 +615,7 @@ the fact that concurrent transactions prevent abort from simply
|
||||||
rolling back the physical updates that a transaction made.
|
rolling back the physical updates that a transaction made.
|
||||||
Fortunately, it is straightforward to reduce this second,
|
Fortunately, it is straightforward to reduce this second,
|
||||||
transaction-specific, problem to the familiar problem of writing
|
transaction-specific, problem to the familiar problem of writing
|
||||||
multi-threaded software.}
|
multi-threaded software.
|
||||||
|
|
||||||
\rcs{This text needs to make the following two points: (1)Multi-page transactions break the
|
|
||||||
atomicity assumption because their results are not applied to disk
|
|
||||||
atomically. (2) Concurrent transactions break the assumption that a
|
|
||||||
series of physical undos is the inverse of a transaction. Nested top
|
|
||||||
actions restore these two broken invariants, but are orthogonal to the
|
|
||||||
mechanisms that apply the atomic updates.}
|
|
||||||
|
|
||||||
\rcs{Work this in too: Nested top actions work by
|
|
||||||
performing physical operations on a data structure, and then
|
|
||||||
registering a CLR. The CLR contains a logical undo entry for the
|
|
||||||
operation. When recovery and abort encounter a CLR they skip the
|
|
||||||
physical undo entries, and instead apply the logical undo.}
|
|
||||||
|
|
||||||
To understand the problems that arise with concurrent transactions,
|
To understand the problems that arise with concurrent transactions,
|
||||||
consider what would happen if one transaction, A, rearranged the
|
consider what would happen if one transaction, A, rearranged the
|
||||||
|
@ -631,15 +650,18 @@ operations do not need to be undone if the containing logical operation
|
||||||
(insert) aborts. \diff{We record such operations using {\em logical
|
(insert) aborts. \diff{We record such operations using {\em logical
|
||||||
logging} and {\em physical logging}, respectively.}
|
logging} and {\em physical logging}, respectively.}
|
||||||
|
|
||||||
\diff{Each nested top action performs a single logical operation by applying
|
\diff{Each nested top action performs a single logical operation by
|
||||||
a number of physical operations to the page file. Physical REDO log
|
applying a number of physical operations to the page file. Physical
|
||||||
entries are stored in the log so that recovery can repair any
|
REDO and UNDO log entries are stored in the log so that recovery can
|
||||||
temporary inconsistency that the nested top action introduces.
|
repair any temporary inconsistency that the nested top action
|
||||||
Logical UNDO entries are recorded so that the nested top action can be
|
introduces. Once the nested top action has completed, a logical UNDO
|
||||||
rolled back even if concurrent transactions manipulate the data
|
entry is recorded, and a CLR is used to tell recovery to ignore the
|
||||||
structure. Finally, physical UNDO entries are recorded so that
|
physical UNDO entries. The logical UNDO can be safely applied even if
|
||||||
the nested top action may be rolled back if the system crashes before
|
concurrent transactions manipulate the data structure, and physical
|
||||||
it completes.}
|
UNDO can safely roll back incomplete attempts to manipulate the data
|
||||||
|
structure. Therefore, as long as the physical updates are protected
|
||||||
|
from other transactions, the nested top action can always be rolled
|
||||||
|
back.}
|
||||||
|
|
||||||
This leads to a mechanical approach that converts non-reentrant
|
This leads to a mechanical approach that converts non-reentrant
|
||||||
operations that do not support concurrent transactions into reentrant,
|
operations that do not support concurrent transactions into reentrant,
|
||||||
|
@ -650,12 +672,12 @@ concurrent operations:
|
||||||
to use finer-grained latches in a \yad operation, but it is rarely necessary.
|
to use finer-grained latches in a \yad operation, but it is rarely necessary.
|
||||||
\item Define a {\em logical} UNDO for each operation (rather than just
|
\item Define a {\em logical} UNDO for each operation (rather than just
|
||||||
using a set of page-level UNDO's). For example, this is easy for a
|
using a set of page-level UNDO's). For example, this is easy for a
|
||||||
hashtable: the UNDO for {\em insert} is {\em remove}. \diff{This logical
|
hashtable: the UNDO for {\em insert} is {\em remove}. This logical
|
||||||
undo function should arrange to acquire the mutex when invoked by
|
undo function should arrange to acquire the mutex when invoked by
|
||||||
abort or recovery.}
|
abort or recovery.
|
||||||
\item Add a ``begin nested
|
\item Add a ``begin nested
|
||||||
top action'' right after the mutex acquisition, and an ``end
|
top action'' right after the mutex acquisition, and an ``end
|
||||||
nested top action'' right before the mutex is released. \diff{\yad provides a default nested top action implementation as an extension.}
|
nested top action'' right before the mutex is released. \yad provides operations to implement nested top actions.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
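The three steps above can be illustrated with a toy hash table insert wrapped in a nested top action. This is a hedged sketch: `begin_nta`, `end_nta`, `abort`, and the log record shapes are hypothetical stand-ins chosen for this example, and the in-memory dictionary replaces the page file.

```python
import threading

# Toy nested top action around a hash table insert. begin_nta / end_nta and
# the log record shapes are hypothetical, not the system's real interface.

table = {}
table_latch = threading.Lock()
log = []


def begin_nta():
    return len(log)            # remember where the physical entries start


def end_nta(start, logical_undo):
    # A CLR-like record: tells abort to skip the physical undo entries logged
    # since `start` and to apply the logical undo instead.
    log.append(("nta_end", start, logical_undo))


def insert(key, value):
    with table_latch:                       # step 1: one mutex per structure
        start = begin_nta()                 # step 3: right after acquisition
        old = table.get(key)
        log.append(("physical", key, old))  # physical undo information
        table[key] = value
        end_nta(start, ("remove", key))     # step 2: logical undo for insert


def abort():
    """Walk the log backwards, preferring logical undo for completed actions."""
    i = len(log) - 1
    while i >= 0:
        record = log[i]
        if record[0] == "nta_end":
            _tag, start, (op, key) = record
            with table_latch:               # logical undo acquires the mutex
                if op == "remove":
                    table.pop(key, None)
            i = start - 1                   # skip the physical entries
        else:
            _tag, key, old = record         # incomplete action: physical undo
            if old is None:
                table.pop(key, None)
            else:
                table[key] = old
            i -= 1
    log.clear()


insert("a", 1)
insert("b", 2)
abort()
```

The physical entries matter only while an action is incomplete; once the closing record is logged, abort jumps over them and relies on the logical undo alone.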
||||||
|
|
||||||
If the transaction that encloses a nested top action aborts, the
|
If the transaction that encloses a nested top action aborts, the
|
||||||
|
@ -744,10 +766,16 @@ technique. As far as we know, is used by all database systems that
|
||||||
update data in place. Unfortunately, this makes it difficult to map
|
update data in place. Unfortunately, this makes it difficult to map
|
||||||
large objects onto pages, as the LSN's break up the object. It
|
large objects onto pages, as the LSN's break up the object. It
|
||||||
is tempting to store the LSN's elsewhere, but then they would not be
|
is tempting to store the LSN's elsewhere, but then they would not be
|
||||||
written atomically with their page, which defeats their purpose.~\eab{Fit in RVM?}
|
written atomically with their page, which defeats their purpose.
|
||||||
|
|
||||||
This section explains how we can avoid storing LSN's on pages in \yad
|
This section explains how we can avoid storing LSN's on pages in \yad
|
||||||
without giving up durable transactional updates. In the process, we
|
without giving up durable transactional updates. The techniques here
|
||||||
|
are similar to those used by RVM~\cite{lrvm}, a system that supports
|
||||||
|
transactional updates to virtual memory. However, \yad generalizes
|
||||||
|
the concept, allowing it to co-exist with traditional pages and fully
|
||||||
|
support concurrent transactions.
|
||||||
|
|
||||||
|
In the process of removing LSN's from pages, we
|
||||||
are able to relax the atomicity assumptions that we make regarding
|
are able to relax the atomicity assumptions that we make regarding
|
||||||
writes to disk. These relaxed assumptions allow recovery to repair
|
writes to disk. These relaxed assumptions allow recovery to repair
|
||||||
torn pages without performing media recovery, and allow arbitrary
|
torn pages without performing media recovery, and allow arbitrary
|
||||||
|
@ -884,11 +912,7 @@ use of per-page LSN's assume that each page is written to disk
|
||||||
atomically even though that is generally not the case. Such schemes
|
atomically even though that is generally not the case. Such schemes
|
||||||
deal with this problem by using page formats that allow partially
|
deal with this problem by using page formats that allow partially
|
||||||
written pages to be detected. Media recovery allows them to recover
|
written pages to be detected. Media recovery allows them to recover
|
||||||
these pages. \rcs{This would be a good place to explain exactly how media recovery works. Old text: Like ARIES, \yad can recover lost pages in the page
|
these pages.
|
||||||
file by reinitializing the page to zero, and playing back the entire
|
|
||||||
log. In practice, a system administrator would periodically back up
|
|
||||||
the page file, thus enabling log truncation and shortening recovery
|
|
||||||
time.}
|
|
||||||
|
|
||||||
The Redo phase of the LSN-free recovery algorithm actually creates a
|
The Redo phase of the LSN-free recovery algorithm actually creates a
|
||||||
torn page each time it applies an old log entry to a new page.
|
torn page each time it applies an old log entry to a new page.
|
||||||
|
@ -963,10 +987,9 @@ bottom-up approach yields unexpected flexibility.}
|
||||||
|
|
||||||
\rcs{All the text in this section is orphaned, but should be worked in elsewhere.}
|
\rcs{All the text in this section is orphaned, but should be worked in elsewhere.}
|
||||||
|
|
||||||
We call such pages ``LSN-free'' pages. Although this technique is
|
Regarding LSN-free pages:
|
||||||
novel for databases, it resembles the mechanism used by
|
|
||||||
RVM~\cite{lrvm}; \yad generalizes the concept and allows it to
|
Furthermore, efficient recovery and
|
||||||
co-exist with traditional pages. Furthermore, efficient recovery and
|
|
||||||
log truncation require only minor modifications to our recovery
|
log truncation require only minor modifications to our recovery
|
||||||
algorithm. In practice, this is implemented by providing a buffer manager callback
|
algorithm. In practice, this is implemented by providing a buffer manager callback
|
||||||
for LSN-free pages.  The callback computes a
|
for LSN-free pages.  The callback computes a
|
||||||
|
|