Eric Brewer 2004-10-22 21:09:45 +00:00
parent bba27699c3
commit 13927883c7
2 changed files with 39 additions and 44 deletions

Binary file not shown.


@ -32,26 +32,24 @@
\title{\Large \bf LLADD: Extensible Transactional Storage FIXME}
\author{
Your N.\ Here \\
%{\em Your Department} \\
{\em Your Institution}\\
%{\em Your City, State, ZIP}\\
Russell Sears and Eric Brewer\\
{\em UC Berkeley}\\
% is there a standard format for email/URLs??
% remember that ~ doesn't do what you expect, use \~{}.
{\normalsize yourname@host.site.dom, http://host.site.dom/yoururl} \\
{\normalsize \{sears,brewer\}@cs.berkeley.edu, http://lladd.sourceforge.net} \\
%
% copy the following lines to add more authors
\smallskip
Name Two Here \\
{\em Two's Institution}\\
% \smallskip
% Name Two Here \\
%{\em Two's Institution}\\
%% is there a standard format for email/URLs??
{\normalsize two@host.site.dom, http://host.site.dom/twourl}
%{\normalsize two@host.site.dom, http://host.site.dom/twourl}
%
} % end author
\maketitle
\thispagestyle{empty}
\thispagestyle{plain}
\subsection*{Abstract}
@ -88,23 +86,22 @@ more direct management of data, LLADD offers a layered architecture
that enables simple but robust data management.\footnote{A large class
of such applications are deemed ``navigational'' in the database
vocabulary, as they directly navigate data structures rather than
perform set operations. We also believe that LLADD is applicable in
the context of new, special purpose database systems (XML databases,
streaming databases, database/semantic file systems, etc), which is a
fruitful area of current work both within the database research
community and in industry.}
perform set operations.}
We also believe that LLADD is applicable in
the context of new, special-purpose database systems such as XML databases,
streaming databases, and database/semantic file systems. These form a
fruitful area of current research, but existing monolithic database systems tend to be a poor fit for these new areas.
The basic approach of LLADD, taken from ARIES [xx], is to build
The basic approach of LLADD, taken from ARIES~\cite{aries}, is to build
\emph{transactional pages}, which enables recovery on a page-by-page
basis, despite support for high concurrency and the minimization of
disk seeks during commit (by using a log). We show how to build a variety
of useful data managers on top of this layer, including persistent
hash tables, lightweight recoverable virtual memory, and simple
hash tables, lightweight recoverable virtual memory~\cite{lrvm}, and simple
databases. We also cover the details of crash recovery,
application-level support for transaction abort and commit, and basic
latching for multithreaded applications.
We also discuss the shortcomings of common applications , and explain
Finally, we discuss the shortcomings of common applications, and explain
why LLADD provides an appropriate solution to these problems.
%[more coverage of kinds of apps? imap, lrvm, cht, file system, database]
@ -113,19 +110,19 @@ Many implementations of transactional pages exist in industry and
in the literature. Unfortunately, these algorithms tend either to
be straightforward and unsuitable for real-world deployment, or are
robust and scalable, but achieve these properties by relying upon
intricate sets of internal (and often implicit) interactions. The
ARIES algorithm falls into the second category, has been extremely
intricate sets of internal and often implicit interactions. The
ARIES algorithm falls into the second category, and has been extremely
successful as part of the IBM DB2 database system.
It provides performance and reliability that is comparable to that of current
commercial and open-source products. Unfortunately, while the algorithm
is conceptually simple, many subtleties arise in its implementation.
We chose ARIES as the basis of LLADD, and have made a significant
effort to document these interactions. Although a complete discussion
of the AIRES algorithm is beyond the scope of this paper, we will
of the ARIES algorithm is beyond the scope of this paper, we will
provide a brief overview, and explain the details that are relevant
to developers who wish to extend LLADD.
By documenting the interface between AIRES and higher-level primitives
By documenting the interface between ARIES and higher-level primitives
such as data structures, and by structuring LLADD to make this
interface explicit in both the library and its extensions, we hope to
make it easy to produce correct and efficient durable data
@ -141,7 +138,7 @@ modules that ``do one thing and do it well'', we believe that
LLADD can provide competitive performance while making future improvements
to its core implementation significantly easier. In order to achieve
this goal, LLADD has been split into a number of modules forming a
'core library', and a number of extensions called 'operations' that
{\em core library}, and a number of extensions called {\em operations} that
build upon the core library. Since each of these modules exports a
stable interface, they can be independently improved.
@ -151,8 +148,7 @@ stable interface, they can be independently improved.
An extensive amount of prior work covers the algorithms presented in
this paper. Most fundamentally, systems that provide transactional
consistency to their users generally include a number of common
modules. A high-level overview of a typical system is given in Figure
\ref{cap:DB-Architecture}.
modules. Figure~\ref{cap:DB-Architecture} presents a high-level overview of a typical system.
\begin{figure}
\includegraphics[%
@ -414,17 +410,18 @@ reacquired during recovery, the redo phase of the recovery process
is single threaded. Since latches acquired by the wrapper function
are held while the log entry and page are updated, the ordering of
the log entries and page updates associated with a particular latch
must be consistent. Because undo occurs during normal operation,
will be consistent. Because undo occurs during normal operation,
some care must be taken to ensure that undo operations obtain the
proper latches.
\subsection{Recovery}
In this section, we present the details of crash recovery, user-defined logging, and atomic actions that commit even if their enclosing transaction aborts.
\subsubsection{ANALYSIS / REDO / UNDO}
Recovery in AIRES consists of three stages, analysis, redo and undo.
Recovery in ARIES consists of three stages: analysis, redo, and undo.
The first, analysis, is
implemented by LLADD, but will not be discussed in this
paper. The second, redo, ensures that each redo entry in the log
@ -467,20 +464,19 @@ exactly as they would be during normal operation.
\subsubsection{Physical, Logical and Physiological Logging.}
The above discussion avoided the use of some terminology that is common
in the database literature and which should be presented here. {}``Physical
loggging'' is the practice of logging physical (byte level) upates
and the physical (page number) addresses that they are applied to.
in the database literature and which should be presented here. ``Physical
logging'' is the practice of logging physical (byte-level) updates
and the physical (page number) addresses to which they are applied.
It is subtly different than {}``physiological logging,'' which is
It is subtly different from ``physiological logging,'' which is
what LLADD recommends for its redo records. In physiological logging,
the physical (page number) address is stored, but the byte offset
the physical address (page number) is stored, but the byte offset
and the actual difference are stored implicitly in the parameters
of some function. When the parameters are applied to the function,
it will update the page in a way that preserves application semantics.
This allows for some convenient optimizations. For example, data within
The common use for this is {\em slotted pages}, which use a level of indirection to allow records to be rearranged on the page; redo operations use the index as the parameter rather than the page offset. For example, data within
a single page can be re-arranged at runtime to produce contiguous
regions of free space, or the parameters passed to the function may
be significantly smaller than the physical change made to the page.
regions of free space. LLADD generalizes this model; for example, the parameters passed to the function may be significantly smaller than the physical change made to the page.
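
The slotted-page case described above can be illustrated with a minimal sketch in C. It is not LLADD's actual page format or API: every type and function name in it is hypothetical, records are fixed-size integers, and the page LSN check follows the usual ARIES convention for making redo idempotent. The redo entry carries the page number and a slot index rather than a byte offset, so it still applies correctly after a record has been moved within the page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 64
#define MAX_SLOTS 4

/* Hypothetical slotted page: slot i maps to the byte offset of record i. */
typedef struct {
    uint8_t  data[PAGE_SIZE];
    uint16_t offset[MAX_SLOTS];  /* level of indirection */
    uint64_t lsn;                /* LSN of the last update applied */
} page_t;

/* Hypothetical physiological redo entry: physical page number,
 * logical slot index, and the parameter of the "set" function. */
typedef struct {
    uint64_t lsn;
    uint32_t page_id;
    uint16_t slot;
    int32_t  value;
} redo_entry_t;

static void page_init(page_t *p) {
    memset(p, 0, sizeof *p);
    for (int i = 0; i < MAX_SLOTS; i++) {
        /* Lay the records out from the end of the page. */
        p->offset[i] = (uint16_t)(PAGE_SIZE - (i + 1) * (int)sizeof(int32_t));
    }
}

static int32_t read_slot(const page_t *p, uint16_t slot) {
    int32_t v;
    memcpy(&v, &p->data[p->offset[slot]], sizeof v);
    return v;
}

/* Redo resolves the slot to its current offset; it never sees a byte
 * offset. Entries at or below the page LSN are skipped. */
static void apply_redo(page_t *pages, const redo_entry_t *e) {
    page_t *p = &pages[e->page_id];
    if (e->lsn <= p->lsn) {
        return;
    }
    memcpy(&p->data[p->offset[e->slot]], &e->value, sizeof e->value);
    p->lsn = e->lsn;
}

/* Move one record to a new offset, as compaction would. The caller
 * must pick a destination that does not overlap another record. */
static void move_record(page_t *p, uint16_t slot, uint16_t new_offset) {
    memmove(&p->data[new_offset], &p->data[p->offset[slot]], sizeof(int32_t));
    p->offset[slot] = new_offset;
}

/* Returns the final value of slot 0 after a move followed by redo. */
static int32_t demo_redo_after_compaction(void) {
    page_t pages[1];
    page_init(&pages[0]);

    redo_entry_t e1 = { .lsn = 1, .page_id = 0, .slot = 0, .value = 7 };
    redo_entry_t e2 = { .lsn = 2, .page_id = 0, .slot = 1, .value = 8 };
    redo_entry_t e3 = { .lsn = 3, .page_id = 0, .slot = 0, .value = 99 };

    apply_redo(pages, &e1);
    apply_redo(pages, &e2);
    move_record(&pages[0], 0, 0);  /* slot 0 now lives at byte 0 */
    apply_redo(pages, &e3);        /* still finds the record via its slot */
    apply_redo(pages, &e3);        /* replay is a no-op: LSN already applied */

    assert(pages[0].offset[0] == 0);
    assert(read_slot(&pages[0], 1) == 8);
    return read_slot(&pages[0], 0);
}
```

A purely physical redo entry would have recorded the original byte offset of slot 0 and would write to a stale location once the record had been moved; the slot index avoids this at the cost of one table lookup per update.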
{}``Logical logging'' can only be used for undo entries in LLADD,
and is identical to physiological logging, except that it stores a
@ -565,7 +561,7 @@ such a tool could easily be applied to existing LLADD operations.
Note that the ARIES algorithm is extremely complex, and we have left
out most of the details needed to understand how ARIES works, or to
implement it correctly.\footnote{The original ARIES paper was around 70 pages, and the ARIES/IM paper, which covered index implementation is roughly the same length.} Yet, we believe we have covered everything that a programmer needs
implement it correctly.\footnote{The original ARIES paper is around 70 pages, and the ARIES/IM paper~\cite{aries-IM}, which covers index implementation, is roughly the same length.} Yet, we believe we have covered everything that a programmer needs
to know in order to implement new data structures using the
functionality that ARIES provides. This was possible due to the encapsulation
of the ARIES algorithm inside of LLADD, which is the feature that
@ -612,12 +608,11 @@ is able to efficiently support interesting data structures.
Despite the complexity of the interactions between its modules, the
basic ARIES algorithm itself is quite simple. Therefore, in order to
keep LLADD simple, we started with a set of modules, and iteratively
refined the boundaries between these modules. Figure
\ref{cap:LLADD-Architecture} presents the resulting architecture. The
refined the boundaries between these modules. Figure~\ref{cap:LLADD-Architecture} presents the resulting architecture. The
core of the LLADD library is quite small at 2218 lines of code, 2155
lines of implementations of operations and other extensions, and 408
lines of installable header files.\footnote{generated using David
A. Wheeler's ``SLOCCount''} The code has been documented extensively,
lines of installable header files.\footnote{These counts were generated using David
A. Wheeler's {\tt SLOCCount}.} The code has been documented extensively,
and we hope that we have exposed most of the subtle interactions
between internal modules in the online documentation.
@ -644,9 +639,9 @@ we would like to support transactional access to resources beyond
simple page files. Some examples include transactional updates of
multiple files on disk, transactional groups of program executions
or network requests, or even leveraging some of the advances being
made in the Linux and other modern operating system kernels. For example,
ReiserFS recently added support for atomic file system operations.
This could be used to provide variable sized pages
made in the Linux and other modern OS kernels. For example,
ReiserFS recently added support for atomic file-system operations.
This could be used to provide variable-sized pages
to LLADD. Combining these ideas should make it easy to
implement some interesting applications, and to improve existing
systems such as CVS, IMAP, and a host of ``simple'' desktop applications.