Merged conflicts, update to section 4.
parent 8cf5d11c21
commit 630112937b
1 changed file with 35 additions and 35 deletions
|
@ -379,10 +379,6 @@ the page metadata appropriately.
|
||||||
|
|
||||||
\subsubsection{Log entries and forward operation (the Tupdate() function)\label{sub:Tupdate}}
|
\subsubsection{Log entries and forward operation (the Tupdate() function)\label{sub:Tupdate}}
|
||||||
|
|
||||||
[TODO...need to make this clearer... I think we need to say that we define a function to do redo, and then we define an update that use
|
|
||||||
it. Recovery uses the same function the same way.]
|
|
||||||
|
|
||||||
|
|
||||||
In order to handle crashes correctly, and in order to undo the
|
In order to handle crashes correctly, and in order to undo the
|
||||||
effects of aborted transactions, LLADD provides operation implementors
|
effects of aborted transactions, LLADD provides operation implementors
|
||||||
with a mechanism to log undo and redo information for their actions.
|
with a mechanism to log undo and redo information for their actions.
|
||||||
|
@ -451,33 +447,29 @@ behave correctly even if an arbitrary number of intervening operations
|
||||||
are performed on the data structure.
|
are performed on the data structure.
|
||||||
|
|
||||||
[TODO...this next paragraph doesn't make sense; also maybe move this whole subsection to later, since it is complicated]
|
[TODO...this next paragraph doesn't make sense; also maybe move this whole subsection to later, since it is complicated]
|
||||||
The remaining log entries are redo-only, and may perform structural
|
Next, the operation writes one or more redo-only log entries that may perform structural
|
||||||
modifications to the data structure. They should not make any assumptions
|
modifications to the data structure. They should not make any assumptions
|
||||||
about the consistency of the current version of the database. Finally,
|
about the consistency of the current version of the database. Finally,
|
||||||
any prefix of the sequence of the redo-only operations performed by
|
any prefix of the sequence of the redo-only operations performed by
|
||||||
this operation must leave the database in a consistent state. The
|
this operation must leave the database in a consistent state. The
|
||||||
$B^{LINK}$ tree {[}...{]} is an example of a B-Tree implementation
|
$B^{LINK}$ tree {[}...{]} is an example of a B-Tree implementation
|
||||||
that behaves in this way, as is the linear hash table implementation
|
that behaves in this way, while the linear hash table implementation
|
||||||
discussed in Section \ref{sub:Linear-Hash-Table}.
|
discussed in Section \ref{sub:Linear-Hash-Table} is a scalable
|
||||||
|
hash table that meets these constraints.
|
||||||
Some of the logging constraints introduced in this section may seem
|
|
||||||
strange at this point, but are motivated by the recovery process.
|
|
||||||
|
|
||||||
[TODO...need to explain this...]
|
|
||||||
|
|
||||||
\subsection{Recovery}
|
\subsection{Recovery}
|
||||||
|
|
||||||
|
|
||||||
\subsubsection{ANALYSIS / REDO / UNDO}
|
\subsubsection{ANALYSIS / REDO / UNDO}
|
||||||
|
|
||||||
Recovery in ARIES consists of three stages: analysis, redo and undo
|
Recovery in ARIES consists of three stages: analysis, redo and undo.
|
||||||
. The first, analysis, is
|
The first, analysis, is
|
||||||
implemented by LLADD, but will not be discussed in this
|
implemented by LLADD, but will not be discussed in this
|
||||||
paper. The second, redo, ensures that each redo entry in the log
|
paper. The second, redo, ensures that each redo entry in the log
|
||||||
will have been applied to each page in the page file exactly once.
|
will have been applied to each page in the page file exactly once.
|
||||||
The third phase, undo, rolls back any transactions that were active
|
The third phase, undo, rolls back any transactions that were active
|
||||||
when the crash occurred, as though the application manually aborted
|
when the crash occurred, as though the application manually aborted
|
||||||
them with the {}``abort()'' call.
|
them with the {}``abort'' function call.
|
||||||
|
|
||||||
After the analysis phase, the on-disk version of the page file
|
After the analysis phase, the on-disk version of the page file
|
||||||
is in the same state it was in when LLADD crashed. This means that
|
is in the same state it was in when LLADD crashed. This means that
|
||||||
|
@ -496,7 +488,7 @@ page are in an inconsistent state. Therefore, as the redo phase re-applies
|
||||||
Therefore, the redo information for each operation in the log
|
Therefore, the redo information for each operation in the log
|
||||||
must contain the physical address (page number) of the information
|
must contain the physical address (page number) of the information
|
||||||
that it modifies, and the portion of the operation executed by a single
|
that it modifies, and the portion of the operation executed by a single
|
||||||
log entry must only rely upon the contents of the page that the log
|
redo log entry must only rely upon the contents of the page that the log
|
||||||
entry refers to. Since we assume that pages are propagated to disk
|
entry refers to. Since we assume that pages are propagated to disk
|
||||||
atomically, the REDO phase may rely upon information contained within
|
atomically, the REDO phase may rely upon information contained within
|
||||||
a single page.
|
a single page.
|
||||||
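The per-page nature of redo can be illustrated with a toy model. The sketch below is not LLADD code: the names \texttt{Page}, \texttt{RedoEntry} and \texttt{redo\_pass} are hypothetical, and it assumes the standard ARIES convention (not spelled out in this excerpt) that each page records the log sequence number (LSN) of the last entry applied to it. Each entry names a single page and reads only that page, so comparing LSNs is enough to apply every entry exactly once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    lsn: int = 0    # LSN of the last log entry applied to this page
    value: int = 0  # toy stand-in for the page contents


@dataclass
class RedoEntry:
    lsn: int                     # position of the entry in the log
    page_no: int                 # physical address of the page it modifies
    apply: Callable[[int], int]  # redo function; sees only that page's contents


def redo_pass(log: List[RedoEntry], pages: Dict[int, Page]) -> int:
    """Replay the log in LSN order and return how many entries were applied."""
    applied = 0
    for entry in sorted(log, key=lambda e: e.lsn):
        page = pages.setdefault(entry.page_no, Page())
        if page.lsn >= entry.lsn:
            continue  # the page already reflects this entry; skip it
        page.value = entry.apply(page.value)
        page.lsn = entry.lsn
        applied += 1
    return applied
```

Running \texttt{redo\_pass} a second time over the same pages applies nothing, which is the ``exactly once'' behavior described above.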
|
@ -506,7 +498,7 @@ complete entries for all committed transactions. Therefore, we know that the pa
|
||||||
a physically consistent state, although it contains portions of the
|
a physically consistent state, although it contains portions of the
|
||||||
results of uncommitted transactions. The final stage of recovery is
|
results of uncommitted transactions. The final stage of recovery is
|
||||||
the undo phase, which simply aborts all uncommitted transactions. Since
|
the undo phase, which simply aborts all uncommitted transactions. Since
|
||||||
the page file is physically consistent, the transactions are aborted
|
the page file is physically consistent, the transactions may be aborted
|
||||||
exactly as they would be during normal operation.
|
exactly as they would be during normal operation.
|
||||||
|
|
||||||
|
|
||||||
|
@ -556,7 +548,7 @@ and highly-concurrent data structure using LLADD:
|
||||||
the page that the redo function sees, then the wrapper should latch
|
the page that the redo function sees, then the wrapper should latch
|
||||||
the relevant data.
|
the relevant data.
|
||||||
\item Redo operations should address pages by their physical offset,
|
\item Redo operations should address pages by their physical offset,
|
||||||
while Undo operations should use a more permenant address (such as
|
while Undo operations should use a more permanent address (such as
|
||||||
index key) if the data may move between pages over time.
|
index key) if the data may move between pages over time.
|
||||||
\item An undo operation must correctly update a data structure if any
|
\item An undo operation must correctly update a data structure if any
|
||||||
prefix of its corresponding redo operations is applied to the
|
prefix of its corresponding redo operations is applied to the
|
||||||
|
@ -567,12 +559,13 @@ Because undo and redo operations during normal operation and recovery
|
||||||
are similar, most bugs will be found with conventional testing
|
are similar, most bugs will be found with conventional testing
|
||||||
strategies. It is difficult to verify the final property, although a
|
strategies. It is difficult to verify the final property, although a
|
||||||
number of tools could be written to simulate various crash scenarios,
|
number of tools could be written to simulate various crash scenarios,
|
||||||
and check the behavior of operations under these scenarios.
|
and check the behavior of operations under these scenarios. Of course,
|
||||||
|
such a tool could easily be applied to existing LLADD operations.
|
||||||
|
|
||||||
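As an illustration of what such a tool might check, the following sketch replays every prefix of a sequence of redo-only steps against a copy of the initial state and reports the prefixes after which a consistency predicate fails. It is a toy model rather than part of LLADD: \texttt{check\_all\_prefixes} and the example steps, which move a key between two hash buckets, are hypothetical names chosen for this example.

```python
import copy
from typing import Callable, Dict, List, Set

State = Dict[str, Set[str]]


def check_all_prefixes(initial: State,
                       steps: List[Callable[[State], None]],
                       invariant: Callable[[State], bool]) -> List[int]:
    """Return the prefix lengths after which `invariant` does not hold."""
    failures = []
    for n in range(len(steps) + 1):
        state = copy.deepcopy(initial)  # each simulated crash starts from a fresh copy
        for step in steps[:n]:
            step(state)
        if not invariant(state):
            failures.append(n)
    return failures


# Example: moving key "k" from an old bucket to a new one.
def copy_key(state: State) -> None:
    state["new"].add("k")


def drop_key(state: State) -> None:
    state["old"].discard("k")


def key_reachable(state: State) -> bool:
    return "k" in state["old"] or "k" in state["new"]
```

Copying the key before dropping it keeps the key reachable after every prefix, while the opposite order fails after the first step. This is the kind of ordering mistake that a crash in the middle of an operation would expose.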
Note that the ARIES algorithm is extremely complex, and we have left
|
Note that the ARIES algorithm is extremely complex, and we have left
|
||||||
out most of the details needed to understand how ARIES works, or to
|
out most of the details needed to understand how ARIES works, or to
|
||||||
implement it correctly.\footnote{The original ARIES paper was around 70 pages, and the ARIES/IM paper, which covered index implementation, is roughly the same length.} Yet, we believe we have covered everything that a programmer needs
|
implement it correctly.\footnote{The original ARIES paper was around 70 pages, and the ARIES/IM paper, which covered index implementation, is roughly the same length.} Yet, we believe we have covered everything that a programmer needs
|
||||||
to know in order to implement new data structures using the basic
|
to know in order to implement new data structures using the
|
||||||
functionality that ARIES provides. This was possible due to the encapsulation
|
functionality that ARIES provides. This was possible due to the encapsulation
|
||||||
of the ARIES algorithm inside of LLADD, which is the feature that
|
of the ARIES algorithm inside of LLADD, which is the feature that
|
||||||
most strongly differentiates LLADD from other, similar libraries.
|
most strongly differentiates LLADD from other, similar libraries.
|
||||||
|
@ -596,7 +589,8 @@ it easy to improve and customize LLADD.}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
LLADD is a toolkit for building transaction managers.
|
LLADD is a toolkit for building transaction managers.
|
||||||
It provides user-defined redo and undo behavior, and has an extendible
|
It provides user-defined redo and undo behavior, and has an extendible
|
||||||
logging system with ... types of log entries so far. Most of these
|
logging system with 19 types of log entries so far (not counting those
|
||||||
|
internal to LLADD, such as ``begin'', ``abort'', and ``clr''). Most of these
|
||||||
extensions deal with data layout or modification, but some deal with
|
extensions deal with data layout or modification, but some deal with
|
||||||
other aspects of LLADD, such as extensions to recovery semantics (Section
|
other aspects of LLADD, such as extensions to recovery semantics (Section
|
||||||
\ref{sub:Two-Phase-Commit}). LLADD comes with some default page layout
|
\ref{sub:Two-Phase-Commit}). LLADD comes with some default page layout
|
||||||
|
@ -609,25 +603,29 @@ Although it ships with basic operations that support variable length
|
||||||
records, hash tables and other common data types, our goal is to
|
records, hash tables and other common data types, our goal is to
|
||||||
decouple all decisions regarding data format from the implementation
|
decouple all decisions regarding data format from the implementation
|
||||||
of the logging and recovery systems. Therefore, the preceding section
|
of the logging and recovery systems. Therefore, the preceding section
|
||||||
is essentially documentation for potential users of the library, while
|
is essentially documentation for users of the library, while
|
||||||
the purpose of the performance numbers in our evaluation section is
|
the purpose of the performance numbers in our evaluation section is
|
||||||
not to validate our hash table, but to show that the underlying architecture
|
not to validate our hash table, but to show that the underlying architecture
|
||||||
is able to efficiently support interesting data structures.
|
is able to efficiently support interesting data structures.
|
||||||
|
|
||||||
Despite the complexity of the interactions among its modules, the
|
Despite the complexity of the interactions between its modules, the
|
||||||
basic ARIES algorithm itself is quite simple. Therefore, in order to keep
|
basic ARIES algorithm itself is quite simple. Therefore, in order to
|
||||||
LLADD simple, we started with a set of modules, and iteratively refined
|
keep LLADD simple, we started with a set of modules, and iteratively
|
||||||
the boundaries among these modules. Figure \ref{cap:LLADD-Architecture} presents the resulting architecture. The core of the LLADD library
|
refined the boundaries between these modules. Figure
|
||||||
is quite small at ... lines of code, and has been documented extensively.
|
\ref{cap:LLADD-Architecture} presents the resulting architecture. The
|
||||||
We hope that we have exposed most of the subtle interactions between
|
core of the LLADD library is quite small at 2218 lines of code, 2155
|
||||||
internal modules in the online documentation. {[}... doxygen ...{]}
|
lines of implementations of operations and other extensions, and 408
|
||||||
|
lines of installable header files.\footnote{Generated using David
|
||||||
|
A. Wheeler's ``SLOCCount''.} The code has been documented extensively,
|
||||||
|
and we hope that we have exposed most of the subtle interactions
|
||||||
|
between internal modules in the online documentation.
|
||||||
|
|
||||||
As LLADD has evolved, many of its sub-systems have been incrementally
|
As LLADD has evolved, many of its sub-systems have been incrementally
|
||||||
improved, and we believe that the current set of modules is amenable
|
improved, and we believe that the current set of modules is amenable
|
||||||
to the addition of new functionality. For instance, the logging module
|
to the addition of new functionality. For instance, the logging module
|
||||||
interface encapsulates all of the details regarding its on-disk format,
|
interface encapsulates all of the details regarding its on-disk format,
|
||||||
which would make it straightforward to implement more exotic logging
|
which would make it straightforward to implement more exotic logging
|
||||||
techniques such as using log shipping to maintain a 'warm replica'
|
techniques such as using log shipping to maintain a ``warm replica''
|
||||||
for failover purposes, or the use of log replication to avoid physical
|
for failover purposes, or the use of log replication to avoid physical
|
||||||
disk access at commit time. Similarly, the interface encodes the dependencies
|
disk access at commit time. Similarly, the interface encodes the dependencies
|
||||||
between the logger and other subsystems, so, for instance, the requirements
|
between the logger and other subsystems, so, for instance, the requirements
|
||||||
|
@ -647,9 +645,10 @@ multiple files on disk, transactional groups of program executions
|
||||||
or network requests, or even leveraging some of the advances being
|
or network requests, or even leveraging some of the advances being
|
||||||
made in the Linux and other modern operating system kernels. For example,
|
made in the Linux and other modern operating system kernels. For example,
|
||||||
ReiserFS recently added support for atomic file system operations.
|
ReiserFS recently added support for atomic file system operations.
|
||||||
This could be used to provide atomic variable sized pages
|
This could be used to provide variable sized pages
|
||||||
to LLADD. Combining some of these ideas should make it easy to
|
to LLADD. Combining these ideas should make it easy to
|
||||||
implement some interesting applications.
|
implement some interesting applications, and to improve existing
|
||||||
|
systems such as CVS, IMAP, and a host of ``simple'' desktop applications.
|
||||||
|
|
||||||
From the testing point of view, the advantage of LLADD's division
|
From the testing point of view, the advantage of LLADD's division
|
||||||
into subsystems with simple interfaces is obvious. We are able to
|
into subsystems with simple interfaces is obvious. We are able to
|
||||||
|
@ -659,8 +658,9 @@ making it easy to add new tests and debug old ones. Furthermore, by
|
||||||
adding a ``simulate crash'' operation to a few of the key components,
|
adding a ``simulate crash'' operation to a few of the key components,
|
||||||
we can simulate application level crashes by clearing LLADD's internal
|
we can simulate application level crashes by clearing LLADD's internal
|
||||||
state, re-initializing the library and verifying that recovery was
|
state, re-initializing the library and verifying that recovery was
|
||||||
successful. These tests currently cover approximately 90\% of the
|
successful. These tests currently cover approximately
|
||||||
code. We have not yet developed a mechanism that will allow us to
|
90\%\footnote{Generated using ``gcov'', which is part of gcc, and ``lcov'', which interprets gcov's output.}
|
||||||
|
of the code. We have not yet developed a mechanism that will allow us to
|
||||||
accurately model hardware failures, which is an area where further
|
accurately model hardware failures, which is an area where further
|
||||||
work is needed. However, the basis for this work will be the development
|
work is needed. However, the basis for this work will be the development
|
||||||
of test harnesses that verify operation behavior in exceptional circumstances.
|
of test harnesses that verify operation behavior in exceptional circumstances.
|
||||||
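The shape of such a crash test can be sketched with a toy model. The class below is not LLADD's API: \texttt{ToyStore} and its methods are hypothetical, the log and commit set stand in for durable storage, and the model assumes that two active transactions never write the same key. Simulating a crash discards the volatile state, redoes every logged update, and then undoes the updates of transactions that never committed.

```python
from typing import Dict, List, Optional, Set, Tuple

# (transaction id, key, value before the update, value after the update)
Update = Tuple[int, str, Optional[int], int]


class ToyStore:
    def __init__(self) -> None:
        self.log: List[Update] = []     # stands in for the durable log
        self.committed: Set[int] = set()  # stands in for durable commit records
        self.data: Dict[str, int] = {}  # volatile state, lost in a crash

    def set(self, txid: int, key: str, value: int) -> None:
        # Log the update (with its before-image) before applying it.
        self.log.append((txid, key, self.data.get(key), value))
        self.data[key] = value

    def commit(self, txid: int) -> None:
        self.committed.add(txid)

    def simulate_crash(self) -> None:
        self.data = {}  # clear the internal (volatile) state
        # Redo: repeat history for every logged update.
        for _, key, _, new in self.log:
            self.data[key] = new
        # Undo: roll back uncommitted transactions, newest update first.
        for txid, key, old, _ in reversed(self.log):
            if txid not in self.committed:
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
        # Toy substitute for logging the rollback: forget the undone updates
        # so that a later recovery does not undo them a second time.
        self.log = [u for u in self.log if u[0] in self.committed]
```

A test then performs some committed and some uncommitted updates, calls \texttt{simulate\_crash()}, and checks that only the committed updates remain.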
|
|