rearranged section 3

2005-03-24 18:20:53 +00:00 · 2005-03-24 18:20:53 +00:00 · 95314d7641
commit 95314d7641
parent 669a4f181a
1 changed files with 212 additions and 207 deletions
--- a/doc/paper2/LLADD.tex
+++ b/doc/paper2/LLADD.tex
@ -11,7 +11,7 @@
 \usepackage{graphicx}
 \usepackage{xspace}
-\usepackage{geometry}
+\usepackage{geometry,color}
 \geometry{verbose,letterpaper,tmargin=1in,bmargin=1in,lmargin=0.75in,rmargin=0.75in}
 \makeatletter
@ -19,7 +19,8 @@
 \usepackage{babel}
 \newcommand{\yad}{Lemon\xspace}
-\newcommand{\eab}[1]{{\bf EAB: #1}}
+\newcommand{\eab}[1]{\textcolor{red}{\bf EAB: #1}}
 \newcommand{\rcs}[1]{\textcolor{green}{\bf RCS: #1}}
 \begin{document}
@ -58,7 +59,8 @@ workloads.  Finally, we discuss characteristics of this new
 architecture which provide opportunities for novel classes of
 optimizations and enhanced usability for application developers.}
-% todo/rcs Need to talk about collection api stuff / generalization of ARIES / new approach to application development
+\rcs{Need to talk about collection api stuff / generalization of ARIES
 / new approach to application development}
 %Although many systems provide transactionally consistent data
 %management, existing implementations are generally monolithic and tied
@ -188,7 +190,7 @@ These features are enabled by the several mechanisms:
      prepare call, and savepoints.
 \item[Extensible locking API] provides registration of custom lock managers
      and a generic lock manager implementation.
-\item[2PC?]
+\item[\eab{2PC?}]
 \end{description}
 We have produced a high-concurrency, high performance and reusable
@ -339,7 +341,7 @@ efforts.  Therefore, while we believe that many of the high level
 Postgres interfaces could be built using \yad, we have not yet tried 
 to implement them.
-{\em In the above paragrap, is imperative too strong a word?}
+\rcs{In the above paragraph, is imperative too strong a word?}
 % seems to provide
 %equivalents to most of the calls proposed in~\cite{newTypes} except
@ -392,7 +394,7 @@ systems, where the file system understands the contents of the files
 that it contains, and is able to provide services such as rapid
 search, or file-type specific operations such as thumb-nailing,
 automatic content updates, and so on \cite{Reiser4,WinFS,BeOS,SemanticFSWork,SemanticWeb}.  Others are simpler, such as
-Berkeley~DB~\cite{berkeleyDB, bdb}, which provides transactional
+Berkeley~DB~\cite{bdb, berkeleyDB}, which provides transactional
 % bdb's recno interface seems to be a specialized b-tree implementation - Rusty
 storage of data in indexed form using a hashtable or tree, or as a queue.  
@ -440,13 +442,14 @@ atomicity semantics may be relaxed under certain circumstances.  \yad is unique
 %the recovery log.  \yad's host independent logical log format will
 %allow applications to implement such optimizations.
-{\em compare and contrast with boxwood!!}
+\rcs{compare and contrast with boxwood!!}
-We believe, but cannot prove, that \yad can support all of these
+We believe that \yad can support all of these
-applications. We will demonstrate several of them, but leave implementation of a real
+applications. We will demonstrate several of them, but leave
-DBMS, LRVM and Boxwood to future work.  However, in each case it is
+implementation of a real DBMS, LRVM and Boxwood to future work.
-relatively easy to see how they would map onto \yad.
+However, in each case it is relatively easy to see how they would map
 onto \yad.
 %  \item {\bf Implementations of ARIES and other transactional storage
@ -480,22 +483,9 @@ discussions of write-ahead logging protocols and ARIES are available
 elsewhere~\cite{haerder, aries}, we focus on those details that are
 most important for flexibility.
 %Instead of providing a comprehensive discussion of ARIES, we will
 %focus upon those features of the algorithm that are most relevant
 %to a developer attempting to add a new set of operations. Correctly
 %implementing such extensions is complicated by concerns regarding
 %concurrency, recovery, and the possibility that any operation may
 %be rolled back at runtime.
 %
 %We first sketch the constraints placed upon operation implementations,
 %and then describe the properties of our implementation that
 %make these constraints necessary. Because comprehensive discussions of
 %write ahead logging protocols and ARIES are available elsewhere,~\cite{haerder, aries} we
 %only discuss those details relevant to the implementation of new
 %operations in \yad.
-
+\subsection{Operations}
-\subsection{Operations\label{sub:OperationProperties}}
+\label{sub:OperationProperties}
 A transaction consists of an arbitrary combination of actions that
 will be protected according to the ACID properties mentioned above.
@ -505,10 +495,14 @@ will be protected according to the ACID properties mentioned above.
 Typically, the
 information necessary to redo and undo each action is stored in the
 log.  We refine this concept and explicitly discuss {\em operations},
-which must be atomically applicable to the page file.  For now, we
+which must be atomically applicable to the page file.  
-simply assume that operations do not span pages, and that pages are
+
-atomically written to disk.  In Section~\ref{nested-top-actions}, we 
+\yad is essentially a framework for transactional pages: each page is
-explain how operations can be nested, allowing them to span pages.
+independent and can be recovered independently. For now, we simply
 assume that operations do not span pages.  Since single pages are
 written to disk atomically, we have a simple atomic primitive on which
 to build. In Section~\ref{nested-top-actions}, we explain how to
 handle operations that span pages.
 One unique aspect of \yad, which is not true for ARIES, is that {\em
 normal} operations are defined in terms of redo and undo
@ -520,94 +514,18 @@ and update() operations described in Section~\ref{OASYS}.}  This has
 the nice property that the REDO code is known to work, since the
 original operation was the exact same ``redo''.  In general, the \yad
 philosophy is that you define operations in terms of their REDO/UNDO
-behavior, and then build a user friendly {\em wrapper} interface around them.  The
+behavior, and then build a user friendly {\em wrapper} interface
-value of \yad is that it provides a skeleton that invokes the
+around them.  The value of \yad is that it provides a skeleton that
-redo/undo functions at the {\em right} time, despite concurrency, crashes,
+invokes the redo/undo functions at the {\em right} time, despite
-media failures, and aborted transactions.  Also unlike ARIES, \yad refines
+concurrency, crashes, media failures, and aborted transactions.  Also
-the concept of the wrapper interface, making it possible to 
+unlike ARIES, \yad refines the concept of the wrapper interface,
-reschedule operations according to an application-level (or built-in) 
+making it possible to reschedule operations according to an
-policy. (Section~\ref{TransClos})
+application-level policy (Section~\ref{TransClos}).
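The redo/undo-plus-wrapper structure described above can be sketched as follows. This is a minimal illustration, not \yad's actual API: every name here (\texttt{Store}, \texttt{Operation}, \texttt{set\_byte}, \texttt{write\_byte}) is hypothetical, the log lives in memory, and only a single transaction is modeled. The point is that the wrapper never touches the page directly; it logs the arguments and the framework applies the change by calling the registered redo function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Page = bytearray  # stand-in for one page of the page file


@dataclass
class Operation:
    """A registered operation: a pair of page-level functions (hypothetical)."""
    redo: Callable[[Page, tuple], None]
    undo: Callable[[Page, tuple], None]


@dataclass
class Store:
    pages: Dict[int, Page] = field(default_factory=dict)
    ops: Dict[str, Operation] = field(default_factory=dict)
    log: List[Tuple[str, int, tuple, tuple]] = field(default_factory=list)

    def update(self, name, page_no, redo_args, undo_args):
        # Write the log entry first, then apply the change by running REDO,
        # so the normal code path and the recovery code path are identical.
        self.log.append((name, page_no, redo_args, undo_args))
        self.ops[name].redo(self.pages[page_no], redo_args)

    def abort(self):
        # Roll back by running UNDO for each logged entry, newest first.
        while self.log:
            name, page_no, _, undo_args = self.log.pop()
            self.ops[name].undo(self.pages[page_no], undo_args)


def write_byte(page, args):
    """Page-level function used as both redo and undo of set_byte."""
    offset, value = args
    page[offset] = value


def set_byte(store, page_no, offset, value):
    """User-friendly wrapper: gathers the undo information, then logs."""
    old = store.pages[page_no][offset]
    store.update("set_byte", page_no, (offset, value), (offset, old))
```

In this sketch the application calls only \texttt{set\_byte}; deciding when the redo and undo functions actually run is left to \texttt{Store}, which plays the role of the skeleton.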
 \subsection{Isolation\label{Isolation}}
 We allow transactions to be interleaved, allowing concurrent access to
 application data and exploiting opportunities for hardware
 parallelism.  Therefore, each action must assume that the
 physical data upon which it relies may contain uncommitted
 information and that this information may have been produced by a
 transaction that will be aborted by a crash or by the application.
 (The latter is actually harder, since there is no ``fate sharing''.)
 % Furthermore, aborting
 %and committing transactions may be interleaved, and \yad does not
 %allow cascading aborts,%
 %\footnote{That is, by aborting, one transaction may not cause other transactions
 %to abort. To understand why operation implementors must worry about
 %this, imagine that transaction A split a node in a tree, transaction
 %B added some data to the node that A just created, and then A aborted.
 %When A was undone, what would become of the data that B inserted?%
 %} so 
 Therefore, in order to implement an operation we must also implement
 synchronization mechanisms that isolate the effects of transactions
 from each other.  We use the term {\em latching} to refer to
 synchronization mechanisms that protect the physical consistency of
 \yad's internal data structures and the data store.  We say {\em
 locking} when we refer to mechanisms that provide some level of
 isolation among transactions.  
 \yad operations that allow concurrent requests must provide a
 latching implementation that is guaranteed not to deadlock.  These
 implementations need not ensure consistency of application data.
 Instead, they must maintain the consistency of any underlying data
 structures.  Generally, latches do not persist across calls performed 
 by high-level code.
 For locking, due to the variety of locking protocols available, and
 their interaction with application
 workloads~\cite{multipleGenericLocking}, we leave it to the
 application to decide what sort of transaction isolation is
 appropriate.  \yad provides a default page-level lock manager that
 performs deadlock detection, although we expect many applications to
 make use of deadlock avoidance schemes, which are already prevalent in
 multithreaded application development.  The Lock Manager is designed 
 to be generic enough to also provide index locks for hashtable 
 implementations.  We leave the implementation of hierarchical locking 
 to future work.
 For example, it would be relatively easy to build a strict two-phase
 locking hierarchical lock
 manager~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample} on
 top of \yad.  Such a lock manager would provide isolation guarantees
 for all applications that make use of it.  However, applications that
 make use of such a lock manager must check for (and recover from)
 deadlocked transactions that have been aborted by the lock manager,
 complicating application code, and possibly violating application semantics.
 Conversely, many applications do not require such a general scheme.
 For instance, an IMAP server can employ a simple lock-per-folder
 approach and use lock-ordering techniques to avoid deadlock.  This
 avoids the complexity of dealing with transactions that abort due
 to deadlock, and also removes the runtime cost of restarting 
 transactions.
 \yad provides a lock manager API that allows all three variations
 (among others). In particular, it provides upcalls on commit/abort so
 that the lock manager can release locks at the right time. We will
 revisit this point in more detail when we describe the sample
 operations that we have implemented.
 %Currently, \yad provides an optional page-level lock manager.  We are
 %unaware of any limitations in our architecture that would prevent us
 %from implementing full hierarchical locking and index locking in the
 %future. 
 %Thus, data dependencies among
 %transactions are allowed, but we still must ensure the physical
 %consistency of our data structures, such as operations on pages or locks.
 \subsection{The Log Manager}
 \label{log-manager}
 All actions performed by a committed transaction must be
 restored in the case of a crash, and all actions performed by aborting
@ -645,18 +563,6 @@ to a single page (``page-oriented redo''), and thus must be redone in
 order.  Therefore, they are produced after any rescheduling or computation
 specific to the current state of the page file is performed.
 %% One unique aspect of \yad, which is not true for ARIES, is that {\em
 %% normal} operations use the REDO function; i.e. there is no way to
 %% modify the page except via the REDO operation.\footnote{Actually,
 %% operation implementations may circumvent this restriction, but doing
 %% so complicates recovery semantics, and only should be done as a last
 %% resort.  Currently, this is only done to implement the OASYS flush()
 %% and update() operations described in Section~\ref{OASYS}.}  This has
 %% the nice property that the REDO code is known to work, since even the
 %% original update is a ``redo''.  In general, the \yad philosophy is
 %% that you define operations in terms of their REDO/UNDO behavior, and
 %% then build a user friendly interface around those.
 Eventually, the page makes it to disk, but the REDO entry is still
 useful: we can use it to roll forward a single page from an archived
 copy.  Thus one of the nice properties of \yad, which has been tested,
@ -666,7 +572,182 @@ Because pages can be recovered independently from each other, there is
 no need to stop transactions to make a snapshot for archiving: any
 fuzzy snapshot is fine.
 \subsection{Flexible Logging}
 \label{flex-logging}
 The above discussion avoided the use of some common terminology 
 that should be presented here. {\em Physical logging } 
 is the practice of logging physical (byte-level) updates
 and the physical (page-number) addresses to which they are applied.
 {\em Physiological logging } is what \yad recommends for its redo
 records~\cite{physiological}. The physical address (page number) is
 stored, but the byte offset and the actual delta are stored implicitly
 in the parameters of the redo or undo function. These parameters allow
 the function to update the page in a way that preserves application
 semantics.  One common use for this is {\em slotted pages}, which use
 an on-page level of indirection to allow records to be rearranged
 within the page; instead of using the page offset, redo operations use
 the index to locate the data within the page. This allows data within a single
 page to be re-arranged at runtime to produce contiguous regions of
 free space. \yad generalizes this model; for example, the parameters
 passed to the function may utilize application-specific properties in
 order to be significantly smaller than the physical change made to the
 page.
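The slotted-page case above can be sketched as follows. This is a toy model under stated assumptions: \texttt{SlottedPage} and \texttt{redo\_set\_record} are hypothetical names, not part of \yad, and the slot table is kept in a dictionary rather than on the page itself. It shows why a redo entry that names a (page, slot) pair remains valid after the page has been compacted, whereas one that names a byte offset would not.

```python
class SlottedPage:
    """Toy slotted page: records are addressed by slot, not by byte offset."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.slots = {}   # slot number -> (offset, length)
        self.free = 0     # first unused byte

    def insert(self, slot, payload):
        if self.free + len(payload) > len(self.data):
            raise ValueError("page full")
        self.data[self.free:self.free + len(payload)] = payload
        self.slots[slot] = (self.free, len(payload))
        self.free += len(payload)

    def delete(self, slot):
        del self.slots[slot]  # leaves a hole until the next compaction

    def compact(self):
        """Slide live records together so free space becomes contiguous."""
        new, pos = bytearray(len(self.data)), 0
        for slot, (off, ln) in sorted(self.slots.items(), key=lambda kv: kv[1][0]):
            new[pos:pos + ln] = self.data[off:off + ln]
            self.slots[slot] = (pos, ln)
            pos += ln
        self.data, self.free = new, pos

    def read(self, slot):
        off, ln = self.slots[slot]
        return bytes(self.data[off:off + ln])


def redo_set_record(page, slot, payload):
    """Physiological redo: the log names the slot; the page resolves the offset."""
    off, ln = page.slots[slot]
    if len(payload) != ln:
        raise ValueError("record length mismatch")
    page.data[off:off + ln] = payload
```

For example, if slot 0 is deleted and the page is compacted, the record in slot 1 moves to offset 0, yet a logged \texttt{redo\_set\_record(page, 1, ...)} still updates the correct bytes.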
 {\em Logical logging} uses a higher-level key to specify the
 UNDO/REDO.  Since these higher-level keys may affect multiple pages,
 they are prohibited for REDO functions, since our REDO is specific to
 a single page.  However, logical logging does make sense for UNDO,
 since we can assume that the pages are physically consistent when we
 apply an UNDO.  We thus use logical logging to undo operations that
 span multiple pages, as shown below.
 %% can only be used for undo entries in \yad, and
 %% stores a logical address (the key of a hash table, for instance)
 %% instead of a physical address. As we will see later, these operations
 %% may affect multiple pages.  This allows the location of data in the
 %% page file to change, even if outstanding transactions may have to roll
 %% back changes made to that data. Clearly, for \yad to be able to apply
 %% logical log entries, the page file must be physically consistent,
 %% ruling out use of logical logging for redo operations.
 \yad supports all three types of logging, and allows developers to
 register new operations, which is the key to its extensibility. After
 discussing \yad's architecture, we will revisit this topic with a number of
 concrete examples.
 \subsection{Isolation}
 \label{Isolation}
 We allow transactions to be interleaved, allowing concurrent access to
 application data and exploiting opportunities for hardware
 parallelism.  Therefore, each action must assume that the
 physical data upon which it relies may contain uncommitted
 information and that this information may have been produced by a
 transaction that will be aborted by a crash or by the application.
 %(The latter is actually harder, since there is no ``fate sharing''.)
 % Furthermore, aborting
 %and committing transactions may be interleaved, and \yad does not
 %allow cascading aborts,%
 %\footnote{That is, by aborting, one transaction may not cause other transactions
 %to abort. To understand why operation implementors must worry about
 %this, imagine that transaction A split a node in a tree, transaction
 %B added some data to the node that A just created, and then A aborted.
 %When A was undone, what would become of the data that B inserted?%
 %} so 
 Therefore, in order to implement an operation we must also implement
 synchronization mechanisms that isolate the effects of transactions
 from each other.  We use the term {\em latching} to refer to
 synchronization mechanisms that protect the physical consistency of
 \yad's internal data structures and the data store.  We say {\em
 locking} when we refer to mechanisms that provide some level of
 isolation among transactions.  
 \yad operations that allow concurrent requests must provide a latching
 (but not locking) implementation that is guaranteed not to deadlock.
 These implementations need not ensure consistency of application data.
 Instead, they must maintain the consistency of any underlying data
 structures.  Generally, latches do not persist across calls performed
 by high-level code, as that could lead to deadlock.
 For locking, due to the variety of locking protocols available, and
 their interaction with application
 workloads~\cite{multipleGenericLocking}, we leave it to the
 application to decide what degree of isolation is appropriate.  \yad
 provides a default page-level lock manager that performs deadlock
 detection, although we expect many applications to make use of
 deadlock-avoidance schemes, which are already prevalent in
 multithreaded application development.  The Lock Manager is flexible
 enough to also provide index locks for hashtable implementations, and more complex locking protocols.
 For example, it would be relatively easy to build a strict two-phase
 locking hierarchical lock
 manager~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample} on
 top of \yad.  Such a lock manager would provide isolation guarantees
 for all applications that make use of it.  However, applications that
 make use of such a lock manager must handle deadlocked transactions
 that have been aborted by the lock manager.  This is easy if all of
 the state is managed by \yad, but other state such as thread stacks
 must be handled by the application, much like exception handling.
 Conversely, many applications do not require such a general scheme.
 For instance, an IMAP server can employ a simple lock-per-folder
 approach and use lock-ordering techniques to avoid deadlock.  This
 avoids the complexity of dealing with transactions that abort due
 to deadlock, and also removes the runtime cost of restarting 
 transactions.
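The lock-per-folder approach can be sketched as follows. This is only an illustration of lock ordering in application code, assuming a fixed, made-up set of folder names; \texttt{lock\_folders} and \texttt{move\_message} are hypothetical helpers and do not use \yad's lock manager API. Because every thread acquires folder locks in the same (sorted) order, no cycle of waiting threads can form, so no transaction is ever aborted due to deadlock.

```python
import threading
from contextlib import contextmanager

# One lock per folder; the folder names are placeholders for this sketch.
folder_locks = {name: threading.Lock() for name in ("Archive", "INBOX", "Trash")}


@contextmanager
def lock_folders(*names):
    """Acquire the named folder locks in a single global (sorted) order."""
    ordered = sorted(set(names))
    for name in ordered:
        folder_locks[name].acquire()
    try:
        yield ordered
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(ordered):
            folder_locks[name].release()


def move_message(folders, msg, src, dst):
    """Move a message between folders while holding both folder locks."""
    with lock_folders(src, dst):
        folders[src].remove(msg)
        folders[dst].append(msg)
```

Two threads that move messages in opposite directions between the same pair of folders both lock the alphabetically first folder before the second, so one simply waits for the other.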
 \yad provides a lock manager API that allows all three variations
 (among others). In particular, it provides upcalls on commit/abort so
 that the lock manager can release locks at the right time. We will
 revisit this point in more detail when we describe some of the example
 operations.
 \subsection{Nested Top Actions}
 \label{nested-top-actions}
 \eab{here is the new location for this section}
 explain that with a ``big lock'' it is easy to write a transactional data structure. (trivial example?)
 but we want more concurrency, which means 2 problems: 1) finer-grained locking and 2) weaker isolation, since interleaved transactions see the same structure
 cascading aborts problem
 solution: don't undo structural changes, just commit them even if the causing xact fails. then logical undo to fix the aborted xact.
 % @todo this section is confusing.  Re-write it in light of page spanning operations, and the fact that we assumed operations don't span pages above.  A nested top action (or recoverable, carefully ordered operation) is simply a way of causing a page spanning operation to be applied atomically.  (And must be used in conjunction with latches...)  Note that the combination of latching and NTAs makes the implementation of a page spanning operation no harder than normal multithreaded software development.
 \textcolor{red}{OLD TEXT:} Section~\ref{sub:OperationProperties} states that \yad does not allow
 cascading aborts, implying that operation implementors must protect
 transactions from any structural changes made to data structures by
 uncommitted transactions, but \yad does not provide any mechanisms
 designed for long-term locking. However, one of \yad's goals is to
 make it easy to implement custom data structures for use within safe,
 multi-threaded transactions. Clearly, an additional mechanism is
 needed.
 The solution is to allow portions of an operation to ``commit'' before
 the operation returns.\footnote{We considered the use of nested top actions, which \yad could easily
 support. However, we currently use the slightly simpler (and lighter-weight)
 mechanism described here. If the need arises, we will add support
 for nested top actions.}
 An operation's wrapper is just a normal function, and therefore may
 generate multiple log entries. First, it writes an undo-only entry
 to the log. This entry will cause the \emph{logical} inverse of the
 current operation to be performed at recovery or abort, must be idempotent,
 and must fail gracefully if applied to a version of the database that
 does not contain the results of the current operation. Also, it must
 behave correctly even if an arbitrary number of intervening operations
 are performed on the data structure.
 Next, the operation writes one or more redo-only log entries that may
 perform structural modifications to the data structure. These redo
 entries have the constraint that any prefix of them must leave the
 database in a consistent state, since only a prefix might execute
 before a crash.  This is not as hard as it sounds, and in fact the
 $B^{LINK}$ tree~\cite{blink} is an example of a B-Tree implementation
 that behaves in this way, while the linear hash table implementation
 discussed in Section~\ref{sub:Linear-Hash-Table} is a scalable hash
 table that meets these constraints.
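The logging order described above (one undo-only logical entry, followed by redo-only entries that may restructure the data) can be sketched as follows. This is a simplified model, not \yad's implementation: \texttt{LoggedTable} and its methods are hypothetical, keys are assumed to be integers, the log is in memory, and crashes between individual steps are not modeled. It illustrates that an abort applies only the logical inverse, which is idempotent and tolerates intervening operations, while structural changes made on behalf of the aborted transaction are left in place.

```python
class LoggedTable:
    """Toy hash table whose buckets stand in for pages."""

    def __init__(self, capacity=2):
        self.buckets = [[]]
        self.capacity = capacity  # entries per bucket before the table grows
        self.log = []

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def _grow(self):
        # Structural change: double the bucket count and redistribute.
        items = [kv for bucket in self.buckets for kv in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in items:
            self._bucket(k).append((k, v))

    def _remove(self, key):
        # Logical inverse of insert: locate by key in the current structure.
        # It is a no-op if the key is absent, so it can safely run twice.
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]

    def insert(self, xid, key, value):
        # 1. Undo-only entry naming the logical inverse of this operation.
        self.log.append((xid, "undo-only", "remove", key))
        # 2. Redo-only entries for structural and physical updates.
        if len(self._bucket(key)) >= self.capacity:
            self.log.append((xid, "redo-only", "grow"))
            self._grow()
        self.log.append((xid, "redo-only", "put", key, value))
        self._bucket(key).append((key, value))

    def abort(self, xid):
        # Apply only the logical undo entries; never reverse a "grow".
        for entry in reversed(self.log):
            if entry[0] == xid and entry[1] == "undo-only":
                self._remove(entry[3])

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None
```

If transaction 2 triggers a grow and then aborts, its keys disappear but the table keeps its new size, so data inserted by other transactions into the restructured table is unaffected.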
 %[EAB: I still think there must be a way to log all of the redoes
 %before any of the actions take place, thus ensuring that you can redo
 %the whole thing if needed. Alternatively, we could pin a page until
 %the set completes, in which case we know that all of the records
 %are in the log before any page is stolen.]
 \subsection{Recovery}
 \label{recovery}
 %In this section, we present the details of crash recovery, user-defined logging, and atomic actions that commit even if their enclosing transaction aborts.
 %
@ -675,10 +756,11 @@ fuzzy snapshot is fine.
 We use the same basic recovery strategy as ARIES, which consists of
 three phases: {\em analysis}, {\em redo} and {\em undo}.  The first,
 analysis, is implemented by \yad, but will not be discussed in this
-paper. The second, redo, ensures that each redo entry is applied to its corresponding page exactly once.  The
+paper. The second, redo, ensures that each redo entry is applied to
-third phase, undo, rolls back any transactions that were active when
+its corresponding page exactly once.  The third phase, undo, rolls
-the crash occurred, as though the application manually aborted them
+back any transactions that were active when the crash occurred, as
-with the ``abort'' function call.
+though the application manually aborted them with the ``abort''
 function call.
 After the analysis phase, the on-disk version of the page file is in
 the same state it was in when \yad crashed. This means that some
@ -712,84 +794,7 @@ consistent, the transactions may be aborted exactly as they would be
 during normal operation.
 \subsection{Physical, Logical and Physiological Logging}
 The above discussion avoided the use of some common terminology 
 that should be presented here. {\em Physical logging } 
 is the practice of logging physical (byte-level) updates
 and the physical (page number) addresses to which they are applied.
 {\em Physiological logging } is what \yad recommends for its redo
 records~\cite{physiological}. The physical address (page number) is
 stored, but the byte offset and the actual delta are stored implicitly
 in the parameters of the redo or undo function. These parameters allow
 the function to update the page in a way that preserves application
 semantics.  One common use for this is {\em slotted pages}, which use
 an on-page level of indirection to allow records to be rearranged
 within the page; instead of using the page offset, redo operations use
 a logical offset to locate the data. This allows data within a single
 page to be re-arranged at runtime to produce contiguous regions of
 free space. \yad generalizes this model; for example, the parameters
 passed to the function may utilize application specific properties in
 order to be significantly smaller than the physical change made to the
 page.
 {\em Logical logging } can only be used for undo entries in \yad, and
 stores a logical address (the key of a hash table, for instance)
 instead of a physical address. As we will see later, these operations
 may affect multiple pages.  This allows the location of data in the
 page file to change, even if outstanding transactions may have to roll
 back changes made to that data. Clearly, for \yad to be able to apply
 logical log entries, the page file must be physically consistent,
 ruling out use of logical logging for redo operations.
 \yad supports all three types of logging, and allows developers to
 register new operations, which is the key to its extensibility. After
 discussing \yad's architecture, we will revisit this topic with a number of
 concrete examples.
 \subsection{Concurrency and Aborted Transactions}
 \label{nested-top-actions}
 \eab{Can't tell if you rewrote this section or not...  do we support nested top actions?  I thought we did. -- This section is horribly out of date (and confuses me when I try to read it!)  We do support nested top actions. Where does this belong w.r.t. the isolation section? Really, we should just explain how NTA's work so we don't have to explain why the hashtable is concurrent...-- Rusty}
 % @todo this section is confusing.  Re-write it in light of page spanning operations, and the fact that we assumed opeartions don't span pages above.  A nested top action (or recoverable, carefully ordered operation) is simply a way of causing a page spanning operation to be applied atomically.  (And must be used in conjunction with latches...)  Note that the combination of latching and NTAs makes the implementation of a page spanning operation no harder than normal multithreaded software development.
 Section~\ref{sub:OperationProperties} states that \yad does not
 allow cascading aborts, implying that operation implementors must
 protect transactions from any structural changes made to data structures
 by uncommitted transactions, but \yad does not provide any mechanisms
 designed for long-term locking. However, one of \yad's goals is to
 make it easy to implement custom data structures for use within safe,
 multi-threaded transactions. Clearly, an additional mechanism is needed.
 The solution is to allow portions of an operation to ``commit'' before
 the operation returns.\footnote{We considered the use of nested top actions, which \yad could easily
 support. However, we currently use the slightly simpler (and lighter-weight)
 mechanism described here. If the need arises, we will add support
 for nested top actions.}
 An operation's wrapper is just a normal function, and therefore may
 generate multiple log entries. First, it writes an undo-only entry
 to the log. This entry will cause the \emph{logical} inverse of the
 current operation to be performed at recovery or abort, must be idempotent,
 and must fail gracefully if applied to a version of the database that
 does not contain the results of the current operation. Also, it must
 behave correctly even if an arbitrary number of intervening operations
 are performed on the data structure.
 Next, the operation writes one or more redo-only log entries that may perform structural
 modifications to the data structure. These redo entries have the constraint that any prefix of them must leave the database in a consistent state, since only a prefix might execute before a crash.  This is not as hard as it sounds, and in fact the
 $B^{LINK}$ tree~\cite{blink} is an example of a B-Tree implementation
 that behaves in this way, while the linear hash table implementation
 discussed in Section~\ref{sub:Linear-Hash-Table} is a scalable 
 hash table that meets these constraints.
 %[EAB: I still think there must be a way to log all of the redoes
 %before any of the actions take place, thus ensuring that you can redo
 %the whole thing if needed. Alternatively, we could pin a page until
 %the set completes, in which case we know that that all of the records
 %are in the log before any page is stolen.]
 \section{Extendible transaction architecture}