sec4

2006-08-19 23:25:47 +00:00 · 2006-08-19 23:25:47 +00:00 · a161be420a
commit a161be420a
parent 2fcb841ffe
1 changed files with 109 additions and 89 deletions
--- a/doc/paper3/LLADD.tex
+++ b/doc/paper3/LLADD.tex
@@ -809,40 +809,39 @@ ranges of the page file to be updated by a single physical operation.
 described in this section.  However, \yad avoids hard-coding most of
 the relevant subsystems.  LSN-free pages are essentially an alternative
 protocol for atomically and durably applying updates to the page file.
-This will require the addition of a new page type that calls the logger to estimate LSNs; \yad currently has
+This will require the addition of a new page type that calls the
-three such types, not including a few minor variants. We plan
+logger to estimate LSNs; \yad currently has three such types, not
-to support the coexistance of LSN-free pages, traditional
+including some minor variants. We plan to support the coexistence of
-pages, and similar third-party modules within the same page file, log,
+LSN-free pages, traditional pages, and similar third-party modules
-transactions, and even logical operations.
+within the same page file, log, transactions, and even logical
 operations.
 \subsection{Blind Updates}
 \subsection{Blind writes}
 Recall that LSNs were introduced to prevent recovery from applying
 updates more than once, and to prevent recovery from applying old
 updates to newer versions of pages.  This was necessary because some
 operations that manipulate pages are not idempotent, or simply make
 use of state stored in the page.  
-For example, logical operations that are constrained to a single page
+As described above, \yad operations may make use of page contents to
-(physiological operations) are often used in conventional transaction
+compute the updated value, and \yad ensures that each operation is
-systems, but are often not idempotent, and rely upon the consistency
+applied exactly once in the right order. The recovery scheme described
-of the page they modify.  The recovery scheme described in this
+in this section does not guarantee that such operations will be
-section does not guarantee that such operations will be applied
+applied exactly once, or even that they will be presented with a
-exactly once, or even that they will be presented with a consistent
+consistent version of a page during recovery.
 version of a page.
-Therefore, in this section we eliminate such operations and instead
+Therefore, in this section we focus on operations that produce
-make use of deterministic REDO operations that do not examine page
+deterministic, idempotent redo entries that do not examine page state.
-state.  We call such operations ``blind writes.''  Note that we still
+We call such operations ``blind updates.''  Note that we still allow
-allow code that invokes operations to examine the page file.  For concreteness,
+code that invokes operations to examine the page file, just not during
-assume that all physical operations produce log entries that contain a
+recovery.  For concreteness, assume that these operations produce log
-set of byte ranges, and the pre- and post-value of each byte in the
+entries that contain a set of byte ranges, and the pre- and post-value
-range.  
+of each byte in the range.
-Recovery works the same way as it does above, except that is computes
+Recovery works the same way as before, except that it now computes
-a lower bound of each page LSN instead of reading the LSN from the
+a lower bound for the LSN of each page, rather than reading it from the page.
-page.  One possible lower bound is the LSN of the most recent log
+One possible lower bound is the LSN of the most recent checkpoint.  Alternatively, \yad could occasionally write (page number, LSN) pairs to the log after it writes out pages.\rcs{This would be a good place for a figure}
 truncation or checkpoint.  Alternatively, \yad could occasionally
 write information about the state of the buffer manager to the log. \rcs{This would be a good place for a figure}
 Although the mechanism used for recovery is similar, the invariants
 maintained during recovery have changed.  With conventional
@@ -850,19 +849,18 @@ transactions, if a page in the page file is internally consistent
 immediately after a crash, then the page will remain internally
 consistent throughout the recovery process.  This is not the case with
 our LSN-free scheme.  Internal page inconsistencies may be introduced
-because recovery has no way of knowing which version of a page it is
+because recovery has no way of knowing the exact version of a page.
-dealing with.  Therefore, it may overwrite new portions of a page with
+Therefore, it may overwrite new portions of a page with older data
-older data from the log.
+from the log.  Therefore, the page will contain a mixture of new and
-Therefore, the page will contain a mixture of new and old bytes, and
+old bytes, and any data structures stored on the page may be
-any data structures stored on the page may be inconsistent.  However,
+inconsistent.  However, once the redo phase is complete, any old bytes
-once the redo phase is complete, any old bytes will be overwritten by
+will be overwritten by their most recent values, so the page will
-their most recent values, so the page will contain an internally
+return to an internally consistent up-to-date state.
 consistent, up-to-date version of itself.
 (Section~\ref{sec:torn-page} explains this in more detail.)
-Once Redo completes, Undo can proceed normally, with one exception.
+Once redo completes, undo can proceed normally, with one exception.
 Like normal forward operation, the redo operations that it logs may
-only perform blind-writes.  Since logical undo operations are
+only perform blind updates.  Since logical undo operations are
 generally implemented by producing a series of redo log entries
 similar to those produced at runtime, we do not think this will be a
 practical problem.
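The blind-update scheme described in this hunk can be illustrated with a minimal Python sketch. It is not \yad code: the names `BlindUpdate`, `log_update`, and `redo` are hypothetical, and the log entry layout (one byte range with pre- and post-values) is a simplification of the "set of byte ranges" the text assumes. The sketch assumes that `lower_bound_lsn` is an LSN whose effects are already reflected in the on-disk page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlindUpdate:
    """Hypothetical log entry: one byte range with its pre- and post-image."""
    lsn: int
    offset: int
    pre: bytes   # bytes that were overwritten (enough for a physical undo)
    post: bytes  # bytes that redo writes, without reading the page


def log_update(page: bytearray, log: list, offset: int, post: bytes) -> None:
    """Forward operation: may examine the page, then logs and applies the write."""
    pre = bytes(page[offset:offset + len(post)])
    log.append(BlindUpdate(len(log) + 1, offset, pre, post))
    page[offset:offset + len(post)] = post


def redo(page: bytearray, log: list, lower_bound_lsn: int) -> None:
    """Replay, in LSN order, every entry newer than the estimated page LSN."""
    for entry in sorted(log, key=lambda e: e.lsn):
        if entry.lsn > lower_bound_lsn:
            page[entry.offset:entry.offset + len(entry.post)] = entry.post


# Build a small history and remember every version the disk might hold.
page, log = bytearray(8), []
versions = [bytes(page)]
for offset, post in [(0, b"AA"), (1, b"BB"), (4, b"CC")]:
    log_update(page, log, offset, post)
    versions.append(bytes(page))

# Whatever version survived the crash, redo from LSN 0 reaches the final state.
recovered = []
for version in versions:
    copy = bytearray(version)
    redo(copy, log, lower_bound_lsn=0)
    recovered.append(bytes(copy))
```

Because `redo` never reads the page, replaying an entry that is already reflected on disk is harmless, so a conservative lower bound such as the last checkpoint is enough.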
@@ -875,15 +873,12 @@ simplifies some aspects of recovery.
 \subsection{Zero-copy I/O} 
 We originally developed LSN-free pages as an efficient method for
-transactionally storing and updating large (multi-page) objects.  If a
+transactionally storing and updating multi-page objects, called {\em
-large object is stored in pages that contain LSNs, then in order to
+blobs}.  If a large object is stored in pages that contain LSNs, then it is not contiguous on disk, and must be gathered together using the CPU to do an expensive copy into a second buffer.
 read that large object the system must read each page individually,
 and then use the CPU to perform a byte-by-byte copy of the portions of
 the page that contain object data into a second buffer.
 Compare this approach to modern file systems, which allow applications to
 perform a DMA copy of the data into memory, avoiding the expensive
-byte-by-byte copy, and allowing the CPU to be used for
+ copy, and allowing the CPU to be used for
 more productive purposes.  Furthermore, modern operating systems allow
 network services to use DMA and network adaptor hardware to read data
 from disk, and send it over a network socket without passing it
@@ -891,32 +886,33 @@ through the CPU.  Again, this frees the CPU, allowing it to perform
 other tasks.
 We believe that LSN-free pages will allow reads to make use of such
-optimizations in a straightforward fashion.  Zero copy writes are more challenging, but could be
+optimizations in a straightforward fashion.  Zero-copy writes are
-performed by performing a DMA write to a portion of the log file.
+more challenging, but could be done by performing a DMA write to
-However, doing this complicates log truncation, and does not address
+a portion of the log file. However, doing this complicates log
-the problem of updating the page file.  We suspect that contributions
+truncation, and does not address the problem of updating the page
-from the log based file system~\cite{lfs} literature can address these problems.
+file.  We suspect that contributions from log-based file
-In particular, we imagine storing 
+systems~\cite{lfs} can address these problems. In
-portions of the log (the portion that stores the blob) in the 
+particular, we imagine storing portions of the log (the portion that
-page file, or other addressable storage.  In the worst case, 
+stores the blob) in the page file, or other addressable storage.  In
-the blob would have to be relocated in order to defragment the 
+the worst case, the blob would have to be relocated in order to
-storage.  Assuming the blob was relocated once, this would amount 
+defragment the storage.  Assuming the blob was relocated once, this
-to a total of three, mostly sequential disk operations.  (Two 
+would amount to a total of three, mostly sequential disk operations.
-writes and one read.)  However, in the best case, the blob would only be written once.
+(Two writes and one read.)  However, in the best case, the blob would
-In contrast, conventional blob implementations generally write the blob twice. 
+only be written once.  In contrast, conventional blob implementations
 generally write the blob twice.
 Of course, \yad could also support other approaches to blob storage,
 such as using DMA and update in place to provide file system style
 semantics, or by using B-tree layouts that allow arbitrary insertions
 and deletions in the middle of objects~\cite{esm}.
-\subsection{Concurrent recoverable virtual memory}
+\subsection{Concurrent RVM}
 Our LSN-free pages are somewhat similar to the recovery scheme used by
-RVM, recoverable virtual memory, and Camelot~\cite{camelot}. RVM
+recoverable virtual memory (RVM) and Camelot~\cite{camelot}. RVM
 used purely physical logging and LSN-free pages so that it
 could use {\tt mmap()} to map portions of the page file into application
-memory\cite{lrvm}.  However, without support for logical log entries
+memory~\cite{lrvm}.  However, without support for logical log entries
 and nested top actions, it would be extremely difficult to implement a
 concurrent, durable data structure using RVM or Camelot.  (The description of
 Argus in Section~\ref{sec:transactionalProgramming} sketches the
@@ -924,35 +920,39 @@ general approach.)
 In contrast, LSN-free pages allow for logical
 undo, allowing for the use of nested top actions and concurrent
-transactions; the concurrent data structure needs only provide \yad
+transactions; the concurrent data structure need only provide \yad
 with an appropriate inverse each time its logical state changes.
-We plan to add RVM style transactional memory to \yad in a way that is
+We plan to add RVM-style transactional memory to \yad in a way that is
 compatible with fully concurrent in-memory data structures such as
 hash tables and trees.  Of course, since \yad will support coexistence
 of conventional and LSN-free pages, applications will be free to use
 the \yad data structure implementations as well.
-\subsection{Page-independent transactions}
+\subsection{Transactions without Boundaries}
 \label{sec:torn-page}
 \rcs{I don't like this section heading...}  Recovery schemes that make
 use of per-page LSNs assume that each page is written to disk
 atomically even though that is generally not the case.  Such schemes
 deal with this problem by using page formats that allow partially
 written pages to be detected.  Media recovery allows them to recover
 these pages.  
-The Redo phase of the LSN-free recovery algorithm actually creates a
+Recovery schemes that make use of per-page LSNs assume that each page
-torn page each time it applies an old log entry to a new page.
+is written to disk atomically even though that is generally no longer
-However, it guarantees that all such torn pages will be repaired by
+the case in modern disk drives.  Such schemes deal with this problem
-the time Redo completes.  In the process, it also repairs any pages
+by using page formats that allow partially written pages to be
-that were torn by a crash.  Instead of relying upon atomic page
+detected.  Media recovery allows them to recover these pages.
 updates, LSN-free recovery relies upon a weaker property.
-For LSN-free recovery to work properly after a crash, each bit in
+Transactions based on blind updates do not require atomic page writes
-persistent storage must be either:
+and thus have no meaningful boundaries for atomic updates.  We still
 use pages to simplify integration into the rest of the system, but
 need not worry about torn pages.  In fact, the redo phase of the
 LSN-free recovery algorithm actually creates a torn page each time it
 applies an old log entry to a new page.  However, it guarantees that
 all such torn pages will be repaired by the time Redo completes.  In
 the process, it also repairs any pages that were torn by a crash.
 This also implies that blind-update transactions work with disks with
 different units of atomicity.
 Instead of relying upon atomic page updates, LSN-free recovery relies
 on a weaker property, which is that each bit in the page file must
 be either:
 \begin{enumerate}
 \item The old version of a bit that was being overwritten during a crash.
 \item The newest version of the bit written to storage.
@@ -965,10 +965,21 @@ is updated atomically, or it fails a checksum when read, triggering an
 error.  If a sector is found to be corrupt, then media recovery can be
 used to restore the sector from the most recent backup.
-Figure~\ref{fig:todo} provides an example page, and a number of log
+To ensure that we correctly update all of the old bits, we simply
-entries that were applied to it.  Assume that the initial version of
+start rollback from a point in time that is known to be older than the
-the page, with LSN $0$, is on disk, and the disk is in the process of
+LSN of the page (which we don't know for sure).  For bits that are
-writing out the version with LSN $2$ when the system crashes.  When
+overwritten, we end up with the correct version, since we apply the
 updates in order.  For bits that are not overwritten, they must have
 been correct before and remain correct after recovery.  Since all
 operations performed by redo are blind updates, they can be applied
 regardless of whether the initial page was the correct version or even
 logically consistent.
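The argument above can be checked exhaustively on a toy page. The following Python sketch is illustrative only: it works at byte rather than bit granularity, the helper names `apply_blind_updates` and `torn_mixes` are hypothetical, and the three-entry log is made up for the example. It assumes the crash interrupted a write of a newer version over an older one, so each on-disk byte comes from one of those two versions.

```python
import itertools


def apply_blind_updates(page: bytearray, updates) -> None:
    """Apply (offset, data) writes in log order without reading the page."""
    for offset, data in updates:
        page[offset:offset + len(data)] = data


def torn_mixes(old: bytes, new: bytes):
    """Yield every page whose bytes are each taken from `old` or from `new`."""
    for choice in itertools.product((False, True), repeat=len(old)):
        yield bytearray(n if take_new else o
                        for o, n, take_new in zip(old, new, choice))


updates = [(0, b"ab"), (1, b"cd"), (3, b"e")]   # hypothetical redo log
old = bytearray(b".....")                        # version known to predate the log
new = bytearray(old)
apply_blind_updates(new, updates[:2])            # version being written at the crash
final = bytearray(old)
apply_blind_updates(final, updates)              # state recovery must reach

repaired = []
for torn in torn_mixes(old, new):
    apply_blind_updates(torn, updates)           # redo from the older point
    repaired.append(torn == final)
```

Every byte is either overwritten by some logged update, in which case the last writer wins, or untouched by the log, in which case it is identical in the old and new versions. That is why each torn combination converges to the same page.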
 \eat{ Figure~\ref{fig:todo} provides an example page, and a number of
 log entries that were applied to it.  Assume that the initial version
 of the page, with LSN $0$, is on disk, and the disk is in the process
 of writing out the version with LSN $2$ when the system crashes.  When
 recovery reads the page from disk, it may encounter any combination of
 sectors from these two versions.
@@ -987,20 +998,29 @@ Of course, we do not want to constrain log entries to update entire
 sectors at once.  In order to support finer-grained logging, we simply
 repeat the above argument on the byte or bit level.  Each bit is
 either overwritten by redo, or has a known, correct, value before
-redo.  Since all operations performed by redo are blind writes, they
+redo.
-can be applied regardless of whether the page is logically consistent.
+}
 Since LSN-free recovery only relies upon atomic updates at the bit
-level, it decouples page boundaries from atomicity and recovery.  
+level, it decouples page boundaries from atomicity and recovery.  This
-This allows operations to atomically manipulate
+allows operations to atomically manipulate (potentially
-(potentially non-contiguous) regions of arbitrary size by producing a
+non-contiguous) regions of arbitrary size by producing a single log
-single log entry.  If this log entry includes a logical undo function
+entry.  If this log entry includes a logical undo function (rather
-(rather than a physical undo), then it can serve the purpose of a
+than a physical undo), then it can serve the purpose of a nested top
-nested top action without incurring the extra log bandwidth of storing
+action without incurring the extra log bandwidth of storing physical
-physical undo information.  Such optimizations can be implemented
+undo information.  Such optimizations can be implemented using
-using conventional transactions, but they appear to be easier to
+conventional transactions, but they appear to be easier to implement
-implement and reason about when applied to LSN-free pages.
+and reason about when applied to LSN-free pages.
 \subsection{Summary}
 In this section, we explored some of the flexibility of \yad. This
 includes user-defined operations, any combination of steal and force on
 a per-transaction basis, flexible locking options, and a new class of
 transactions based on blind updates that enables better support for
 DMA, large objects, and multi-page operations.  In the next section,
 we show through experiments how this flexibility enables important
 optimizations and a wide range of transactional systems.