cleaned up blobs.

2006-04-24 08:33:34 +00:00 · 2006-04-24 08:33:34 +00:00 · 3ee5a477d9
commit 3ee5a477d9
parent b207595229
1 changed files with 68 additions and 53 deletions
--- a/doc/paper3/LLADD.tex
+++ b/doc/paper3/LLADD.tex
@@ -1130,14 +1130,14 @@ The reason it would be difficult to do this with Berkeley DB is that
 we still need to generate log entries as the object is being updated.
 Otherwise, commit would not be durable, unless we queued up log 
 entries, and wrote them all before committing.  
-committing.  This would cause Berekley DB to write data back to the
+  This would cause Berkeley DB to write data back to the
 page file, increasing the working set of the program, and increasing
 disk activity.
 Furthermore, because objects may be written to disk in an
 order that differs from the order in which they were updated, we need
-to maintain multiple LSN's per page.  This means we need to register a
+to maintain multiple LSN's per page.  This means we would need to register a
-callback with the recovery routing to process the LSN's.  (A similar
+callback with the recovery routine to process the LSN's.  (A similar
 callback will be needed in Section~\ref{sec:zeroCopy}.)  Also, 
 we must prevent \yads storage routine from overwriting the per-object 
 LSN's of deleted objects that may still be addressed during abort or recovery.  
@@ -1147,26 +1147,27 @@ further with the buffer pool by atomically updating the buffer
 manager's copy of all objects that share a given page, removing the 
 need for multiple LSN's per page, and simplifying storage allocation.
-However, the simplest solution to this problem is to observe that
+However, the simplest solution to this problem is based on the observation that
 updates (not allocations or deletions) to fixed length objects meet
-the requirements of the LSN free transactional update scheme, and that
+the requirements of an LSN free transactional update scheme, and that
 we may do away with per-object LSN's entirely.\endnote{\yad does not
  yet implement LSN-free pages.  In order to obtain performance
  numbers for object serialization, we made use of our LSN page
  implementation.  The runtime performance impact of LSN-free pages
  should be negligible.}  Allocation and deletion can then be handled
 as updates to normal LSN containing pages.  At recovery time, object
-updates are executed based on the existence of the object on the page,
+updates are executed based on the existence of the object on the page
 and a conservative estimate of its LSN.  (If the page doesn't contain
 the object during REDO, then it must have been written back to disk
 after the object was deleted.  Therefore, we do not need to apply the
-REDO.)
+REDO.)  This means that the system can ``forget'' about objects that 
 were freed by committed transactions, simplifying space reuse 
 tremendously.
-
+The third \yad plugin to \oasys incorporates all of these buffer
-The third \yad plugin to \oasys incorporates all of the optimizations
+manager optimizations.  However, it only writes the changed portions of
-present in the second plugin, but arranges to only write the changed
+objects to the log.  Because of \yad's support for custom log entry
-portions of objects to the log.  Because of \yad's support for custom
+formats, this optimization is straightforward.
 log entry formats, this optimization is straightforward.
 In addition to the buffer pool optimizations, \yad provides several 
 options  to handle UNDO records in the context
@@ -1200,18 +1201,17 @@ to ensure the correctness of this code is complex, the simplicity of
 the implementation is encouraging.
 In this experiment, Berkeley DB was configured as described above.  We
-ran MySQL using InnoDB for the table engine, as it is the fastest
+ran MySQL using InnoDB for the table engine.  For this benchmark, it
-engine that provides similar durability to \yad. For this test, we
+is the fastest engine that provides similar durability to \yad. We
-also linked directly with the libmysqld daemon library, bypassing the
+linked the benchmark's executable to the libmysqld daemon library,
-RPC layer. In experiments that used the RPC layer, test completion
+bypassing the RPC layer. In experiments that used the RPC layer, test
-times were orders of magnitude slower.
+completion times were orders of magnitude slower.
 Figure~\ref{fig:OASYS} presents the performance of the three
 \yad optimizations, and the \oasys plugins implemented on top of other
 systems.  As we can see, \yad performs better than the baseline
 systems, which is not surprising, since it is not providing the A
-property of ACID transactions.
+property of ACID transactions.  (It does, however, apply each individual operation atomically.)
 In non-memory bound systems, the optimizations nearly double \yads
 performance by reducing the CPU overhead of object serialization and
@@ -1245,66 +1245,62 @@ reordering is inexpensive.}
 \end{figure}
 Database optimizers operate over relational algebra expressions that
-will correspond to sequence of logical operations at runtime.  \yad
+correspond to logical operations performed over streams of data at runtime.  \yad
-does not support query languages, relational algebra, or other general
+does not provide query languages, relational algebra, or other such query processing primitives.  
 purpose primitives.
-However, it does include an extendible logging infrastructure, and any
+However, it does include an extensible logging infrastructure, and any
 operations that make use of physiological logging implicitly
 implement UNDO (and often REDO) functions that interpret logical
-operations.
+requests.
 Logical operations often have some nice properties that this section
 will exploit.  Because they can be invoked at arbitrary times in the
 future, they tend to be independent of the database's physical state.
-They tend to be inverses of operations that programmer's understand.
+Often, they correspond to operations that programmers understand.
 If each method in the API exposed to the programmer is the inverse of
 some other method in the API, then each logical operation corresponds
 to a method the programmer can manually invoke.
 Because of this, application developers can easily determine whether
-logical operations may safely be reordered, transformed, or even
+logical operations may be reordered, transformed, or even
-dropped from the stream of requests that \yad is processing.  Even
+dropped from the stream of requests that \yad is processing.
-better, if requests can be partitioned in a natural way, load
+
-balancing can be implemented by spliting requests across many nodes.
+If requests can be partitioned in a natural way, load
 balancing can be implemented by splitting requests across many nodes.
 Similarly, a node can easily service streams of requests from multiple
 nodes by combining them into a single log, and processing the log
-using operaiton implementations.  Furthermore, application-specific
+using operation implementations.  
 Furthermore, application-specific
 procedures that are analogous to standard relational algebra methods
 (join, project and select) could be used to efficiently transform the data
 before it reaches the page file, while it is laid out sequentially
-in memory.
+in non-transactional memory.
 Note that read-only operations do not necessarily generate log
 entries.  Therefore, applications may need to implement custom
 operations to make use of the ideas in this section.
 Although \yad has rudimentary support for a two-phase commit based
-cluster hash table, we have not yet implemented a logical log based
+cluster hash table, we have not yet implemented networking primitives for logical logs.
-networking primitives.  Therefore, we implemented some of these ideas
+Therefore, we implemented a single node log reordering scheme that increases request locality
 in a single node configuration in order to increase request locality
 during the traversal of a random graph.  The graph traversal system
 takes a sequence of (read) requests, and partitions them using some
 function.  It then processes each partition in isolation from the
-others.  We considered two partitioning functions.  The first, which
+others.  We considered two partitioning functions.  The first partitions the
-is really only of interested in the distributed case, partitions the
+requests according to the hash of the node id they refer to, and would be useful for load balancing over a network.
-requests according to the hash of the node id they refer to.  This
+(We expect the early phases of such a traversal to be bandwidth, not
 would allow us to balance the graph traversal across many nodes.  (We
 expect the early phases of such a traversal to be bandwidth, not
 latency limited, as each node would stream large sequences of
 asynchronous requests to the other nodes.) 
-The second partitioning function, which was used to produce
+The second partitioning function, which was used in our benchmarks,
-Figure~\ref{hotset} partitions requests by their position in the page
+partitions requests by their position in the page
-file.  When the graph has good locality, a normal depth first search
+file.  We ran two experiments.  The first, presented in Figure~\ref{fig:oo7}, is loosely based on the oo7 database benchmark~\cite{oo7}.  The second explicitly measures the effect of graph locality on our optimization (Figure~\ref{fig:hotGraph}).  When the graph has good locality, a normal depth first search
-traversal and the prioritized traversal perform well.  As locality
+traversal and the prioritized traversal perform well.  As locality
 decreases, the partitioned traversal algorithm's performance degrades
 less than the naive traversal.
-**TODO This really needs more experimental setup... look at older draft!**
+\rcs{ This really needs more experimental setup... look at older draft! }
 \subsection{LSN-Free pages}
-
+\label{sec:zeroCopy}
 In Section~\ref{todo}, we describe how operations can avoid recording
 LSN's on the pages they modify.  Essentially, operations that make use
 of purely physical logging need not heed page boundaries, as
@@ -1323,22 +1319,41 @@ this approach to a modern filesystem, which allows applications to
 perform a DMA copy of the data into memory, avoiding the expensive
 byte-by-byte copy of the data, and allowing the CPU to be used for
 more productive purposes.  Furthermore, modern operating systems allow
-network services to use DMA and ethernet adaptor hardware to read data
+network services to use DMA and network adaptor hardware to read data
 from disk, and send it over a network socket without passing it
 through the CPU.  Again, this frees the CPU, allowing it to perform
 other tasks.
-We beleive that LSN free pages will allow reads to make use of such
+We believe that LSN free pages will allow reads to make use of such
-optimizations in a straightforward fashion.  Zero copy writes could be
+optimizations in a straightforward fashion.  Zero copy writes are more challenging, but could be
 implemented by performing a DMA write to a portion of the log file.
 However, doing this complicates log truncation, and does not address
 the problem of updating the page file.  We suspect that contributions
 from the log based filesystem literature can address these problems in
-a straightforward fashion.
+a straightforward fashion.  In particular, we imagine storing 
 portions of the log (the portion that stores the blob) in the 
 page file, or other addressable storage.  In the worst case, 
 the blob would have to be relocated in order to defragment the 
 storage.  Assuming the blob was relocated once, this would amount 
 to a total of three, mostly sequential disk operations.  (Two 
 writes and one read.)  A conventional blob system would need 
 to write the blob twice, but also may need to create complex 
 structures such as B-Trees, or may evict a large number of 
 unrelated pages from the buffer pool as the blob is being written 
 to disk.  
 Alternatively, we could use DMA to overwrite the blob to the page file
 in a non-atomic fashion, providing filesystem style semantics.
 (Existing database servers often provide this mode based on the
 observation that many blobs are static data that does not really need
 to be updated transactionally~\cite{sqlServer}.) Of course, \yad could
 also support other approaches to blob storage, such as B-Tree layouts
 that allow arbitrary insertions and deletions in the middle of
 objects~\cite{esm}.
 Finally, RVM, recoverable virtual memory, made use of LSN-free pages
 so that it could use mmap() to map portions of the page file into
-application memory.  However, without support for logical log entries
+application memory~\cite{rvm}.  However, without support for logical log entries
 and nested top actions, it would be difficult to implement a
 concurrent, durable data structure using RVM.  We plan to add RVM
 style transactional memory to \yad in a way that is compatible with