Added log reordering, and zero-copy sections.

2006-04-23 05:06:16 +00:00 · 2006-04-23 05:06:16 +00:00 · c5bbe0af3b
commit c5bbe0af3b
parent 3b5508a03a
1 changed files with 97 additions and 5 deletions
--- a/doc/paper3/LLADD.tex
+++ b/doc/paper3/LLADD.tex
@@ -896,15 +896,107 @@ optimizations nearly double \yad's performance, and we see that in the
 memory-bound setup, update/flush indeed improves memory utilization.
-\subsection{Graph traversal}
+\subsection{Manipulation of logical log entries}
 Database optimizers operate over relational algebra expressions that
 will correspond to a sequence of logical operations at runtime.  \yad
 does not support query languages, relational algebra, or other general
 purpose primitives.
 However, it does include an extensible logging infrastructure, and any
 operations that make use of physiological logging implicitly
 implement UNDO (and often REDO) functions that interpret logical
 operations.
 Logical operations often have some nice properties that this section
 will exploit.  Because they can be invoked at arbitrary times in the
 future, they tend to be independent of the database's physical state.
 They tend to be inverses of operations that programmers understand.
 If each method in the API exposed to the programmer is the inverse of
 some other method in the API, then each logical operation corresponds
 to a method the programmer can manually invoke.
 Because of this, application developers can easily determine whether
 logical operations may safely be reordered, transformed, or even
 dropped from the stream of requests that \yad is processing.  Even
 better, if requests can be partitioned in a natural way, load
 balancing can be implemented by splitting requests across many nodes.
 Similarly, a node can easily service streams of requests from multiple
 nodes by combining them into a single log, and processing the log
 using operation implementations.  Furthermore, application-specific
 procedures that are analogous to standard relational algebra methods
 (join, project and select) could be used to efficiently transform the data
 before it reaches the page file, while it is laid out sequentially
 in memory.
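
To make the idea of dropping requests concrete, the following is a minimal sketch and not part of \yad.  It assumes a hypothetical set-like API whose only logged operations are \texttt{insert} and \texttt{remove}, that each \texttt{insert} adds a key that was absent, and that operations on different keys commute.  Under those assumptions, an \texttt{insert} whose next operation on the same key is a \texttt{remove} cancels with it, and both entries can be dropped before the log is processed.

```python
def drop_cancelled_pairs(log):
    """Drop each ("insert", k) whose next operation on key k is ("remove", k).

    `log` is a list of (op, key) tuples; this entry format is hypothetical
    and is not \\yad's log format.
    """
    kept = []          # surviving entries; None marks a dropped slot
    last_insert = {}   # key -> index in `kept` of a still-pending insert
    for op, key in log:
        if op == "remove" and key in last_insert:
            kept[last_insert.pop(key)] = None  # drop the pending insert ...
            continue                           # ... and the remove itself
        if op == "insert":
            last_insert[key] = len(kept)
        else:
            # Any other operation on this key ends the cancellation window.
            last_insert.pop(key, None)
        kept.append((op, key))
    return [entry for entry in kept if entry is not None]
```

The relative order of the surviving entries is unchanged, so the sketch relies only on the inverse relationship between the two methods and not on any reordering.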
 Note that read-only operations do not necessarily generate log
 entries.  Therefore, applications may need to implement custom
 operations to make use of the ideas in this section.
 Although \yad has rudimentary support for a two-phase commit based
 cluster hash table, we have not yet implemented logical log based
 networking primitives.  Therefore, we implemented some of these ideas
 in a single node configuration in order to increase request locality
 during the traversal of a random graph.  The graph traversal system
 takes a sequence of (read) requests, and partitions them using some
 function.  It then processes each partition in isolation from the
 others.  We considered two partitioning functions.  The first, which
 is really only of interest in the distributed case, partitions the
 requests according to the hash of the node id they refer to.  This
 would allow us to balance the graph traversal across many nodes.  (We
 expect the early phases of such a traversal to be bandwidth, not
 latency limited, as each node would stream large sequences of
 asynchronous requests to the other nodes.) 
 The second partitioning function, which was used to produce
 Figure~\ref{hotset}, partitions requests by their position in the page
 file.  When the graph has good locality, a normal depth-first search
 traversal and the prioritized traversal perform well.  As locality
 decreases, the partitioned traversal algorithm's performance degrades
 less than the naive traversal.
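
The following sketch illustrates partitioning requests by page position.  It is an illustration under stated assumptions, not the implementation measured here: the graph is an in-memory dictionary, \texttt{page\_of} is a hypothetical function mapping a node id to the page that stores it, and the rule for choosing the next page (lowest page number first) is arbitrary.

```python
from collections import defaultdict, deque

def partitioned_traversal(graph, root, page_of):
    """Visit every node reachable from `root`, draining one page at a time.

    `graph` maps node id -> list of successor ids; `page_of` maps a node id
    to a page number.  Both are hypothetical stand-ins for the page file.
    """
    pending = defaultdict(deque)          # page number -> queued node ids
    pending[page_of(root)].append(root)
    visited, order = set(), []
    while pending:
        page = min(pending)               # arbitrary choice of next partition
        queue = pending.pop(page)
        while queue:
            node = queue.popleft()
            if node in visited:
                continue
            visited.add(node)
            order.append(node)
            for succ in graph.get(node, ()):
                if succ in visited:
                    continue
                if page_of(succ) == page:
                    queue.append(succ)    # same page: handle it right away
                else:
                    pending[page_of(succ)].append(succ)  # defer to its page
    return order
```

Requests that land on the page currently being processed are served before the traversal moves on, so each page is revisited only when a later partition generates new requests for it.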
 **TODO This really needs more experimental setup... look at older draft!**
 \subsection{Request reordering for locality}
 Compare to DB optimizer.  (Reordering can happen later than the DB optimizer's reordering.)
 \subsection{LSN-Free pages}
-\subsection{Blobs: File system based and zero-copy}
+
-\subsection{Recoverable Virtual Memory}
+In Section~\ref{todo}, we describe how operations can avoid recording
 LSNs on the pages they modify.  Essentially, operations that make use
 of purely physical logging need not heed page boundaries, as
 physiological operations must.  Recall that purely physical logging
 interacts poorly with concurrent transactions that modify the same
 data structures or pages, so LSN-Free pages are not applicable in all
 situations.
 Consider the retrieval of a large (page-spanning) object stored on
 pages that contain LSNs.  The object's data will not be contiguous.
 Therefore, in order to retrieve the object, the transaction system must
 load the pages contained on disk into memory, allocate buffer space to
 allow the object to be read, and perform a byte-by-byte copy of the
 portions of the pages that contain the large object's data.  Compare
 this approach to a modern filesystem, which allows applications to
 perform a DMA copy of the data into memory, avoiding the expensive
 byte-by-byte copy of the data, and allowing the CPU to be used for
 more productive purposes.  Furthermore, modern operating systems allow
 network services to use DMA and Ethernet adapter hardware to read data
 from disk, and send it over a network socket without passing it
 through the CPU.  Again, this frees the CPU, allowing it to perform
 other tasks.
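
The difference between the two layouts can be sketched as follows.  The page size, the LSN size, and the placement of the LSN at the start of each page are assumptions chosen for illustration and do not describe \yad's page format.  With per-page LSNs the object must be reassembled piece by piece, while on LSN-free pages it occupies one contiguous byte range that can be handed out without copying.

```python
PAGE_SIZE = 16                   # hypothetical, tiny so the example is readable
LSN_SIZE = 4                     # hypothetical per-page LSN header size
PAYLOAD = PAGE_SIZE - LSN_SIZE   # usable bytes per page when an LSN is stored

def read_from_lsn_pages(page_file, first_page, length):
    """Reassemble an object from pages that each begin with an LSN (copies)."""
    out = bytearray()
    page = first_page
    while len(out) < length:
        start = page * PAGE_SIZE + LSN_SIZE        # skip this page's LSN
        take = min(PAYLOAD, length - len(out))
        out += page_file[start:start + take]       # per-page copy
        page += 1
    return bytes(out)

def read_from_lsn_free_pages(page_file, first_page, length):
    """Return a view of an object stored contiguously on LSN-free pages."""
    start = first_page * PAGE_SIZE
    return memoryview(page_file)[start:start + length]  # no copy is made
```

The second function returns a \texttt{memoryview}, which stands in here for the contiguous region that a DMA transfer or a socket-level zero-copy call would operate on.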
 We believe that LSN-free pages will allow reads to make use of such
 optimizations in a straightforward fashion.  Zero-copy writes could be
 implemented by performing a DMA write to a portion of the log file.
 However, doing this complicates log truncation, and does not address
 the problem of updating the page file.  We suspect that contributions
 from the log based filesystem literature can address these problems in
 a straightforward fashion.
 Finally, RVM, recoverable virtual memory, made use of LSN-free pages
 so that it could use mmap() to map portions of the page file into
 application memory.  However, without support for logical log entries
 and nested top actions, it would be difficult to implement a
 concurrent, durable data structure using RVM.  We plan to add RVM
 style transactional memory to \yad in a way that is compatible with
 fully concurrent collections such as hash tables and tree structures.
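
As a rough sketch of the mapping step only, the following code maps a range of pages from a file into memory and updates it in place.  It is neither RVM's nor \yad's interface: the function names are hypothetical, the page size is taken from the platform's mapping granularity, and no logging or recovery is performed, so the write is not transactional.

```python
import mmap
import os
import tempfile

# Mapping offsets must be multiples of the platform's allocation granularity,
# so this sketch uses it as the page size (an assumption, not \yad's value).
PAGE_SIZE = mmap.ALLOCATIONGRANULARITY

def map_pages(path, first_page, page_count):
    """Map `page_count` pages of the file at `path` into memory, read-write."""
    with open(path, "r+b") as f:
        # mmap keeps its own handle to the file, so `f` may be closed afterwards.
        return mmap.mmap(f.fileno(), page_count * PAGE_SIZE,
                         offset=first_page * PAGE_SIZE)

def demo_roundtrip():
    """Write through a mapping, then read the bytes back from the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(4 * PAGE_SIZE))   # a zero-filled four-page file
        os.close(fd)
        region = map_pages(path, first_page=1, page_count=2)
        region[0:5] = b"hello"               # ordinary memory write
        region.flush()                       # push the change to the file
        region.close()
        with open(path, "rb") as f:
            f.seek(PAGE_SIZE)
            return f.read(5)
    finally:
        os.remove(path)
```

Because the application writes directly into the mapped region, there is no point at which a per-page LSN could be updated, which is why this style of access depends on LSN-free pages.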
 \section{Conclusion}