Added log reordering, and zero-copy sections.

2006-04-23 05:06:16 +00:00 · 2006-04-23 05:06:16 +00:00 · c5bbe0af3b
commit c5bbe0af3b
parent 3b5508a03a
1 changed files with 97 additions and 5 deletions
--- a/doc/paper3/LLADD.tex
+++ b/doc/paper3/LLADD.tex
@@ -896,15 +896,107 @@ optimizations nearly double \yad's performance, and we see that in the
 memory-bound setup, update/flush indeed improves memory utilization.


-\subsection{Graph traversal}
+\subsection{Manipulation of logical log entries}

+Database optimizers operate over relational algebra expressions that
+will correspond to a sequence of logical operations at runtime.  \yad
+does not support query languages, relational algebra, or other
+general-purpose primitives.

+However, it does include an extensible logging infrastructure, and any
+operations that make use of physiological logging implicitly
+implement UNDO (and often REDO) functions that interpret logical
+operations.
+
+Logical operations often have some nice properties that this section
+will exploit.  Because they can be invoked at arbitrary times in the
+future, they tend to be independent of the database's physical state.
+They tend to be inverses of operations that programmers understand.
+If each method in the API exposed to the programmer is the inverse of
+some other method in the API, then each logical operation corresponds
+to a method the programmer can manually invoke.
+
+Because of this, application developers can easily determine whether
+logical operations may safely be reordered, transformed, or even
+dropped from the stream of requests that \yad is processing.  Even
+better, if requests can be partitioned in a natural way, load
+balancing can be implemented by splitting requests across many nodes.
+Similarly, a node can easily service streams of requests from multiple
+nodes by combining them into a single log, and processing the log
+using operation implementations.  Furthermore, application-specific
+procedures that are analogous to standard relational algebra methods
+(join, project and select) could be used to efficiently transform the data
+before it reaches the page file, while it is laid out sequentially
+in memory.
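
As an illustration of dropping requests from the stream, the following minimal sketch removes an insert that is later undone by a remove of the same key, provided no other logged operation touches that key in between. Everything here is hypothetical and not part of \yad: the `LogEntry` type, the operation names, and the assumption that an insert only ever targets a key that is absent (otherwise dropping the pair would change the final state).

```python
from typing import Dict, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    """Hypothetical logical log entry; not a real \\yad structure."""
    op: str                 # "insert", "remove", or any other operation name
    key: str
    value: object = None


def drop_cancelling_pairs(log: List[LogEntry]) -> List[LogEntry]:
    """Drop each insert that is undone by the next logged operation on its key.

    Assumes "insert" only targets absent keys, so insert followed by
    remove leaves the key exactly as it was before the pair.
    """
    out: List[Optional[LogEntry]] = []
    # key -> index in `out` of an insert with no later entry for that key yet
    pending: Dict[str, int] = {}
    for entry in log:
        idx = pending.pop(entry.key, None)
        if entry.op == "remove" and idx is not None:
            out[idx] = None      # drop the earlier insert
            continue             # and drop this remove as well
        if entry.op == "insert":
            pending[entry.key] = len(out)
        out.append(entry)
    return [e for e in out if e is not None]
```

Any intervening entry for the same key (an update, for example) clears the pending insert, so the sketch is conservative: it keeps a sequence whenever it cannot prove that the pair cancels.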
+
+Note that read-only operations do not necessarily generate log
+entries.  Therefore, applications may need to implement custom
+operations to make use of the ideas in this section.
+
+Although \yad has rudimentary support for a two-phase commit based
+cluster hash table, we have not yet implemented logical log based
+networking primitives.  Therefore, we implemented some of these ideas
+in a single node configuration in order to increase request locality
+during the traversal of a random graph.  The graph traversal system
+takes a sequence of (read) requests, and partitions them using some
+function.  It then processes each partition in isolation from the
+others.  We considered two partitioning functions.  The first, which
+is really only of interest in the distributed case, partitions the
+requests according to the hash of the node id they refer to.  This
+would allow us to balance the graph traversal across many nodes.  (We
+expect the early phases of such a traversal to be bandwidth limited,
+not latency limited, as each node would stream large sequences of
+asynchronous requests to the other nodes.)
+
+The second partitioning function, which was used to produce
+Figure~\ref{hotset}, partitions requests by their position in the page
+file.  When the graph has good locality, both a normal depth-first
+search traversal and the prioritized traversal perform well.  As locality
+decreases, the partitioned traversal algorithm's performance degrades
+less than the naive traversal.
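
The two partitioning functions can be sketched as follows. This is a simplified illustration, not the benchmark code: the constants, the mapping from node id to page number, and the use of a modulus as a stand-in for a hash are all assumptions, and a real traversal would also enqueue new requests as it visits each node.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

RECORDS_PER_PAGE = 4      # assumption: node ids are laid out densely, 4 per page
PAGES_PER_PARTITION = 2   # assumption: each partition covers 2 consecutive pages


def by_hash(node_id: int, num_machines: int = 3) -> int:
    """First function: spread requests across machines (modulus as a stand-in hash)."""
    return node_id % num_machines


def by_page_position(node_id: int) -> int:
    """Second function: group requests that fall in the same region of the page file."""
    page = node_id // RECORDS_PER_PAGE
    return page // PAGES_PER_PARTITION


def partition(requests: Iterable[int],
              fn: Callable[[int], int]) -> Dict[int, List[int]]:
    """Split a stream of requests into per-partition queues, preserving order."""
    parts: Dict[int, List[int]] = defaultdict(list)
    for node_id in requests:
        parts[fn(node_id)].append(node_id)
    return dict(parts)


def process_in_isolation(requests: Iterable[int],
                         fn: Callable[[int], int],
                         visit: Callable[[int], None]) -> None:
    """Service one partition at a time so that nearby pages are touched together."""
    parts = partition(requests, fn)
    for part_id in sorted(parts):
        for node_id in parts[part_id]:
            visit(node_id)
```

With `by_page_position`, requests that arrive interleaved across the page file are serviced region by region, which is the source of the improved locality described above.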
+
+**TODO This really needs more experimental setup... look at older draft!**

-\subsection{Request reordering for locality}
-Compare to DB optimizer.  (Reordering can happen later than DB optimizer's reordering..)
 \subsection{LSN-Free pages}
-\subsection{Blobs: File system based and zero-copy}
-\subsection{Recoverable Virtual Memory}
+
+In Section~\ref{todo}, we describe how operations can avoid recording
+LSNs on the pages they modify.  Essentially, operations that make use
+of purely physical logging need not heed page boundaries, as
+physiological operations must.  Recall that purely physical logging
+interacts poorly with concurrent transactions that modify the same
+data structures or pages, so LSN-free pages are not applicable in all
+situations.
+
+Consider the retrieval of a large (page spanning) object stored on
+pages that contain LSNs.  The object's data will not be contiguous.
+Therefore, in order to retrieve the object, the transaction system must
+load the pages contained on disk into memory, allocate buffer space to
+allow the object to be read, and perform a byte-by-byte copy of the
+portions of the pages that contain the large object's data.  Compare
+this approach to a modern filesystem, which allows applications to
+perform a DMA copy of the data into memory, avoiding the expensive
+byte-by-byte copy of the data, and allowing the CPU to be used for
+more productive purposes.  Furthermore, modern operating systems allow
+network services to use DMA and Ethernet adapter hardware to read data
+from disk, and send it over a network socket without passing it
+through the CPU.  Again, this frees the CPU, allowing it to perform
+other tasks.
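
The layout difference behind this comparison can be sketched as follows. The page size, the position and width of the LSN field, and all function names are assumptions made for illustration and do not reflect \yad's actual page format. The sketch only shows why per-page LSNs force a gather copy while an LSN-free region can be handed out as a single contiguous range; it does not perform DMA.

```python
PAGE_SIZE = 16   # assumption: tiny pages to keep the example readable
LSN_SIZE = 4     # assumption: each page begins with a 4-byte LSN


def pack_with_lsn_headers(data: bytes, lsn: int = 0) -> bytes:
    """Store `data` across pages that each begin with an LSN field."""
    payload = PAGE_SIZE - LSN_SIZE
    pages = bytearray()
    for off in range(0, len(data), payload):
        chunk = data[off:off + payload]
        pages += lsn.to_bytes(LSN_SIZE, "little") + chunk.ljust(payload, b"\0")
    return bytes(pages)


def read_with_lsn_headers(page_file: bytes, first_page: int, length: int) -> bytes:
    """Gather the object by copying each page's payload into a new buffer."""
    payload = PAGE_SIZE - LSN_SIZE
    out = bytearray()
    page = first_page
    while len(out) < length:
        start = page * PAGE_SIZE + LSN_SIZE          # skip this page's LSN
        take = min(payload, length - len(out))
        out += page_file[start:start + take]         # per-page copy
        page += 1
    return bytes(out)


def read_lsn_free(page_file: bytes, first_page: int, length: int) -> memoryview:
    """With no per-page headers the object is one contiguous range (no copy)."""
    start = first_page * PAGE_SIZE
    return memoryview(page_file)[start:start + length]
```

In the LSN-free case the returned `memoryview` refers to the original buffer, which is the property that lets a single contiguous transfer replace the per-page gather loop.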
+
+We believe that LSN-free pages will allow reads to make use of such
+optimizations in a straightforward fashion.  Zero-copy writes could be
+implemented by performing a DMA write to a portion of the log file.
+However, doing this complicates log truncation, and does not address
+the problem of updating the page file.  We suspect that contributions
+from the log-based filesystem literature can address these problems in
+a straightforward fashion.
+
+Finally, RVM, recoverable virtual memory, made use of LSN-free pages
+so that it could use mmap() to map portions of the page file into
+application memory.  However, without support for logical log entries
+and nested top actions, it would be difficult to implement a
+concurrent, durable data structure using RVM.  We plan to add
+RVM-style transactional memory to \yad in a way that is compatible with
+fully concurrent collections such as hash tables and tree structures.

 \section{Conclusion}