graph-fig

Eric Brewer 2005-03-25 18:51:47 +00:00
parent 6b18f55ed8
commit 43cb4c6133
2 changed files with 67 additions and 89 deletions


@@ -1394,104 +1394,84 @@ number.
\subsection{The Bucket List}

%\rcs{This seems overly complicated to me...}

\yad provides access to transactional storage with page-level
granularity and stores all record information in the same page file.
Therefore, our bucket list must be partitioned into page-size chunks,
and we cannot assume that the entire bucket list is contiguous.
Thus, we need some level of indirection to allow us to map from
bucket number to the record that stores the corresponding bucket.

\yad's allocation routines allow applications to reserve regions of
contiguous pages. Therefore, if we are willing to allocate the bucket
list in sufficiently large chunks, we can limit the number of distinct
contiguous regions. Borrowing from Java's ArrayList structure, we
initially allocate a fixed number of pages to store buckets and
allocate more pages as necessary, doubling the allocation each
time. We use a single ``header'' page to store the list of regions and
their sizes.
We use fixed-size buckets, so we can treat a region as an array of
buckets using the fixed-size record page layout. Thus, we use the
header page to find the right region, and then index into that region
to get the $(page, slot)$ address. Once we have this address, the
redo/undo entries are trivial: they simply log the before and after
image of the appropriate record.
We allocate a fixed amount of storage for each bucket, so we know how
many buckets will fit in each of these pages. Therefore, in order to
look up an arbitrary bucket we simply need to calculate which chunk
of allocated pages will contain the bucket, and then calculate the
offset of the appropriate page within that group of allocated pages.
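For concreteness, the following sketch shows one way this lookup can
be computed. It is only an illustration of the doubling scheme; the
constants and names ({\tt FIRST\_REGION\_PAGES}, {\tt BUCKETS\_PER\_PAGE},
{\tt region\_start\_page}) are ours, not part of \yad's interface, and
the region start pages are assumed to come from the header page.

\begin{verbatim}
/* Illustrative sketch only -- not yad's implementation.
 * Region i holds FIRST_REGION_PAGES * 2^i pages; region_start_page[]
 * is assumed to hold the starting page of each region, as recorded
 * on the header page. */
#define FIRST_REGION_PAGES 4
#define BUCKETS_PER_PAGE   32   /* fixed-size buckets per page */

typedef struct { long page; int slot; } bucket_addr;

bucket_addr bucket_to_addr(long bucket, const long region_start_page[]) {
  long per_region = (long)FIRST_REGION_PAGES * BUCKETS_PER_PAGE;
  int  region = 0;
  while (bucket >= per_region) {     /* region sizes double */
    bucket     -= per_region;
    per_region *= 2;
    region++;
  }
  bucket_addr a;
  a.page = region_start_page[region] + bucket / BUCKETS_PER_PAGE;
  a.slot = (int)(bucket % BUCKETS_PER_PAGE);
  return a;
}
\end{verbatim}

Because region sizes double, the loop above runs a logarithmic number
of times in the bucket number, and the per-region bookkeeping easily
fits on a single header page.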
%Since we double the amount of space allocated at each step, we arrange
%to run out of addressable space before the lookup table that we need
%runs out of space.
%\rcs{This paragraph doesn't really belong}

%Normal \yad slotted pages are not without overhead. Each record has
%an associated size field, and an offset pointer that points to a
%location within the page. Throughout our bucket list implementation,
%we only deal with fixed-length slots. Since \yad supports multiple
%page layouts, we use the ``Fixed Page'' layout, which implements a
%page consisting of an array of fixed-length records. Each bucket thus
%maps directly to one record, and it is trivial to map bucket numbers
%to record numbers within a page.

%\yad provides a call that allocates a contiguous range of pages. We
%use this method to allocate increasingly larger regions of pages as
%the array list expands, and store the regions' offsets in a single
%page header.

%When we need to access a record, we first calculate
%which region the record is in, and use the header page to determine
%its offset. We can do this because the size of each region is
%deterministic; it is simply $size_{first~region} * 2^{region~number}$.
%We then calculate the $(page,slot)$ offset within that region.

\yad
allows us to reference records by using a $(page,slot,size)$ triple,
which we call a {\em recordid}, and we already know the size of the
record. Once we have the recordid, the redo/undo entries are trivial.
They simply log the before and after image of the appropriate record,
and are provided by the Fixed Page interface.
%In fact, this is essentially identical to the transactional array
%implementation, so we can just use that directly: a range of
%contiguous pages is treated as a large array of buckets. The linear
%hash table is thus a tuple of such arrays that map ranges of IDs to
%each array. For a table split into $m$ arrays, we thus get $O(lg m)$
%in-memory operations to find the right array, followed by an $O(1)$
%array lookup. The redo/undo functions for the array are trivial: they
%just log the before or after image of the specific record.
%
%\eab{should we cover transactional arrays somewhere?}
%% The ArrayList page handling code overrides the recordid ``slot'' field
%% to refer to a logical offset within the ArrayList. Therefore,
%% ArrayList provides an interface that can be used as though it were
%% backed by an infinitely large page that contains fixed length records.
%% This seems to be generally useful, so the ArrayList implementation may
%% be used independently of the hashtable.
%For brevity we do not include a description of how the ArrayList
%operations are logged and implemented.
\subsection{Bucket Overflow}

\eab{don't get this section, and it sounds really complicated, which is counterproductive at this point -- Is this better now? -- Rusty}
\eab{some basic questions: 1) does the record described above contain
key/value pairs or a pointer to a linked list? Ideally it would be
one bucket with a next pointer at the end... 2) what about values that
are bigger than one bucket?, 3) add caption to figure.}
\begin{figure}
\includegraphics[width=3.25in]{LHT2.pdf}
\caption{\label{fig:LHT}Structure of linked lists...}
\end{figure}
For simplicity, our buckets are fixed length. In order to support
variable-length entries we store the keys and values in linked lists,
and represent each list as a list of smaller lists. The first list
links pages together, and the smaller lists reside within a single
page (Figure~\ref{fig:LHT}).

All of the entries within a single page may be traversed without
unpinning and repinning the page in memory, providing very fast
traversal over lists that have good locality. This optimization would
not be possible without the low-level interfaces provided by the
buffer manager. In particular, we need to be able to specify the page
on which to allocate space, and to read and write multiple records
with a single call to pin/unpin. Due to this data structure's good
locality and performance for short lists, it can also be used on its
own.
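The layout below is a simplified sketch of this ``list of smaller
lists'' idea; the structure and function names are ours, not \yad's,
and {\tt pin\_page}/{\tt unpin\_page} stand in for buffer manager
calls. In-page links are slot numbers, so every entry on a page can
be visited while the page is pinned exactly once.

\begin{verbatim}
/* Simplified sketch of a page-oriented linked list (not yad's layout). */
#define SLOTS_PER_PAGE 64
#define END_OF_LIST    (-1)

typedef struct {
  int key;
  int value;
  int next_slot;             /* next entry on this page, or END_OF_LIST */
} list_node;

typedef struct {
  long      next_page;       /* next page in the list, or END_OF_LIST */
  int       head_slot;       /* first entry on this page */
  list_node nodes[SLOTS_PER_PAGE];
} list_page;

/* Stand-ins for the buffer manager's pin/unpin interface. */
list_page *pin_page(long page_id);
void       unpin_page(long page_id);

/* Visit every entry on one page with a single pin/unpin pair, and
 * return the next page in the list for the caller to follow. */
long visit_page(long page_id, void (*visit)(int key, int value)) {
  list_page *p = pin_page(page_id);
  for (int s = p->head_slot; s != END_OF_LIST; s = p->nodes[s].next_slot)
    visit(p->nodes[s].key, p->nodes[s].value);
  long next = p->next_page;
  unpin_page(page_id);
  return next;
}
\end{verbatim}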
@@ -1515,7 +1495,7 @@ described in that section and lock the entire hashtable for each
operation. This prevents the hashtable implementation from fully
exploiting multiprocessor systems,\footnote{\yad passes regression
tests on multiprocessor systems.} but seems to be adequate on
single-processor machines (Figure~\ref{fig:TPS}).

We describe a finer-grained concurrency mechanism below.

%We have found a simple recipe for converting a non-concurrent data structure into a concurrent one, which involves three steps:
@@ -1585,16 +1565,16 @@ straightforward. The only complications are a) defining a logical undo, and b)
%Next we describe some additional optimizations and evaluate the
%performance of our implementations.
\subsection{The Optimized Hashtable}

Our optimized hashtable implementation is tuned for log bandwidth,
only stores fixed-length entries, and exploits a more aggressive
version of nested top actions.

Instead of using nested top actions, the optimized implementation
applies updates in a carefully chosen order that minimizes the extent
to which the on-disk representation of the hash table can be
corrupted (Figure~\ref{linkedList}). Before beginning updates, it
writes an undo entry that will check and restore the consistency of
the hashtable during recovery, and then invokes the inverse of the
operation that needs to be undone. This recovery scheme does not
@@ -1602,20 +1582,18 @@ require record-level undo information. Therefore, pre-images of
records do not need to be written to the log, saving log bandwidth and
enhancing performance.
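The fragment below sketches the flavor of this protocol for a single
bucket insert. It is our own illustration, not \yad's operation code:
{\tt log\_logical\_undo} and {\tt alloc\_node} are hypothetical
stand-ins, and the point is simply that the logical undo is logged
before any physical change, and that the physical updates are ordered
so that a crash at any point leaves a structure the undo can repair.

\begin{verbatim}
/* Hedged sketch of the careful-ordering idea; names are illustrative. */
typedef struct node { int key; int value; struct node *next; } node;
typedef struct { node *head; } bucket;

/* Stand-in: logs an undo record that, at recovery, checks the bucket's
 * consistency and removes `key' if it is present. */
void log_logical_undo(int xid, const char *undo_op, int key);
/* Stand-in: allocates and initializes a new node. */
node *alloc_node(int key, int value);

void optimized_insert(int xid, bucket *b, int key, int value) {
  /* 1. Log the logical undo first, so recovery can always compensate. */
  log_logical_undo(xid, "check_and_remove", key);

  /* 2. Apply updates in a crash-safe order: fully initialize the new
   *    node, then publish it by updating the bucket head last.  A
   *    crash before the final step leaves the bucket untouched; a
   *    crash after it leaves a consistent bucket that the logged
   *    undo can roll back. */
  node *n = alloc_node(key, value);
  n->next = b->head;
  b->head = n;
}
\end{verbatim}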
Also, since this implementation does not need to support variable-size
entries, it stores the first entry of each bucket in the ArrayList
that represents the bucket list, reducing the number of buffer manager
calls that must be made. Finally, this implementation caches the
header information in memory, rather than getting it from the buffer
manager on each request.
The most important component of \yad for this optimization is its
flexible recovery and logging scheme. For brevity we only mention
that this hashtable implementation uses bucket-granularity latching;
fine-grained latching is relatively easy in this case since all
operations only affect a few buckets, and buckets have a natural
ordering.
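As a hedged illustration of why a natural ordering helps, the fragment
below (our own example, not \yad's latching code) always acquires
per-bucket latches in ascending index order, so two operations that
each touch a pair of buckets can never deadlock.

\begin{verbatim}
#include <pthread.h>

#define NUM_LATCHES 1024
static pthread_mutex_t bucket_latch[NUM_LATCHES];  /* one per bucket stripe */

void init_latches(void) {
  for (int i = 0; i < NUM_LATCHES; i++)
    pthread_mutex_init(&bucket_latch[i], NULL);
}

/* Latch the stripes covering buckets a and b in ascending index order;
 * the fixed order makes deadlock between concurrent operations
 * impossible. */
void latch_pair(long a, long b) {
  unsigned ia = (unsigned)(a % NUM_LATCHES);
  unsigned ib = (unsigned)(b % NUM_LATCHES);
  if (ia > ib) { unsigned t = ia; ia = ib; ib = t; }
  pthread_mutex_lock(&bucket_latch[ia]);
  if (ib != ia)
    pthread_mutex_lock(&bucket_latch[ib]);
}

void unlatch_pair(long a, long b) {
  unsigned ia = (unsigned)(a % NUM_LATCHES);
  unsigned ib = (unsigned)(b % NUM_LATCHES);
  pthread_mutex_unlock(&bucket_latch[ia]);
  if (ib != ia)
    pthread_mutex_unlock(&bucket_latch[ib]);
}
\end{verbatim}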
\begin{figure*}
\includegraphics[%
@@ -1649,16 +1627,16 @@ library. For comparison, we also provide throughput for many different
\yad operations, BerkeleyDB's DB\_HASH hashtable implementation,
and its lower-level DB\_RECNO record-number-based interface.

Both of \yad's hashtable implementations perform well, but the
optimized implementation is clearly faster. This is not surprising as
it issues fewer buffer manager requests and writes fewer log entries
than the straightforward implementation.
\eab{missing} We see that \yad's other operation implementations also
perform well in this test. The page-oriented list implementation is
geared toward preserving the locality of short lists, and we see that
it has quadratic performance here because the list is traversed each
time a new page must be allocated.
%Note that page allocation is relatively infrequent since many entries
%will typically fit on the same page. In the case of our linear
@@ -1671,13 +1649,13 @@ traversed each time a new page must be allocated.

Since the linear hash table bounds the length of these lists, the
performance of the list when it contains only one or two elements is
much more important than its asymptotic behavior. In a separate
experiment not presented here, we compared the page-oriented linked
list to \yad's conventional linked-list implementation. Although the
conventional implementation performs better when bulk loading large
amounts of data into a single list, we have found that a hashtable
built with the page-oriented list outperforms one built with
conventional linked lists.
@@ -1693,7 +1671,7 @@ The second test (Figure~\ref{fig:TPS}) measures the two libraries' ability to ex
concurrent transactions to reduce logging overhead. Both systems
can service concurrent calls to commit with a single
synchronous I/O. Because different approaches to this
optimization make sense under different circumstances~\cite{findWorkOnThisOrRemoveTheSentence}, this may
be another aspect of transactional storage systems where
application control over a transactional storage policy is desirable.
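As an illustration of the general idea (not of either library's actual
code), the minimal group-commit sketch below batches concurrent
committers behind a single synchronous flush; {\tt flush\_log} is a
stand-in for the synchronous I/O, such as an fsync of the log file.

\begin{verbatim}
#include <pthread.h>

void flush_log(void);            /* stand-in for the synchronous log flush */

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static long requested = 0;       /* highest commit request seen   */
static long flushed   = 0;       /* highest request known durable */
static int  flushing  = 0;       /* is some thread flushing now?  */

/* Called by each committing transaction; returns once its commit
 * record is durable.  Only one thread performs the flush, on behalf
 * of every request that arrived before the flush began. */
void group_commit(void) {
  pthread_mutex_lock(&m);
  long my_req = ++requested;
  while (flushed < my_req) {
    if (!flushing) {                 /* become the flusher for this batch */
      flushing = 1;
      long batch = requested;        /* everything requested so far */
      pthread_mutex_unlock(&m);
      flush_log();                   /* one synchronous I/O for the batch */
      pthread_mutex_lock(&m);
      flushed  = batch;
      flushing = 0;
      pthread_cond_broadcast(&c);
    } else {
      pthread_cond_wait(&c, &m);     /* piggyback on the in-flight flush */
    }
  }
  pthread_mutex_unlock(&m);
}
\end{verbatim}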
@@ -1727,7 +1705,7 @@ general purpose structures when applied to an appropriate application.

This finding suggests that application developers should consider
developing custom transactional storage mechanisms when application
performance is important.
\begin{figure*}
@@ -1762,8 +1740,8 @@ serialization is also a convenient way of adding persistent storage to
an existing application without developing an explicit file format or
dealing with low-level I/O interfaces.

A simple serialization scheme would bulk-write and bulk-read
sets of application objects to an OS file. These
schemes suffer from high read and write latency, and do not handle
small updates well. More sophisticated schemes store each object in a
separate, randomly accessible record, such as a database tuple or

Binary file not shown.