updated arraylist

Eric Brewer 2005-03-25 22:57:53 +00:00
parent eca4fc1cac
commit 823af05adc
2 changed files with 35 additions and 47 deletions

Binary file not shown.


@@ -1399,63 +1399,51 @@ hash table in order to emphasize that it is easy to implement
high-performance transactional data structures with \yad and because
it is easy to understand.
We decided to implement a {\em linear} hash table~\cite{lht}. Linear
hash tables are hash tables that are able to extend their bucket list
incrementally at runtime. They work as follows. Imagine that we want
to double the size of a hash table of size $2^{n}$ and that the hash
table has been constructed with some hash function
$h_{n}(x)=h(x) \bmod 2^{n}$. Choose $h_{n+1}(x)=h(x) \bmod 2^{n+1}$
as the hash function for the new table. Conceptually, we are simply
prepending a random bit to the old value of the hash function, so all
lower-order bits remain the same. At this point, we could simply
block all concurrent access and iterate over the entire hash table,
reinserting values according to the new hash function.
However,
%because of the way we chose $h_{n+1}(x),$
we know that the contents of each bucket, $m$, will be split between
bucket $m$ and bucket $m+2^{n}$. Therefore, if we keep track of the
last bucket that was split then we can split a few buckets at a time,
resizing the hash table without introducing long pauses~\cite{lht}.
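To make the addressing and splitting rules concrete, the following C
sketch implements the algorithm in memory. This is an illustration
only, not \yad's hash table implementation; the names, the
chained-bucket representation, and the multiplicative hash are all
hypothetical.
\begin{verbatim}
#include <stdlib.h>

/* In-memory linear hashing sketch; all names are hypothetical. */
typedef struct entry { unsigned long key; struct entry *next; } entry_t;

typedef struct {
  entry_t **bucket;        /* 2^(n+1) head pointers, grown as needed */
  unsigned n;              /* the unsplit table has 2^n buckets      */
  unsigned next_to_split;  /* buckets below this already split       */
} lht_t;

static unsigned long h(unsigned long key) { return key * 2654435761UL; }

/* h_n(x) = h(x) mod 2^n, unless x's bucket has already been split;
   split buckets are addressed with h_{n+1} instead. */
static unsigned long addr(lht_t *t, unsigned long key) {
  unsigned long b = h(key) % (1UL << t->n);
  if (b < t->next_to_split)
    b = h(key) % (1UL << (t->n + 1));
  return b;
}

/* Split one bucket: each entry in bucket m stays put or moves to
   bucket m + 2^n, depending on the new bit of its hash. */
static void split_one(lht_t *t) {
  unsigned long m = t->next_to_split++;
  entry_t **src = &t->bucket[m];
  while (*src != NULL) {
    entry_t *e = *src;
    if (h(e->key) % (1UL << (t->n + 1)) == m) {
      src = &e->next;                       /* new bit is 0: stays */
    } else {
      *src = e->next;                       /* new bit is 1: moves */
      e->next = t->bucket[m + (1UL << t->n)];
      t->bucket[m + (1UL << t->n)] = e;
    }
  }
  if (t->next_to_split == (1UL << t->n)) {  /* table has doubled   */
    t->n++;
    t->next_to_split = 0;
  }
}
\end{verbatim}
A lookup consults the split pointer to decide between $h_{n}$ and
$h_{n+1}$, which is what lets the table split one bucket at a time.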
In order to implement this scheme we need two building blocks. We
need a map from bucket number to bucket contents (lists), and we need
to handle bucket overflow.
\subsection{The Bucket Map}
The simplest bucket map would simply use a fixed-size transactional
array. However, since we want the size of the table to grow, we
should not assume that it fits in a contiguous range of pages.
Instead, we build on top of \yad's transactional ArrayList data
structure (inspired by Java's structure of the same name).
The ArrayList provides the appearance of a large growable array by
breaking it into a sequence of contiguous page intervals that
partition the array. Since we expect relatively few intervals
(typically one per enlargement), this leads to an efficient map. We
use a single ``header'' page to store the list of intervals and their
sizes.
%We use fixed-sized buckets, which allows us to treat a region of pages
%as an array of buckets.
For space efficiency, the array elements themselves are stored using
the fixed-size record page layout. Thus, we use the header page to
find the right interval, and then index into it to get the $(page,
slot)$ address. Once we have this address, the redo/undo entries are
trivial: they simply log the before and after images of that record.
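The translation from a logical index to a physical address is easy to
sketch. The following hypothetical C fragment illustrates it; the
real header-page layout and field names in \yad differ, and the fixed
bound on the number of intervals is an assumption for exposition.
\begin{verbatim}
/* Hypothetical header-page layout, for illustration only. */
typedef struct {
  unsigned long first_page;    /* first page of this interval      */
  unsigned long page_count;    /* grows with each enlargement      */
} interval_t;

typedef struct {
  interval_t intervals[32];    /* one entry per enlargement        */
  unsigned   interval_count;
  unsigned   records_per_page; /* fixed-size record page layout    */
} header_t;

typedef struct { unsigned long page; unsigned slot; } addr_t;

/* Translate a logical array index into a (page, slot) address by
   scanning the short interval list on the header page. */
static addr_t arraylist_addr(const header_t *h, unsigned long index) {
  unsigned long base = 0;      /* first index covered by interval i */
  for (unsigned i = 0; i < h->interval_count; i++) {
    unsigned long cap = h->intervals[i].page_count * h->records_per_page;
    if (index < base + cap) {
      unsigned long off = index - base;
      addr_t a;
      a.page = h->intervals[i].first_page + off / h->records_per_page;
      a.slot = (unsigned)(off % h->records_per_page);
      return a;
    }
    base += cap;
  }
  addr_t none = { 0, 0 };      /* index out of range */
  return none;
}
\end{verbatim}
Since each enlargement typically doubles the capacity, the interval
list stays logarithmic in the size of the array, so the scan is cheap.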
%\rcs{This paragraph doesn't really belong}
%Normal \yad slotted pages are not without overhead. Each record has