updated arraylist
This commit is contained in:
parent
eca4fc1cac
commit
823af05adc
2 changed files with 35 additions and 47 deletions
Binary file not shown.
|
@ -1399,63 +1399,51 @@ hash table in order to this emphasize that it is easy to implement
|
||||||
high-performance transactional data structures with \yad and because
|
high-performance transactional data structures with \yad and because
|
||||||
it is easy to understand.
|
it is easy to understand.
|
||||||
|
|
||||||
We decided to implement a {\em linear} hash table~\cite{lht}. Linear hash tables are
|
We decided to implement a {\em linear} hash table~\cite{lht}. Linear
|
||||||
hash tables that are able to extend their bucket list incrementally at
|
hash tables are hash tables that are able to extend their bucket list
|
||||||
runtime. They work as follows. Imagine that we want to double the size
|
incrementally at runtime. They work as follows. Imagine that we want
|
||||||
of a hash table of size $2^{n}$ and that the hash table has been
|
to double the size of a hash table of size $2^{n}$ and that the hash
|
||||||
constructed with some hash function $h_{n}(x)=h(x)\, mod\,2^{n}$.
|
table has been constructed with some hash function $h_{n}(x)=h(x)\,
|
||||||
Choose $h_{n+1}(x)=h(x)\, mod\,2^{n+1}$ as the hash function for the
|
mod\,2^{n}$. Choose $h_{n+1}(x)=h(x)\, mod\,2^{n+1}$ as the hash
|
||||||
new table. Conceptually, we are simply prepending a random bit to the
|
function for the new table. Conceptually, we are simply prepending a
|
||||||
old value of the hash function, so all lower order bits remain the
|
random bit to the old value of the hash function, so all lower order
|
||||||
same. At this point, we could simply block all concurrent access and
|
bits remain the same. At this point, we could simply block all
|
||||||
iterate over the entire hash table, reinserting values according to
|
concurrent access and iterate over the entire hash table, reinserting
|
||||||
the new hash function.
|
values according to the new hash function.
|
||||||
|
|
||||||
However,
|
However,
|
||||||
%because of the way we chose $h_{n+1}(x),$
|
%because of the way we chose $h_{n+1}(x),$
|
||||||
we know that the
|
we know that the contents of each bucket, $m$, will be split between
|
||||||
contents of each bucket, $m$, will be split between bucket $m$ and
|
bucket $m$ and bucket $m+2^{n}$. Therefore, if we keep track of the
|
||||||
bucket $m+2^{n}$. Therefore, if we keep track of the last bucket that
|
last bucket that was split then we can split a few buckets at a time,
|
||||||
was split then we can split a few buckets at a time, resizing the hash
|
resizing the hash table without introducing long pauses~\cite{lht}.
|
||||||
table without introducing long pauses~\cite{lht}.
|
|
||||||
|
|
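The incremental split described above can be sketched in a few lines of C. This is only an in-memory illustration, not \yad code: the names \texttt{lht}, \texttt{lht\_bucket}, \texttt{lht\_insert} and \texttt{lht\_split\_one}, the fixed \texttt{MAX\_BUCKETS} capacity, and the use of a precomputed hash value per entry are all assumptions made for the example. Buckets below \texttt{next\_split} have already been split and are addressed with $h_{n+1}$; the rest still use $h_{n}$.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUCKETS 1024   /* hypothetical fixed capacity for the sketch */

typedef struct entry {
    uint64_t hash;         /* h(x), computed once by the caller */
    struct entry *next;    /* overflow chain within a bucket */
} entry;

typedef struct {
    entry *buckets[MAX_BUCKETS];
    unsigned n;            /* table has 2^n buckets before the current doubling */
    size_t next_split;     /* buckets [0, next_split) have already been split */
} lht;

/* Pick the bucket for hash h: h_n(x) unless that bucket was already split,
 * in which case use h_{n+1}(x). */
static size_t lht_bucket(const lht *t, uint64_t h) {
    size_t m = (size_t)(h & (((uint64_t)1 << t->n) - 1));
    if (m < t->next_split)
        m = (size_t)(h & (((uint64_t)1 << (t->n + 1)) - 1));
    return m;
}

static void lht_insert(lht *t, entry *e) {
    size_t m = lht_bucket(t, e->hash);
    e->next = t->buckets[m];
    t->buckets[m] = e;
}

/* Split exactly one bucket, m, between bucket m and bucket m + 2^n. */
static void lht_split_one(lht *t) {
    size_t size = (size_t)1 << t->n;
    size_t m = t->next_split;
    assert(m + size < MAX_BUCKETS);

    entry *e = t->buckets[m];
    t->buckets[m] = NULL;
    t->next_split++;               /* bucket m now uses h_{n+1} */
    while (e != NULL) {
        entry *next = e->next;
        lht_insert(t, e);          /* lands in m or m + 2^n */
        e = next;
    }
    if (t->next_split == size) {   /* doubling finished */
        t->n++;
        t->next_split = 0;
    }
}
```

Calling \texttt{lht\_split\_one} a few times per insert spreads the cost of doubling over many operations, which is the property the text relies on. A transactional version would additionally have to log each bucket update, which this sketch does not attempt.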
||||||
In order to implement this scheme we need two building blocks. We
|
In order to implement this scheme we need two building blocks. We
|
||||||
need a data structure that can handle bucket overflow, and we need to
|
need a map from bucket number to bucket contents (lists), and we need to handle bucket overflow.
|
||||||
be able index into an expandable set of buckets using the bucket
|
|
||||||
number.
|
|
||||||
|
|
||||||
\subsection{The Bucket List}
|
\subsection{The Bucket Map}
|
||||||
|
|
||||||
%\rcs{This seems overly complicated to me...}
|
The simplest bucket map would use a fixed-size transactional
|
||||||
|
array. However, since we want the size of the table to grow, we should
|
||||||
|
not assume that it fits in a contiguous range of pages. Instead, we build
|
||||||
|
on top of \yad's transactional ArrayList data structure (inspired by
|
||||||
|
Java's structure of the same name).
|
||||||
|
|
||||||
\yad provides access to transactional storage with page-level
|
The ArrayList provides the appearance of a large growable array by
|
||||||
granularity and stores all record information in the same page file.
|
breaking the array into a tuple of contiguous page intervals that
|
||||||
Therefore, our bucket list must be partitioned into page-size chunks,
|
partition the array. Since we expect relatively few partitions (one
|
||||||
and we cannot assume that the entire bucket list is contiguous.
|
per enlargement typically), this leads to an efficient map. We use a
|
||||||
We need some level of indirection to allow us to map from
|
single ``header'' page to store the list of intervals and their sizes.
|
||||||
bucket number to the record that stores the corresponding bucket.
|
|
||||||
|
|
||||||
\yad's allocation routines allow applications to reserve regions of
|
%We use fixed-sized buckets, which allows us to treat a region of pages
|
||||||
contiguous pages. We use this functionality to allocate the bucket
|
% as an array of buckets.
|
||||||
list in sufficiently large chunks, bounding the number of distinct
|
For space efficiency, the array elements themselves are stored using
|
||||||
contiguous regions. Borrowing from Java's ArrayList structure, we
|
the fixed-size record page layout. Thus, we use the header page to
|
||||||
initially allocate a fixed number of pages to store buckets and
|
find the right interval, and then index into it to get the $(page,
|
||||||
allocate more pages as necessary, doubling the allocation each
|
slot)$ address. Once we have this address, the redo/undo entries are
|
||||||
time. We use a single ``header'' page to store the list of regions and
|
trivial: they simply log the before and after image of that
|
||||||
their sizes.
|
record.
|
||||||
|
|
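The header-page lookup can be illustrated with a short C sketch. It is not \yad's ArrayList implementation: the types \texttt{interval}, \texttt{array\_header} and \texttt{record\_id}, the function \texttt{array\_lookup}, the \texttt{MAX\_INTERVALS} bound, and the assumption that every page holds the same number of fixed-size slots are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INTERVALS 32   /* hypothetical bound on entries in the header page */

/* One contiguous run of pages holding part of the array. */
typedef struct {
    size_t first_page;
    size_t page_count;
} interval;

/* Stand-in for the contents of the single header page. */
typedef struct {
    interval intervals[MAX_INTERVALS];
    size_t interval_count;
    size_t slots_per_page;   /* fixed-size records, so constant per page */
} array_header;

typedef struct {
    size_t page;
    size_t slot;
} record_id;

/* Translate an array index into a (page, slot) address.
 * Returns 1 on success, 0 if the index is past the end of the array. */
static int array_lookup(const array_header *h, size_t index, record_id *out) {
    assert(h->slots_per_page > 0);
    size_t page_offset = index / h->slots_per_page;

    for (size_t i = 0; i < h->interval_count; i++) {
        if (page_offset < h->intervals[i].page_count) {
            out->page = h->intervals[i].first_page + page_offset;
            out->slot = index % h->slots_per_page;
            return 1;
        }
        /* Skip this interval and keep looking in the next one. */
        page_offset -= h->intervals[i].page_count;
    }
    return 0;
}
```

Because there are few intervals, a linear scan of the header is enough for the sketch; the resulting $(page, slot)$ pair is what the redo/undo entries would then refer to.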
||||||
We use fixed-sized buckets, which allows us to treat a region of pages
|
|
||||||
as an array of buckets. For space efficiency, the buckets are stored
|
|
||||||
using the fixed-size record page layout. Thus, we use the
|
|
||||||
header page to find the right region, and then index into it, to get
|
|
||||||
the $(page, slot)$ address. Once we have this address, the redo/undo
|
|
||||||
entries are trivial: they simply log the before and after image of the
|
|
||||||
appropriate record.
|
|
||||||
|
|
||||||
|
|
||||||
%Since we double the amount of space allocated at each step, we arrange
|
|
||||||
%to run out of addressable space before the lookup table that we need
|
|
||||||
%runs out of space.
|
|
||||||
|
|
||||||
%\rcs{This paragraph doesn't really belong}
|
%\rcs{This paragraph doesn't really belong}
|
||||||
%Normal \yad slotted pages are not without overhead. Each record has
|
%Normal \yad slotted pages are not without overhead. Each record has
|
||||||
|
|