Wrote linear hash table section, cleaned up "experiemental setup"
parent 0b643dd34d
commit 91889bfdad
1 changed file with 384 additions and 110 deletions
@@ -1012,40 +1012,40 @@ most strongly differentiates \yad from other, similar libraries.
an application that frequently updates small ranges within blobs, for
example.}

%\subsection{Array List}
% Example of how to avoid nested top actions

%\subsection{Linked Lists}
% Example of two different page allocation strategies.
% Explain how to implement linked lists w/out NTA's (even though we didn't do that)?

%\subsection{Linear Hash Table\label{sub:Linear-Hash-Table}}
% % The implementation has changed too much to directly reuse old section, other than description of linear hash tables:
%
%Linear hash tables are hash tables that are able to extend their bucket
%list incrementally at runtime. They work as follows. Imagine that
%we want to double the size of a hash table of size $2^{n}$, and that
%the hash table has been constructed with some hash function $h_{n}(x)=h(x)\, mod\,2^{n}$.
%Choose $h_{n+1}(x)=h(x)\, mod\,2^{n+1}$ as the hash function for
%the new table. Conceptually we are simply prepending a random bit
%to the old value of the hash function, so all lower order bits remain
%the same. At this point, we could simply block all concurrent access
%and iterate over the entire hash table, reinserting values according
%to the new hash function.
%
%However, because of the way we chose $h_{n+1}(x),$ we know that the
%contents of each bucket, $m$, will be split between bucket $m$ and
%bucket $m+2^{n}$. Therefore, if we keep track of the last bucket that
%was split, we can split a few buckets at a time, resizing the hash
%table without introducing long pauses while we reorganize the hash
%table~\cite{lht}.
%
%We can handle overflow using standard techniques;
%\yad's linear hash table simply uses the linked list implementations
%described above. The bucket list is implemented by reusing the array
%list implementation described above.
%
%% Implementation simple! Just slap together the stuff from the prior two sections, and add a header + bucket locking.
%

\item {\bf Asynchronous log implementation/Fast
writes. Prioritization of log writes (one {}``log'' per page)
implies worst case performance (write, then immediate read) will
@@ -1069,20 +1069,56 @@ list implementation described above.

\end{enumerate}

\section{Experimental setup}

The following sections describe the design and implementation of
non-trivial functionality using \yad, and use Berkeley DB for
comparison where appropriate. We chose Berkeley DB because, among
commonly used systems, it provides transactional storage that is most
similar to \yad. Also, it is available both in open source form and as a
commercially maintained and supported product. Finally, it has been
designed for high performance, high concurrency environments.

All benchmarks were run on an Intel .... {\em @todo} with the
following Berkeley DB flags enabled {\em @todo}. We used the copy
of Berkeley DB 4.2.52 as it existed in Debian Linux's testing
branch during March of 2005. These flags were chosen to match
Berkeley DB's configuration to \yad's as closely as possible. In cases where
Berkeley DB implements a feature that is not provided by \yad, we
enable the feature if it improves Berkeley DB's performance, but
disable the feature if it degrades Berkeley DB's performance. With
the exception of \yad's optimized serialization mechanism in the
OASYS test, the two libraries provide the same set of transactional
semantics during each test.

Optimizations to Berkeley DB that we performed included disabling the
lock manager (we still use ``Free Threaded'' handles for all tests).
This yielded a significant increase in performance because it removed
the possibility of transaction deadlock, abort, and repetition.
However, after introducing this optimization, high concurrency Berkeley
DB benchmarks became unstable, suggesting that we are calling the
library incorrectly. We believe that this problem would only improve
Berkeley DB's performance in the benchmarks that we ran, so we
disabled the lock manager for our tests. Without this optimization,
Berkeley DB's performance in Figure~\ref{fig:TPS} strictly decreased as
concurrency increased because of lock contention and deadlock resolution.

We increased Berkeley DB's buffer cache and log buffer sizes to match
\yad's default sizes. Running with \yad's (larger) default values
roughly doubled Berkeley DB's performance on the bulk loading tests.

Finally, we would like to point out that we expended considerable
effort while tuning Berkeley DB, and that our efforts significantly
improved Berkeley DB's performance on these tests. While further
tuning by Berkeley DB experts would probably improve Berkeley DB's
numbers, we think that we have produced a reasonably fair comparison
between the two systems. The source code and scripts we used to
generate this data are publicly available, and we have been able to
reproduce the trends reported here on multiple systems.

\section{Linear Hash Table}

\begin{figure*}
\includegraphics[%
@@ -1098,80 +1134,292 @@ the stair stepping, and split the numbers into 'hashtable' and 'raw
access' graphs.}}
\end{figure*}

%\subsection{Conventional workloads}

%Existing database servers and transactional libraries are tuned to
%support OLTP (Online Transaction Processing) workloads well. Roughly
%speaking, the workload of these systems is dominated by short
%transactions and response time is important.
%
%We are confident that a
%sophisticated system based upon our approach to transactional storage
%will compete well in this area, as our algorithm is based upon ARIES,
%which is the foundation of IBM's DB/2 database. However, our current
%implementation is geared toward simpler, specialized applications, so
%we cannot verify this directly. Instead, we present a number of
%microbenchmarks that compare our system against Berkeley DB, the most
%popular transactional library. Berkeley DB is a mature product and is
%actively maintained. While it currently provides more functionality
%than our current implementation, we believe that our architecture
%could support a broader range of features than those that are provided
%by BerkeleyDB's monolithic interface.

Hash table indices are common in the OLTP (Online Transaction
Processing) world, and are also applicable to a large number of other
applications. In this section, we describe how we implemented two
variants of Linear Hash tables using \yad, and describe how \yad's
flexible page and log formats allow end-users of our library to
perform similar optimizations. We also argue that \yad makes it
trivial to produce concurrent data structure implementations, and
provide a set of mechanical steps that will allow a non-concurrent
data structure implementation to be used by interleaved transactions.

Finally, we describe a number of more complex optimizations and
compare the performance of our optimized implementation, the
straightforward implementation, and Berkeley DB's hash implementation.
The straightforward implementation is used by the other applications
presented in this paper, and is \yad's default hashtable
implementation. We chose this implementation over the faster optimized
hash table in order to emphasize that it is easy to implement
high-performance transactional data structures with \yad, and because
it is easy to understand and to convince ourselves that the
straightforward implementation is correct.

We decided to implement a linear hash table. Linear hash tables are
hash tables that are able to extend their bucket list incrementally at
runtime. They work as follows. Imagine that we want to double the size
of a hash table of size $2^{n}$, and that the hash table has been
constructed with some hash function $h_{n}(x)=h(x)\, mod\,2^{n}$.
Choose $h_{n+1}(x)=h(x)\, mod\,2^{n+1}$ as the hash function for the
new table. Conceptually, we are simply prepending a random bit to the
old value of the hash function, so all lower order bits remain the
same. At this point, we could simply block all concurrent access and
iterate over the entire hash table, reinserting values according to
the new hash function.

However, because of the way we chose $h_{n+1}(x)$, we know that the
contents of each bucket, $m$, will be split between bucket $m$ and
bucket $m+2^{n}$. (For example, when a table with $2^{2}=4$ buckets is
doubled, the keys in bucket $1$ are divided between buckets $1$ and
$5$.) Therefore, if we keep track of the last bucket that
was split, we can split a few buckets at a time, resizing the hash
table without introducing long pauses while we reorganize the hash
table~\cite{lht}.
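
To make the addressing rule concrete, the following sketch maps a key to a
bucket while a split is in progress. It is illustrative C rather than \yad's
actual interface; the names (\texttt{lht\_header}, \texttt{next\_to\_split},
and so on) are hypothetical.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical header for a linear hash table.  Buckets below
 * next_to_split have already been split into m and m + 2^n. */
struct lht_header {
    uint64_t n;              /* log2 of the "old" table size             */
    uint64_t next_to_split;  /* first bucket that has not been split yet */
};

static uint64_t lht_bucket(const struct lht_header *h, uint64_t hash) {
    uint64_t bucket = hash & ((1ULL << h->n) - 1);       /* h_n(x)     */
    if (bucket < h->next_to_split) {
        /* Already split, so the extra bit of h_{n+1}(x) is meaningful. */
        bucket = hash & ((1ULL << (h->n + 1)) - 1);       /* h_{n+1}(x) */
    }
    return bucket;
}
\end{verbatim}

Splitting the bucket at the split pointer moves the entries whose new hash
value differs into the corresponding bucket $m+2^{n}$; once every bucket has
been split, $n$ is incremented and the pointer wraps back to zero.
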
In order to implement this scheme, we need two building blocks. We
need a data structure that can handle bucket overflow, and we need to
be able to index into an expandable set of buckets using the bucket
number.

\subsection{The Bucket List}

\yad provides access to transactional storage with page-level
granularity and stores all record information in the same page file.
Therefore, our bucket list must be partitioned into page-size chunks,
and (since other data structures may concurrently use the page file)
we cannot assume that the entire bucket list is contiguous.
Thus, we need some level of indirection to allow us to map from
a bucket number to the record that stores the corresponding bucket.

\yad's allocation routines allow applications to reserve regions of
contiguous pages. Therefore, if we are willing to allocate the bucket
list in sufficiently large chunks, we can limit the number of such
contiguous regions that we will require. Borrowing from Java's
ArrayList structure, we initially allocate a fixed number of pages to
store buckets, and allocate more pages as necessary, doubling the
number allocated each time.

We allocate a fixed amount of storage for each bucket, so we know how
many buckets will fit in each of these pages. Therefore, in order to
look up an arbitrary bucket, we simply need to calculate which chunk
of allocated pages will contain the bucket, and then the offset of the
appropriate page within that group of allocated pages.
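
As an illustration, the arithmetic for locating a bucket under this doubling
scheme might look like the following sketch. It assumes the first region
holds \texttt{FIRST\_REGION\_PAGES} pages and that each later region doubles
the total page count; the names are hypothetical and do not correspond to
\yad's actual ArrayList code.

\begin{verbatim}
#include <stdint.h>

#define FIRST_REGION_PAGES 4   /* pages reserved by the initial allocation  */
#define BUCKETS_PER_PAGE   64  /* fixed-length buckets that fit on one page */

struct bucket_addr { uint64_t region, page_in_region, slot; };

/* Map a logical bucket number onto (region, page within region, slot). */
static struct bucket_addr locate_bucket(uint64_t bucket) {
    uint64_t page = bucket / BUCKETS_PER_PAGE;   /* logical page number */
    struct bucket_addr a;
    a.slot = bucket % BUCKETS_PER_PAGE;

    if (page < FIRST_REGION_PAGES) {
        a.region = 0;
        a.page_in_region = page;
    } else {
        /* Region k >= 1 covers logical pages
         * [FIRST_REGION_PAGES * 2^(k-1), FIRST_REGION_PAGES * 2^k). */
        uint64_t k = 0, region_start = FIRST_REGION_PAGES;
        while (page >= region_start * 2) { region_start *= 2; k++; }
        a.region = k + 1;
        a.page_in_region = page - region_start;
    }
    return a;
}
\end{verbatim}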

Since we double the amount of space allocated at each step, we arrange
to run out of addressable space before the lookup table that we need
runs out of space.

Normal \yad slotted pages are not without overhead. Each record has
an associated size field, and an offset pointer that points to a
location within the page. Throughout our bucket list implementation,
we only deal with fixed length slots. \yad includes a ``Fixed page''
interface that implements an on-page format that avoids these
overheads by only handling fixed length entries. We use this
interface directly to store the actual bucket entries. We override
the ``page type'' field of the page that holds the lookup table.

This routes requests to access recordids that reside in the index
page to the ArrayList's page handling code, which uses the existing
``Fixed page'' interface to read and write the lookup table.
Nothing in \yad's extendible page interface forced us to use the
existing interface for this purpose, and we could have implemented the
lookup table using the byte-oriented interface, but we decided to
reuse existing code in order to simplify our implementation, and the
Fixed page interface is already quite efficient.

The ArrayList page handling code overrides the recordid ``slot'' field
to refer to a logical offset within the ArrayList. Therefore, the
ArrayList provides an interface that can be used as though it were
backed by an infinitely large page that contains fixed length records.
This seems to be generally useful, so the ArrayList implementation may
be used independently of the hashtable.

For brevity, we do not include a description of how the ArrayList
operations are logged and implemented.

\subsection{Bucket Overflow}

For simplicity, our buckets are fixed length. However, we want to
store variable length objects. Therefore, we store a header record in
the bucket list that contains the location of the first item in the
list. This is represented as a $(page,slot)$ tuple. If the bucket is
empty, we let $page=-1$. We could simply store each linked list entry
as a separate record, but it would be nicer if we could preserve
locality, and it is unclear how \yad's generic record allocation
routine could support this directly. Based upon the observation that
a space reservation scheme could arrange for pages to maintain a bit
of free space, we take a ``list of lists'' approach to our bucket list
implementation. Bucket lists consist of two types of entries. The
first maintains a linked list of pages, and contains an offset
internal to the page that it resides in, and a $(page,slot)$ tuple
that points to the next page that contains items in the list. All of
the internal page offsets may be traversed without asking the buffer
manager to unpin and repin the page in memory, providing very fast
list traversal if the members of the list are allocated in a way that
preserves locality. This optimization would not be possible if it
were not for the low level interfaces provided by the buffer manager,
which separates pinning pages and reading records into separate
APIs. Again, since this data structure seems to have some interesting
properties, it can also be used on its own.
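
A sketch of the two entry types described above follows. The layout and the
names are hypothetical; they are meant only to make the ``list of lists''
structure concrete and are not \yad's actual record formats.

\begin{verbatim}
#include <stdint.h>

/* A reference to a record: the page it lives on and its slot there.
 * page == -1 denotes an empty bucket or the end of the page chain.   */
typedef struct { int64_t page; int32_t slot; } recordid_t;

/* First entry type: one per page that holds part of a bucket's list.
 * It anchors the chain of items on this page and points at the next
 * page that contains items belonging to the same list.               */
struct page_list_entry {
    int32_t    first_offset;  /* offset of the first item on this page */
    recordid_t next_page;     /* header of the next page in the list   */
};

/* Second entry type: an item in the list.  Within a page, items are
 * chained by offsets, so traversal never unpins and repins the page. */
struct list_item_entry {
    int32_t next_offset;      /* next item on this page, or -1         */
    char    payload[];        /* variable length key/value data        */
};
\end{verbatim}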

\subsection{Concurrency}

Given the structures described above, the implementation of a linear hash
table is straightforward. A linear hash function is used to map keys
to buckets, insertions and deletions are handled by the linked list
implementation, and the table can be extended by removing items from
one linked list and adding them to another list.

Provided that the underlying data structures are transactional and there
are never any concurrent transactions, this is actually all that is
needed to complete the linear hash table implementation.
Unfortunately, as we mentioned in Section~\ref{todo}, things become a
bit more complex if we allow interleaved transactions. To get around
this, and to allow multithreaded access to the hashtable, we protect
all of the hashtable operations with pthread mutexes. We then
implement inverse operations for each operation we want to support
(this is trivial in the case of the hash table, since ``insert'' is
the logical inverse of ``remove''), add calls that begin nested
top actions in each of the places where we added a mutex acquisition,
and end the nested top action wherever we release a mutex. Of
course, nested top actions are not necessary for read-only operations.
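
The mechanical transformation described above might look like the following
sketch. The \yad calls shown (\texttt{TbeginNestedTopAction},
\texttt{TendNestedTopAction}, and the helper operations) are illustrative
placeholders rather than the library's actual signatures; the point is the
pairing of each mutex acquisition with a nested top action whose logical undo
is the inverse operation.

\begin{verbatim}
#include <pthread.h>

/* Hypothetical yad-style entry points; real names and signatures differ. */
extern void TbeginNestedTopAction(int xid, int undo_op,
                                  const void *undo_arg, int undo_len);
extern void TendNestedTopAction(int xid);
extern void ThashInsertHelper(int xid, long hash, const void *key, int klen,
                              const void *val, int vlen);
#define OP_LINEAR_HASH_REMOVE 42 /* logical inverse of insert (placeholder) */

static pthread_mutex_t hash_mutex = PTHREAD_MUTEX_INITIALIZER;

void ThashInsert(int xid, long hash, const void *key, int klen,
                 const void *val, int vlen) {
    pthread_mutex_lock(&hash_mutex);     /* step 1: latch the structure     */
    /* step 2: begin a nested top action whose undo is the logical inverse. */
    TbeginNestedTopAction(xid, OP_LINEAR_HASH_REMOVE, key, klen);

    ThashInsertHelper(xid, hash, key, klen, val, vlen); /* physical updates */

    TendNestedTopAction(xid);            /* step 3: end the nested top      */
    pthread_mutex_unlock(&hash_mutex);   /* action and release the latch    */
}
\end{verbatim}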

This completes our description of \yad's default hashtable
implementation. We would like to emphasize that implementing
transactional support and concurrency for this data structure is
straightforward, and (other than requiring the design of a logical
logging format, and the restrictions imposed by fixed length pages) it is
not fundamentally more difficult than the implementation of normal
data structures. Also, while implementing the hash table, we
implemented two generally useful transactional data structures.

Next, we describe some additional optimizations that
we could have performed, and evaluate the performance of our
implementations.

\subsection{The optimized hashtable}

Our optimized hashtable implementation is optimized for log
bandwidth, only stores fixed length entries, and does not obey normal
recovery semantics. It is included in this test as an example of the
sort of optimizations that are possible (but difficult) to perform
with \yad. The slower, stable NTA hashtable is used
in all other benchmarks in this paper.

Instead of using nested top actions, the optimized implementation
applies updates in a carefully chosen order that minimizes the extent
to which the on-disk representation of the hash table can be
corrupted (Figure~\ref{linkedList}). Before beginning updates, it
writes an undo entry that will check and restore the consistency of
the hashtable during recovery, and then invoke the inverse of the
operation that needs to be undone. This recovery scheme does not
require record-level undo information. Therefore, pre-images of
records do not need to be written to the log, saving log bandwidth and
enhancing performance.
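
The flavor of such an undo entry is sketched below. It is purely
illustrative: the callback, the consistency-repair helper, and the operation
names are hypothetical, but the structure (repair the physical state first,
then apply the logical inverse) follows the description above.

\begin{verbatim}
/* Hypothetical logical-undo callback for the optimized hashtable.  The
 * argument carries enough state (the key and its hash) to re-locate the
 * affected bucket and to invert the original insert. */
struct hash_undo_arg { long hash; int klen; char key[]; };

extern void repair_bucket_chain(int xid, long hash);
extern void optimized_hash_remove(int xid, long hash,
                                  const char *key, int klen);

void undo_optimized_insert(int xid, void *argv) {
    struct hash_undo_arg *arg = (struct hash_undo_arg *) argv;

    /* 1. The update ordering guarantees the bucket's list can be repaired:
     *    walk it and splice out any half-linked entry before using it.    */
    repair_bucket_chain(xid, arg->hash);

    /* 2. Apply the logical inverse of the logged operation. */
    optimized_hash_remove(xid, arg->hash, arg->key, arg->klen);
}
\end{verbatim}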

Also, since this implementation does not need to support variable size
entries, it stores the first entry of each bucket in the ArrayList
that represents the bucket list, reducing the number of buffer manager
calls that must be made. Finally, this implementation caches
information about each hashtable that the application is working with
in memory, so that it does not have to obtain a copy of the hashtable
header information from the buffer manager for each request.

The most important component of \yad for this optimization is \yad's
flexible recovery and logging scheme. For brevity, we only mention
that this hashtable implementation uses finer grained latching than the
one described above, but do not describe how this was implemented. Finer
grained latching is relatively easy in this case, since most changes
only affect a few buckets.

\subsection{Performance}

We ran a number of benchmarks on the two hashtable implementations
mentioned above, and used Berkeley DB for comparison.

%In the future, we hope that improved
%tool support for \yad will allow application developers to easily apply
%sophisticated optimizations to their operations. Until then, application
%developers that settle for ``slow'' straightforward implementations of
%specialized data structures should achieve better performance than would
%be possible by using existing systems that only provide general purpose
%primatives.

The first test (Figure~\ref{fig:BULK_LOAD}) measures the throughput of
a single long running
transaction that loads a synthetic data set into the
library. For comparison, we also provide throughput for many different
\yad operations, BerkeleyDB's DB\_HASH hashtable implementation,
and the lower level DB\_RECNO record number based interface.

Both of \yad's hashtable implementations perform well, but the complex
optimized implementation is clearly faster. This is not surprising, as
it issues fewer buffer manager requests and writes fewer log entries
than the straightforward implementation.

We see that \yad's other operation implementations also perform well
in this test. The page oriented list implementation is geared toward
preserving the locality of short lists, and we see that it has
quadratic performance in this test. This is because the list is
traversed each time a new page must be allocated.

Note that page allocation is relatively infrequent, since many entries
will typically fit on the same page. In the case of our linear
hashtable, bucket reorganization ensures that the average occupancy of
a bucket is less than one. Buckets that have recently had entries
added to them will tend to have occupancies greater than or equal to
one. As the average occupancy of these buckets drops over time, the
page oriented list should have the opportunity to allocate space on
pages that it already occupies.

In a separate experiment not presented here, we compared the
implementation of the page oriented linked list to \yad's conventional
linked list implementation. While the conventional implementation
performs better when bulk loading large amounts of data into a single
linked list, we have found that a hashtable built with the page oriented list
outperforms otherwise equivalent hashtables that use conventional linked lists.

%The NTA (Nested Top Action) version of \yad's hash table is very
%cleanly implemented by making use of existing \yad data structures,
%and is not fundamentally more complex than normal multithreaded code.
%We expect application developers to write code in this style.

%{\em @todo need to explain why page-oriented list is slower in the
%second chart, but provides better hashtable performance.}

The second test (Figure~\ref{fig:TPS}) measures the two libraries' ability to exploit
concurrent transactions to reduce logging overhead. Both systems
implement a simple optimization that allows multiple calls to commit()
to be serviced by a single synchronous disk request. This test shows
that both Berkeley DB and \yad are able to take advantage of
multiple outstanding requests. \yad seems to merge log force
requests more aggressively, although Berkeley DB could probably be
tuned to improve its performance here. Also, it is possible that
Berkeley DB's log force merging scheme is more robust than \yad's
under certain workloads. Without extensively testing \yad under
many real world workloads, it is difficult to tell whether our log
merging scheme is too aggressive. Because different approaches to this
optimization make sense under different circumstances,~\cite{findWorkOnThisOrRemoveTheSentence} this may
be another aspect of transactional storage systems where
application control over a transactional storage policy is desirable.
\footnote{Although our current implementation does not provide the hooks that
would be necessary to alter the log scheduling policy, the logger
interface is cleanly separated from the rest of \yad. In fact,
the current commit merging policy was implemented in an hour or
two, months after the log file implementation was written. In
future work, we would like to explore the possibility of virtualizing
more of \yad's internal APIs. Our choice of C as an implementation
language complicates this task somewhat.}
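
As an aside, the commit-merging optimization both libraries perform is simple
to sketch. The fragment below uses hypothetical names (it is not \yad's
logger interface) and shows the usual pattern: each committing thread records
the log sequence number it needs made durable, and one thread forces the log
on behalf of every waiter in the current batch.

\begin{verbatim}
#include <pthread.h>
#include <stdint.h>

extern void force_log_up_to(uint64_t lsn);  /* one synchronous log write */

static pthread_mutex_t log_mutex   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  forced_cond = PTHREAD_COND_INITIALIZER;
static uint64_t forced_lsn = 0;          /* everything <= this is durable */
static int      force_in_progress = 0;

/* Called by each committing transaction with the LSN of its commit record. */
void wait_for_commit(uint64_t my_lsn) {
    pthread_mutex_lock(&log_mutex);
    while (forced_lsn < my_lsn) {
        if (!force_in_progress) {
            /* Force once on behalf of every waiter that arrived so far. */
            force_in_progress = 1;
            uint64_t target = my_lsn;
            pthread_mutex_unlock(&log_mutex);
            force_log_up_to(target);
            pthread_mutex_lock(&log_mutex);
            if (target > forced_lsn) forced_lsn = target;
            force_in_progress = 0;
            pthread_cond_broadcast(&forced_cond);
        } else {
            pthread_cond_wait(&forced_cond, &log_mutex);
        }
    }
    pthread_mutex_unlock(&log_mutex);
}
\end{verbatim}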
\begin{figure*}
\includegraphics[%
@@ -1197,10 +1445,19 @@ response times for each case.

@todo analysis / come up with a more sane graph format.

The fact that our straightforward hashtable outperforms Berkeley DB's hashtable shows that
straightforward implementations of specialized data structures can
often outperform highly tuned, general purpose implementations.
This finding suggests that it is appropriate for
application developers to consider the development of custom
transactional storage mechanisms if application performance is
important.

\subsection{Object Serialization}\label{OASYS}

Object serialization performance is extremely important in modern web
application systems such as Enterprise Java Beans. Object serialization is also a
convenient way of adding persistent storage to an existing application
without developing an explicit file format or dealing with low level
I/O interfaces.

@@ -1425,11 +1682,11 @@ optimization techniques it may be possible to narrow or close this
gap, increasing the benefits that our library offers to applications
that implement specialized data access routines.

We would like to extend our work into distributed system
development. We believe that \yad's implementation anticipates many
of the issues that we will face in distributed domains. By adding
networking support to our logical log interface,
we should be able to multiplex and replicate log entries to sets of
nodes easily. Single node optimizations such as the demand based log
reordering primitive should be directly applicable to multi-node
systems.~\footnote{For example, our (local, and non-redundant) log
@@ -1442,19 +1699,36 @@ that make use of streaming data or that need to perform
transformations on application requests before they are materialized
in a transactional data store.

We also hope to provide a library of
transactional data structures with functionality that is comparable to
standard programming language libraries such as Java's Collection API
or portions of C++'s STL. Our linked list implementations, array list
implementation, and hashtable represent an initial attempt to implement
this functionality. We are unaware of any transactional system that
provides such a broad range of data structure implementations.

Also, we have noticed that the integration between transactional
storage primitives and in-memory data structures is often fairly
limited. (For example, JDBC does not reuse Java's iterator
interface.) We have been experimenting with the production of a
uniform interface to iterators, maps, and other structures which would
allow code to be simultaneously written for native in-memory storage
and for our transactional layer. We believe the fundamental reason
for the differing APIs of past systems is the heavyweight nature of
the primitives provided by transactional systems, and the highly
specialized, lightweight interfaces provided by typical in-memory
structures. Because \yad makes it easy to implement lightweight
transactional structures, it may be easy to integrate it further with
programming language constructs.

Finally, due to the large amount of prior work in this area, we have
found that there are a large number of optimizations and features that
could be applied to \yad. It is our intention to produce a usable
system from our research prototype. To this end, we have already
released \yad as an open source library, and intend to produce a
stable release once we are confident that the implementation is correct
and reliable.

\section{Conclusion}