Paper edits.
parent 04af977f3a
commit 2e7686e483
1 changed file with 169 additions and 140 deletions
@@ -657,7 +657,7 @@ during normal operation.
 As long as operation implementations obey the atomicity constraints
-outlined above, and the algorithms they use correctly manipulate
+outlined above and the algorithms they use correctly manipulate
 on-disk data structures, the write ahead logging protocol will provide
 the application with the ACID transactional semantics, and provide
 high performance, highly concurrent and scalable access to the
@@ -683,11 +683,12 @@ independently extended and improved.
 We have implemented a number of simple, high performance
 and general-purpose data structures. These are used by our sample
-applications, and as building blocks for new data structures. Example
+applications and as building blocks for new data structures. Example
 data structures include two distinct linked-list implementations, and
-an growable array. Surprisingly, even these simple operations have
+a growable array. Surprisingly, even these simple operations have
 important performance characteristics that are not available from
 existing systems.
 %(Sections~\ref{sub:Linear-Hash-Table} and~\ref{TransClos})
 
 The remainder of this section is devoted to a description of the
 various primitives that \yad provides to application developers.
@@ -696,14 +697,14 @@ various primitives that \yad provides to application developers.
 \label{lock-manager}
 \eab{present the API?}
 
-\yad
-provides a default page-level lock manager that performs deadlock
+\yad provides a default page-level lock manager that performs deadlock
 detection, although we expect many applications to make use of
 deadlock-avoidance schemes, which are already prevalent in
 multithreaded application development. The Lock Manager is flexible
-enough to also provide index locks for hashtable implementations, and more complex locking protocols.
+enough to also provide index locks for hashtable implementations and
+more complex locking protocols.
 
-For example, it would be relatively easy to build a strict two-phase
+Also, it would be relatively easy to build a strict two-phase
 locking hierarchical lock
 manager~\cite{hierarcicalLocking,hierarchicalLockingOnAriesExample} on
 top of \yad. Such a lock manager would provide isolation guarantees
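To make the deadlock-avoidance alternative mentioned in this hunk concrete, the C sketch below acquires page-level locks in a fixed global order, which is one common avoidance scheme an application might use instead of relying on the lock manager's deadlock detection. The lock table and function names are invented for illustration; they are not \yad's lock-manager API.

#include <pthread.h>

#define N_PAGE_LOCKS 1024
static pthread_mutex_t page_locks[N_PAGE_LOCKS];   /* hypothetical one-mutex-per-page table */

void init_page_locks(void) {
  for (int i = 0; i < N_PAGE_LOCKS; i++)
    pthread_mutex_init(&page_locks[i], NULL);
}

/* Acquire the two locks in a fixed global order (lowest lock index first).
 * A total order on acquisition means no lock cycle, and therefore no
 * deadlock, can form between concurrent multi-page operations. */
void lock_page_pair(unsigned a, unsigned b) {
  unsigned ia = a % N_PAGE_LOCKS, ib = b % N_PAGE_LOCKS;
  if (ia > ib) { unsigned t = ia; ia = ib; ib = t; }
  pthread_mutex_lock(&page_locks[ia]);
  if (ib != ia) pthread_mutex_lock(&page_locks[ib]);
}

void unlock_page_pair(unsigned a, unsigned b) {
  unsigned ia = a % N_PAGE_LOCKS, ib = b % N_PAGE_LOCKS;
  if (ia > ib) { unsigned t = ia; ia = ib; ib = t; }
  if (ib != ia) pthread_mutex_unlock(&page_locks[ib]);
  pthread_mutex_unlock(&page_locks[ia]);
}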
@@ -852,6 +853,8 @@ that should be presented here. {\em Physical logging }
 is the practice of logging physical (byte-level) updates
 and the physical (page-number) addresses to which they are applied.
 
+\rcs{Do we really need to differentiate between types of diffs applied to pages? The concept of physical redo/logical undo is probably more important...}
+
 {\em Physiological logging } is what \yad recommends for its redo
 records~\cite{physiological}. The physical address (page number) is
 stored, but the byte offset and the actual delta are stored implicitly
@@ -871,7 +874,8 @@ This forms the basis of \yad's flexible page layouts. We current
 support three layouts: a raw page (RawPage), which is just an array of
 bytes, a record-oriented page with fixed-size records (FixedPage), and
 a slotted-page that supports variable-sized records (SlottedPage).
-Data structures can pick the layout that is most convenient.
+Data structures can pick the layout that is most convenient or implement
+new layouts.
 
 {\em Logical logging} uses a higher-level key to specify the
 UNDO/REDO. Since these higher-level keys may affect multiple pages,
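As a simplified illustration of the physiological convention described in this hunk, the sketch below stores only a physical page number plus a slot identifier in the redo entry; the byte offset is recovered by the page-layout code at redo time. The struct layout and function are invented for this example and do not reflect \yad's actual log-record format.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One possible shape for a physiological redo entry (hypothetical): */
typedef struct {
  uint32_t page;     /* physical: which page to load                  */
  uint16_t slot;     /* logical within the page: which record         */
  uint16_t len;      /* length of the new record image that follows   */
} physiological_redo_t;

/* Redo against a FixedPage-style layout: the page code, not the log
 * record, turns (slot, len) into a byte offset, so the log itself never
 * stores offsets or page-internal pointers. */
void fixed_page_redo(unsigned char *page_bytes, size_t header_bytes,
                     const physiological_redo_t *r,
                     const unsigned char *image) {
  /* fixed-size records: every record image has the same length */
  size_t offset = header_bytes + (size_t)r->slot * r->len;
  memcpy(page_bytes + offset, image, r->len);
}

A SlottedPage layout would resolve the same (page, slot) pair through its in-page slot table instead; the log record itself would be unchanged.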
@@ -919,8 +923,8 @@ without considering the data values and structural changes introduced
 $B$, which is likely to cause corruption. At this point, $B$ would
 have to be aborted as well ({\em cascading aborts}).
 
-With nested top actions, ARIES defines the structural changes as their
-own mini-transaction. This means that the structural change
+With nested top actions, ARIES defines the structural changes as a
+mini-transaction. This means that the structural change
 ``commits'' even if the containing transaction ($A$) aborts, which
 ensures that $B$'s update remains valid.
@@ -936,26 +940,29 @@ In particular, we have found a simple recipe for converting a
 non-concurrent data structure into a concurrent one, which involves
 three steps:
 \begin{enumerate}
-\item Wrap a mutex around each operation, this can be done with the lock
-manager, or just using pthread mutexes. This provides fine-grain isolation.
+\item Wrap a mutex around each operation. If full transactional isolation
+with deadlock detection is required, this can be done with the lock
+manager. Alternatively, this can be done using pthread mutexes which
+provides fine-grain isolation and allows the application to decide
+what sort of isolation scheme to use.
 \item Define a logical UNDO for each operation (rather than just using
 a lower-level physical undo). For example, this is easy for a
 hashtable; e.g. the undo for an {\em insert} is {\em remove}.
 \item For mutating operations (not read-only), add a ``begin nested
 top action'' right after the mutex acquisition, and a ``commit
-nested top action'' where we release the mutex.
+nested top action'' right before the mutex is released.
 \end{enumerate}
-This recipe ensures that any operations that might span multiple pages
-commit any structural changes and thus avoids cascading aborts. If
-this transaction aborts, the logical undo will {\em compensate} for
+This recipe ensures that operations that might span multiple pages
+atomically apply and commit any structural changes and thus avoids
+cascading aborts. If the transaction that encloses the operations
+aborts, the logical undo will {\em compensate} for
 its effects, but leave its structural changes intact (or augment
 them). Note that by releasing the mutex before we commit, we are
 violating strict two-phase locking in exchange for better performance
 and support for deadlock avoidance schemes.
 We have found the recipe to be easy to follow and very effective, and
-we use in everywhere we have structural changes, such as growing a
-hash table or array.
+we use it everywhere our concurrent data structures may make structural
+changes, such as growing a hash table or array.
 
 %% \textcolor{red}{OLD TEXT:} Section~\ref{sub:OperationProperties} states that \yad does not allow
 %% cascading aborts, implying that operation implementors must protect
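Read together, the three-step recipe in the hunk above maps onto a wrapper of roughly the following shape. This is a hedged sketch: Tbegin_nested_top_action(), Tend_nested_top_action(), LOGICAL_UNDO_REMOVE, and do_physical_insert() are invented stand-ins declared here only so the example is self-contained; they are not claimed to be the library's real API.

#include <pthread.h>

/* Hypothetical stand-ins for the transactional calls the recipe assumes: */
typedef struct nta_handle nta_handle;
enum { LOGICAL_UNDO_REMOVE = 1 };
nta_handle *Tbegin_nested_top_action(int xid, int logical_undo_op, const char *undo_arg);
void        Tend_nested_top_action(int xid, nta_handle *h);
int         do_physical_insert(int xid, const char *key, const char *value);

static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;

int concurrent_insert(int xid, const char *key, const char *value) {
  int rc;
  pthread_mutex_lock(&table_mutex);             /* step 1: one mutex around the operation   */

  /* step 3: open the nested top action right after acquiring the mutex;
   * step 2: its undo is the logical inverse of the operation, remove(key). */
  nta_handle *h = Tbegin_nested_top_action(xid, LOGICAL_UNDO_REMOVE, key);

  rc = do_physical_insert(xid, key, value);     /* possibly multi-page structural work      */

  Tend_nested_top_action(xid, h);               /* the structural change "commits" here,    */
  pthread_mutex_unlock(&table_mutex);           /* right before the mutex is released       */
  return rc;
}

Releasing the mutex immediately after the nested top action commits is exactly the trade described above: strict two-phase locking is given up in exchange for concurrency and deadlock-avoidance friendliness.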
@@ -1017,7 +1024,8 @@ the relevant data.
 \item Redo operations use page numbers and possibly record numbers
 while Undo operations use these or logical names/keys
 \item Acquire latches as needed (typically per page or record)
-\item Use nested top actions or ``big locks'' for multi-page updates
+\item Use nested top actions (which require a logical undo log record)
+or ``big locks'' (which drastically reduce concurrency) for multi-page updates.
 \end{enumerate}
 
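Ahead of the paper's own Increment/Decrement example, here is a minimal sketch of how the checklist above tends to play out for such an operation: the redo is addressed by a (page, record) pair, while the undo is logical, so it stays valid even if later operations rearrange data. The recordid and log-entry types are invented for illustration, not taken from the paper's code.

#include <stdint.h>

typedef struct { uint32_t page; uint16_t slot; uint16_t size; } recordid;

typedef struct {
  recordid rid;     /* which counter record the operation touched */
  int64_t  amount;  /* how much it was incremented by             */
} increment_log_t;

/* REDO: reapply the effect to the physically addressed record. */
void increment_redo(int64_t *record, const increment_log_t *e) {
  *record += e->amount;
}

/* UNDO: compensate logically (subtract the same amount) rather than
 * restoring an old byte image of the page. */
void increment_undo(int64_t *record, const increment_log_t *e) {
  *record -= e->amount;
}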
 \subsubsection{Example: Increment/Decrement}
@@ -1284,9 +1292,6 @@ All reported numbers
 correspond to the mean of multiple runs and represent a 95\%
 confidence interval with a standard deviation of +/- 5\%.
 
-\mjd{Eric: Please reword the above to be accurate}
-\eab{I think Rusty has to do this, as I don't know what the scrips do. Assuming they intended for 5\% on each side, this is a fine way to say it.}
-
 We used Berkeley DB 4.2.52 as it existed in Debian Linux's testing
 branch during March of 2005, with the flags DB\_TXN\_SYNC, and DB\_THREAD
 enabled. These flags were chosen to match
@@ -1312,7 +1317,7 @@ improve Berkeley DB's performance in our benchmarks, so we disabled
 the lock manager for all tests. Without this optimization, Berkeley
 DB's performance for Figure~\ref{fig:TPS} strictly decreases with increased concurrency due to contention and deadlock recovery.
 
-We increased Berkeley DB's buffer cache and log buffer sizes, to match
+We increased Berkeley DB's buffer cache and log buffer sizes to match
 \yad's default sizes. Running with \yad's (larger) default values
 roughly doubled Berkeley DB's performance on the bulk loading tests.
@@ -1328,17 +1333,6 @@ reproduce the trends reported here on multiple systems.
 
 \section{Linear Hash Table\label{sub:Linear-Hash-Table}}
 
-\begin{figure*}
-\includegraphics[%
-width=1\columnwidth]{bulk-load.pdf}
-\includegraphics[%
-width=1\columnwidth]{bulk-load-raw.pdf}
-\caption{\label{fig:BULK_LOAD} This test measures the raw performance
-of the data structures provided by \yad and Berkeley DB. Since the
-test is run as a single transaction, overheads due to synchronous I/O
-and logging are minimized.}
-\end{figure*}
-
 %\subsection{Conventional workloads}
@@ -1360,32 +1354,33 @@ and logging are minimized.}
 %could support a broader range of features than those that are provided
 %by BerkeleyDB's monolithic interface.
 
-Hash table indices are common in databases, and are also applicable to
+Hash table indices are common in databases and are also applicable to
 a large number of applications. In this section, we describe how we
-implemented two variants of Linear Hash tables on top of \yad, and
+implemented two variants of Linear Hash tables on top of \yad and
 describe how \yad's flexible page and log formats enable interesting
 optimizations. We also argue that \yad makes it trivial to produce
-concurrent data structure implementations, and provide a set of
-mechanical steps that will allow a non-concurrent data structure
-implementation to be used by interleaved transactions.
+concurrent data structure implementations.
+%, and provide a set of
+%mechanical steps that will allow a non-concurrent data structure
+%implementation to be used by interleaved transactions.
 
-Finally, we describe a number of more complex optimizations, and
+Finally, we describe a number of more complex optimizations and
 compare the performance of our optimized implementation, the
-straightforward implementation, and Berkeley DB's hash implementation.
+straightforward implementation and Berkeley DB's hash implementation.
 The straightforward implementation is used by the other applications
-presented in this paper, and is \yad's default hashtable
+presented in this paper and is \yad's default hashtable
 implementation. We chose this implementation over the faster optimized
 hash table in order to emphasize that it is easy to implement
-high-performance transactional data structures with \yad, and because
+high-performance transactional data structures with \yad and because
 it is easy to understand.
 
 We decided to implement a {\em linear} hash table. Linear hash tables are
 hash tables that are able to extend their bucket list incrementally at
 runtime. They work as follows. Imagine that we want to double the size
-of a hash table of size $2^{n}$, and that the hash table has been
+of a hash table of size $2^{n}$ and that the hash table has been
 constructed with some hash function $h_{n}(x)=h(x)\, mod\,2^{n}$.
 Choose $h_{n+1}(x)=h(x)\, mod\,2^{n+1}$ as the hash function for the
-new table. Conceptually we are simply prepending a random bit to the
+new table. Conceptually, we are simply prepending a random bit to the
 old value of the hash function, so all lower order bits remain the
 same. At this point, we could simply block all concurrent access and
 iterate over the entire hash table, reinserting values according to
@@ -1396,20 +1391,17 @@ However,
 we know that the
 contents of each bucket, $m$, will be split between bucket $m$ and
 bucket $m+2^{n}$. Therefore, if we keep track of the last bucket that
-was split, we can split a few buckets at a time, resizing the hash
+was split then we can split a few buckets at a time, resizing the hash
 table without introducing long pauses~\cite{lht}.
 
-In order to implement this scheme, we need two building blocks. We
+In order to implement this scheme we need two building blocks. We
 need a data structure that can handle bucket overflow, and we need to
 be able to index into an expandable set of buckets using the bucket
 number.
 
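A compact sketch of the bucket-selection rule implied by the incremental split just described: buckets below the split point have already been rehashed with $h_{n+1}$, and the rest still use $h_{n}$. The variable names are ours, not the implementation's.

#include <stdint.h>

/* n: current level (2^n base buckets); next_to_split: first bucket that has
 * not yet been split.  hash is the raw value h(x). */
uint64_t choose_bucket(uint64_t hash, unsigned n, uint64_t next_to_split) {
  uint64_t b = hash % (1ULL << n);        /* h_n(x) = h(x) mod 2^n                    */
  if (b < next_to_split)                  /* this bucket has already been split, so   */
    b = hash % (1ULL << (n + 1));         /* use h_{n+1}(x) = h(x) mod 2^(n+1) instead */
  return b;
}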
 \subsection{The Bucket List}
 
-\begin{figure}
-\includegraphics[width=3.25in]{LHT2.pdf}
-\caption{\label{fig:LHT}Structure of linked lists...}
-\end{figure}
+\rcs{This seems overly complicated to me...}
 
 \yad provides access to transactional storage with page-level
 granularity and stores all record information in the same page file.
@@ -1424,19 +1416,20 @@ contiguous pages. Therefore, if we are willing to allocate the bucket
 list in sufficiently large chunks, we can limit the number of such
 contiguous regions that we will require. Borrowing from Java's
 ArrayList structure, we initially allocate a fixed number of pages to
-store buckets, and allocate more pages as necessary, doubling the
+store buckets and allocate more pages as necessary, doubling the
 number allocated each time.
 
 We allocate a fixed amount of storage for each bucket, so we know how
 many buckets will fit in each of these pages. Therefore, in order to
-look up an aribtrary bucket, we simply need to calculate which chunk
-of allocated pages will contain the bucket, and then the offset the
+look up an arbitrary bucket we simply need to calculate which chunk
+of allocated pages will contain the bucket and then calculate the offset of the
 appropriate page within that group of allocated pages.
 
 %Since we double the amount of space allocated at each step, we arrange
 %to run out of addressable space before the lookup table that we need
 %runs out of space.
 
+\rcs{This paragraph doesn't really belong}
 Normal \yad slotted pages are not without overhead. Each record has
 an associated size field, and an offset pointer that points to a
 location within the page. Throughout our bucket list implementation,
@@ -1449,11 +1442,15 @@ to record numbers within a page.
 \yad provides a call that allocates a contiguous range of pages. We
 use this method to allocate increasingly larger regions of pages as
 the array list expands, and store the regions' offsets in a single
-page header. When we need to access a record, we first calculate
+page header.
+
+When we need to access a record, we first calculate
 which region the record is in, and use the header page to determine
-its offset. (We can do this because the size of each region is
+its offset. We can do this because the size of each region is
 deterministic; it is simply $size_{first~region} * 2^{region~number}$.
-We then calculate the $(page,slot)$ offset within that region. \yad
+We then calculate the $(page,slot)$ offset within that region.
+
+\yad
 allows us to reference records by using a $(page,slot,size)$ triple,
 which we call a {\em recordid}, and we already know the size of the
 record. Once we have the recordid, the redo/undo entries are trivial.
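The region arithmetic described in this hunk follows directly from the doubling rule $size_{first~region} * 2^{region~number}$. The sketch below computes which region, which page inside that region, and which slot hold a given bucket; all names are illustrative, and the mapping from region number to starting page is assumed to live in the header page mentioned above.

#include <stdint.h>

typedef struct { uint32_t region; uint32_t page_in_region; uint32_t slot; } bucket_addr;

bucket_addr locate_bucket(uint64_t bucket, uint64_t buckets_per_page,
                          uint64_t first_region_pages) {
  bucket_addr a;
  uint64_t page = bucket / buckets_per_page;    /* fixed-size buckets per page     */
  a.slot = (uint32_t)(bucket % buckets_per_page);

  uint64_t region = 0;
  uint64_t region_pages = first_region_pages;   /* size of region 0 in pages       */
  uint64_t skipped = 0;                         /* pages held by earlier regions   */
  while (page >= skipped + region_pages) {      /* walk the doubling series        */
    skipped += region_pages;
    region_pages *= 2;                          /* region r holds first * 2^r pages */
    region++;
  }
  a.region = (uint32_t)region;
  a.page_in_region = (uint32_t)(page - skipped);
  return a;
}

Because regions double, the loop runs a logarithmic number of times in the table size, so the lookup cost stays small even for very large bucket lists.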
@@ -1485,32 +1482,26 @@ and are provided by the Fixed Page interface.
 
 \eab{don't get this section, and it sounds really complicated, which is counterproductive at this point -- Is this better now? -- Rusty}
 
-For simplicity, our buckets are fixed length. However, we want to
-store variable length objects. For simplicity, we decided to store
-the keys and values outside of the bucket list.
-%Therefore, we store a header record in
-%the bucket list that contains the location of the first item in the
-%list. This is represented as a $(page,slot)$ tuple. If the bucket is
-%empty, we let $page=-1$. We could simply store each linked list entry
-%as a seperate record, but it would be nicer if we could preserve
-%locality, but it is unclear how \yad's generic record allocation
-%routine could support this directly.
-%Based upon the observation that
-%a space reservation scheme could arrange for pages to maintain a bit
-In order to help maintain the locality of our bucket lists, store these lists as a list of smaller lists. The first list links pages together. The smaller lists reside within a single page.
-%of free space we take a 'list of lists' approach to our bucket list
-%implementation. Bucket lists consist of two types of entries. The
-%first maintains a linked list of pages, and contains an offset
-%internal to the page that it resides in, and a $(page,slot)$ tuple
-%that points to the next page that contains items in the list.
-All of entries within a single page may be traversed without
+\begin{figure}
+\includegraphics[width=3.25in]{LHT2.pdf}
+\caption{\label{fig:LHT}Structure of linked lists...}
+\end{figure}
+
+For simplicity, our buckets are fixed length. In order to support
+variable length entries we store the keys and values
+in linked lists, and represent each list as a list of
+smaller lists. The first list links pages together, and the smaller
+lists reside within a single page. (Figure~\ref{fig:LHT})
+
+All of the entries within a single page may be traversed without
 unpinning and repinning the page in memory, providing very fast
-traversal if the list has good locality.
-This optimization would not be possible if it
-were not for the low level interfaces provided by the buffer manager
-(which seperates pinning pages and reading records into seperate
-API's) Since this data structure has some intersting
-properties (good locality and very fast access to short linked lists), it can also be used on its own.
+traversal over lists that have good locality. This optimization would
+not be possible if it were not for the low level interfaces provided
+by the buffer manager. In particular, we need to be able to specify
+which page we would like to allocate space on, and need to be able to
+read and write multiple records with a single call to pin/unpin. Due to
+this data structure's nice locality properties, and good performance
+for short lists, it can also be used on its own.
 
 \subsection{Concurrency}
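To show why the pin/read split matters for the page-local lists described above, the sketch below pins a page once, reads every list entry stored on it, and unpins once. Page, pin_page(), unpin_page(), and read_list_entry() are hypothetical stand-ins for the kind of buffer-manager interface the text describes, not the library's real calls.

#include <stddef.h>

typedef struct Page Page;
Page *pin_page(int xid, unsigned long pageno);             /* hypothetical */
void  unpin_page(Page *p);                                 /* hypothetical */
/* Copies the entry at `slot` into buf and returns the slot of the next
 * entry in the in-page list, or -1 at the end of the list.  Hypothetical. */
int   read_list_entry(Page *p, int slot, void *buf, size_t len);

void visit_page_local_list(int xid, unsigned long pageno, int first_slot,
                           void (*visit)(const void *entry)) {
  char buf[256];
  int slot = first_slot;
  Page *p = pin_page(xid, pageno);        /* one pin for the whole page          */
  while (slot != -1) {
    slot = read_list_entry(p, slot, buf, sizeof buf);
    visit(buf);                           /* every entry shares the single pin   */
  }
  unpin_page(p);                          /* one unpin once the page is done     */
}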
@@ -1524,42 +1515,51 @@ from one bucket and adding them to another.
 Given that the underlying data structures are transactional and there
 are never any concurrent transactions, this is actually all that is
 needed to complete the linear hash table implementation.
-Unfortunately, as we mentioned in Section~\ref{todo}, things become a
-bit more complex if we allow interleaved transactions.
+Unfortunately, as we mentioned in Section~\ref{nested-top-actions},
+things become a bit more complex if we allow interleaved transactions.
 Therefore, we simply apply Nested Top Actions according to the recipe
 described in that section and lock the entire hashtable for each
 operation. This prevents the hashtable implementation from fully
 exploiting multiprocessor systems,\footnote{\yad passes regression
 tests on multiprocessor systems.} but seems to be adequate on single
 processor machines. (Figure~\ref{fig:TPS})
 We describe a finer grained concurrency mechanism below.
 
-We have found a simple recipe for converting a non-concurrent data structure into a concurrent one, which involves three steps:
-\begin{enumerate}
-\item Wrap a mutex around each operation, this can be done with a lock
-manager, or just using pthread mutexes. This provides isolation.
-\item Define a logical UNDO for each operation (rather than just using
-the lower-level undo in the transactional array). This is easy for a
-hash table; e.g. the undo for an {\em insert} is {\em remove}.
-\item For mutating operations (not read-only), add a ``begin nested
-top action'' right after the mutex acquisition, and a ``commit
-nested top action'' where we release the mutex.
-\end{enumerate}
+%We have found a simple recipe for converting a non-concurrent data structure into a concurrent one, which involves three steps:
+%\begin{enumerate}
+%\item Wrap a mutex around each operation, this can be done with a lock
+% manager, or just using pthread mutexes. This provides isolation.
+%\item Define a logical UNDO for each operation (rather than just using
+% the lower-level undo in the transactional array). This is easy for a
+% hash table; e.g. the undo for an {\em insert} is {\em remove}.
+%\item For mutating operations (not read-only), add a ``begin nested
+% top action'' right after the mutex acquisition, and a ``commit
+% nested top action'' where we release the mutex.
+%\end{enumerate}
 %
-%Note that this scheme prevents multiple threads from accessing the
-%hashtable concurrently. However, it achieves a more important (and
-%somewhat unintuitive) goal. The use of a nested top action protects
-%the hashtable against {\em future} modifications by other
-%transactions. Since other transactions may commit even if this
-%transaction aborts, we need to make sure that we can safely undo the
-%hashtable insertion. Unfortunately, a future hashtable operation
-%could split a hash bucket, or manipulate a bucket overflow list,
-%potentially rendering any phyisical undo information that we could
-%record useless. Therefore, we need to have a logical undo operation
-%to protect against this. However, we could still crash as the
-%physical update is taking place, leaving the hashtable in an
-%inconsistent state after REDO completes. Therefore, we need to use
-%physical undo until the hashtable operation completes, and then {\em
-%switch to} logical undo before any other operation manipulates data we
-%just altered. This is exactly the functionality that a nested top
-%action provides.
+Note that this scheme prevents multiple threads from accessing the
+hashtable concurrently. However, it achieves a more important (and
+somewhat unintuitive) goal. The use of a nested top action protects
+the hashtable against {\em future} modifications by other
+transactions. Since other transactions may commit even if this
+transaction aborts, we need to make sure that we can safely undo the
+hashtable insertion. Unfortunately, a future hashtable operation
+could split a hash bucket, or manipulate a bucket overflow list,
+potentially rendering any physical undo information that we could
+record useless. Therefore, we need to have a logical undo operation
+to protect against this. However, we could still crash as the
+physical update is taking place, leaving the hashtable in an
+inconsistent state after REDO completes. Therefore, we need to use
+physical undo until the hashtable operation completes, and then {\em
+switch to} logical undo before any other operation manipulates data we
+just altered. This is exactly the functionality that a nested top
+action provides. Since a normal hashtable operation is usually fast,
+and this is meant to be a simple hashtable implementation, we simply
+latch the entire hashtable to prevent any other threads from
+manipulating the hashtable until after we switch from physical to
+logical undo.
+%Since a normal hashtable operation is usually fast,
+%and this is meant to be a simple hashtable implementation, we simply
+%latch the entire hashtable to prevent any other threads from
+%manipulating the hashtable until after we switch from phyisical to
+%logical undo.
 
 %\eab{need to explain better why this gives us concurrent
 %transactions.. is there a mutex for each record? each bucket? need to
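The logical undo that makes the scheme above safe can be surprisingly small. In the sketch below the undo record for a hashtable insert carries only the key, and compensation simply calls the high-level remove operation, so it remains correct even after later transactions split buckets or shuffle overflow lists. ThashRemove() and hash_undo_insert() are illustrative names, not the library's documented interface.

#include <stddef.h>

typedef struct {
  int    hash_header_page;   /* which hashtable instance the insert touched */
  size_t key_len;            /* the key bytes follow this header in the log */
} hash_insert_undo_t;

/* High-level remove operation (assumed to exist; name is illustrative). */
int ThashRemove(int xid, int header_page, const void *key, size_t key_len);

/* Invoked during abort, after the nested top action has already committed
 * the structural changes: compensate logically instead of restoring bytes. */
int hash_undo_insert(int xid, const hash_insert_undo_t *u, const void *key) {
  return ThashRemove(xid, u->hash_header_page, key, u->key_len);
}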
@@ -1589,8 +1589,8 @@ straightforward. The only complications are a) defining a logical undo, and b)
 %\eab{this needs updating:} Also, while implementing the hash table, we also
 %implemented two generally useful transactional data structures.
 
-Next we describe some additional optimizations and evaluate the
-performance of our implementations.
+%Next we describe some additional optimizations and evaluate the
+%performance of our implementations.
 
 \subsection{The optimized hashtable}
@@ -1624,6 +1624,18 @@ but we do not describe how this was implemented. Finer grained
 latching is relatively easy in this case since all operations only
 affect a few buckets, and buckets have a natural ordering.
 
+\begin{figure*}
+\includegraphics[%
+width=1\columnwidth]{bulk-load.pdf}
+\includegraphics[%
+width=1\columnwidth]{bulk-load-raw.pdf}
+\caption{\label{fig:BULK_LOAD} This test measures the raw performance
+of the data structures provided by \yad and Berkeley DB. Since the
+test is run as a single transaction, overheads due to synchronous I/O
+and logging are minimized.}
+\end{figure*}
+
 \subsection{Performance}
 
 We ran a number of benchmarks on the two hashtable implementations
@@ -1701,21 +1713,7 @@ application control over a transactional storage policy is desirable.
 %more of \yad's internal api's. Our choice of C as an implementation
 %language complicates this task somewhat.}
 
-\begin{figure*}
-\includegraphics[%
-width=1\columnwidth]{tps-new.pdf}
-\includegraphics[%
-width=1\columnwidth]{tps-extended.pdf}
-\caption{\label{fig:TPS} The logging mechanisms of \yad and Berkeley
-DB are able to combine multiple calls to commit() into a single disk force.
-This graph shows how \yad and Berkeley DB's throughput increases as
-the number of concurrent requests increases. The Berkeley DB line is
-cut off at 50 concurrent transactions because we were unable to
-reliable scale it past this point, although we believe that this is an
-artifact of our testing environment, and is not fundamental to
-Berkeley DB.}
-\end{figure*}
 \rcs{Is the graph for the next paragraph worth the space?}
 
 The final test measures the maximum number of sustainable transactions
 per second for the two libraries. In these cases, we generate a
@@ -1726,7 +1724,8 @@ response times for each case.
 \rcs{analysis / come up with a more sane graph format.}
 
-The fact that our straightfoward hashtable is competitive with Berkeley DB's hashtable shows that
+The fact that our straightforward hashtable is competitive
+with Berkeley DB's hashtable shows that
 straightforward implementations of specialized data structures can
 compete with comparable, highly tuned, general-purpose implementations.
 Similarly, it seems as though it is not difficult to implement specialized
@@ -1738,6 +1737,19 @@ application developers to consider the development of custom
 transactional storage mechanisms if application performance is
 important.
 
+\begin{figure*}
+\includegraphics[%
+width=1\columnwidth]{tps-new.pdf}
+\includegraphics[%
+width=1\columnwidth]{tps-extended.pdf}
+\caption{\label{fig:TPS} The logging mechanisms of \yad and Berkeley
+DB are able to combine multiple calls to commit() into a single disk
+force, increasing throughput as the number of concurrent transactions
+grows. A problem with our testing environment prevented us from
+scaling Berkeley DB past 50 threads.
+}
+\end{figure*}
+
 This section uses:
 \begin{enumerate}
 \item{Custom page layouts to implement ArrayList}
@@ -1779,8 +1791,25 @@ maintains a separate in-memory buffer pool with the serialized versions of
 some objects, as a cache of the on-disk data representation.
 Accesses to objects that are only present in this buffer
 pool incur medium latency, as they must be unmarshalled (deserialized)
-before the application may access them. There is often yet a third
-copy of the serialized data in the filesystem's buffer cache.
+before the application may access them.
+
+\rcs{ MIKE FIX THIS }
+Worse, most transactional layers (including ARIES) must read a page into memory to
+service a write request to the page. If the transactional layer's page cache
+is too small, write requests must be serviced with potentially random disk I/O.
+This removes the primary advantage of write ahead logging, which is to ensure
+application data durability with sequential disk I/O.
+
+In summary, this system architecture (though commonly deployed~\cite{ejb,ordbms,jdo,...}) is fundamentally
+flawed. In order to access objects quickly, the application must keep
+its working set in cache. In order to service write requests, the
+transactional layer must store a redundant copy of the entire working
+set in memory or resort to random I/O. Therefore, roughly half of
+system memory must be wasted by any write intensive application.
+
+%There is often yet a third
+%copy of the serialized data in the filesystem's buffer cache.
 
 %Finally, some objects may
 %only reside on disk, and require a disk read.
@@ -2008,7 +2037,7 @@ We loosly base the graphs for this test on the graphs used by the oo7
 benchmark~\cite{oo7}. For the test, we hardcode the outdegree of
 graph nodes to 3, 6 and 9. This allows us to represent graph nodes as
 fixed length records. The Array List from our linear hash table
-implementation (Section~\ref{linear-hash-table}) provides access to an
+implementation (Section~\ref{sub:Linear-Hash-Table}) provides access to an
 array of such records with performance that is competitive with native
 recordid accesses, so we use an Array List to store the records. We
 could have opted for a slightly more efficient representation by
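As a final illustration of the hunk above: hard-coding the out-degree makes every node record the same size, which is what lets the ArrayList address nodes purely by index. The struct below is a plausible layout for such a record, not the benchmark's actual code, and its field names are invented.

#include <stdint.h>

#define OUT_DEGREE 3            /* 3, 6 or 9 in the benchmark                  */

typedef struct {
  uint64_t id;                  /* node number, equal to its ArrayList index   */
  uint64_t out[OUT_DEGREE];     /* indices of the nodes this node points to    */
  uint32_t flags;               /* e.g. a "visited" bit for graph traversal    */
} graph_node_t;                 /* fixed length, so recordid math is trivial   */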