commit c8c7abf16c
parent 49e3385b34
Author: Eric Brewer
Date:   2005-03-26 02:22:02 +00:00

    sec 6, reduce figures

@@ -675,13 +675,13 @@ fuzzy snapshot is fine.
 \begin{figure}
 \includegraphics[%
 width=1\columnwidth]{structure.pdf}
-\caption{\sf \label{fig:structure} Structure of an action...}
+\caption{\sf\label{fig:structure} \eab{not ref'd} Structure of an action...}
 \end{figure}
 As long as operation implementations obey the atomicity constraints
 outlined above and the algorithms they use correctly manipulate
-on-disk data structures, the write ahead logging protocol will provide
+on-disk data structures, the write-ahead logging protocol will provide
 the application with the ACID transactional semantics, and provide
 high performance, highly concurrent and scalable access to the
 application data that is stored in the system. This suggests a
@@ -698,7 +698,7 @@ and optimizations. This layer is the core of \yad.
 The upper layer, which can be authored by the application developer,
 provides the actual data structure implementations, policies regarding
-page layout (other than the location of the LSN field), and the
+page layout, and the
 implementation of any application-specific operations. As long as
 each layer provides well defined interfaces, the application,
 operation implementation, and write-ahead logging component can be
@@ -712,7 +712,6 @@ a growable array. Surprisingly, even these simple operations have
 important performance characteristics that are not available from
 existing systems.
-%(Sections~\ref{sub:Linear-Hash-Table} and~\ref{TransClos})
 The remainder of this section is devoted to a description of the
 various primitives that \yad provides to application developers.
@@ -738,6 +737,7 @@ implementations that may be used with \yad and its index implementations.
 %top of \yad. Such a lock manager would provide isolation guarantees
 %for all applications that make use of it.
+
 However, applications that
 make use of a lock manager must handle deadlocked transactions
 that have been aborted by the lock manager. This is easy if all of
@@ -870,7 +870,7 @@ work, or deal with the corner cases that aborted transactions create.
 % lock manager, etc can come later...
 %
-% \item {\bf {}``Write ahead logging protocol'' vs {}``Data structure implementation''}
+% \item {\bf {}``Write-ahead logging protocol'' vs {}``Data structure implementation''}
 %
 %A \yad operation consists of some code that manipulates data that has
 %been stored in transactional pages. These operations implement
@@ -917,6 +917,7 @@ semantics.
 %In addition to supporting custom log entries, this mechanism
 %is the basis of \yad's {\em flexible page layouts}.
+
 \yad also uses this mechanism to support four {\em page layouts}:
 {\em raw-page}, which is just an array of
 bytes, {\em fixed-page}, a record-oriented page with fixed-length records,
@@ -984,7 +985,7 @@ high-performance data structures. In particular, an operation that
 spans pages can be made atomic by simply wrapping it in a nested top
 action and obtaining appropriate latches at runtime. This approach
 reduces development of atomic page spanning operations to something
-very similar to conventional multithreaded development that use mutexes
+very similar to conventional multithreaded development that uses mutexes
 for synchronization.
 In particular, we have found a simple recipe for converting a
 non-concurrent data structure into a concurrent one, which involves
@@ -993,7 +994,7 @@ three steps:
 \item Wrap a mutex around each operation. If this is done with care,
 it may be possible to use finer grained mutexes.
 \item Define a logical UNDO for each operation (rather than just using
-a lower-level physical UNDO). For example, this is easy for a
+a set of page-level UNDOs). For example, this is easy for a
 hashtable; e.g. the UNDO for an {\em insert} is {\em remove}.
 \item For mutating operations (not read-only), add a ``begin nested
 top action'' right after the mutex acquisition, and a ``commit
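In code, the recipe might look like the following sketch; the
nested-top-action calls, the hash operation names, and the mutex
placement are illustrative assumptions rather than \yad's verbatim API:
\begin{small}
\begin{verbatim}
/* assumes <pthread.h>; names are illustrative */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int ThashInsert(int xid, recordid ht,
                int key, int val) {
  pthread_mutex_lock(&m);         // step 1: one mutex
  // steps 2+3: bracket the mutation in a nested top
  // action whose UNDO is the logical inverse (remove)
  void * nta = TbeginNestedTopAction(xid,
      OP_HASH_REMOVE, &key, sizeof(key));
  int rc = doInsert(xid, ht, key, val);
  TendNestedTopAction(xid, nta);  // commit the NTA
  pthread_mutex_unlock(&m);
  return rc;
}
\end{verbatim}
\end{small}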
@@ -1061,7 +1062,6 @@ changes, such as growing a hash table or array.
 Given this background, we now cover adding new operations. \yad is
 designed to allow application developers to easily add new data
 representations and data structures by defining new operations.
-
 There are a number of invariants that these operations must obey:
 \begin{enumerate}
 \item Pages should only be updated inside of a REDO or UNDO function.
@@ -1070,10 +1070,10 @@ There are a number of invariants that these operations must obey:
 the page that the REDO function sees, then the wrapper should latch
 the relevant data.
 \item REDO operations use page numbers and possibly record numbers
-while UNDO operations use these or logical names/keys
-\item Acquire latches as needed (typically per page or record)
-\item Use nested top actions (which require a logical UNDO log record)
-or ``big locks'' (which drastically reduce concurrency) for multi-page updates.
+while UNDO operations use these or logical names/keys.
+%\item Acquire latches as needed (typically per page or record)
+\item Use nested top actions (which require a logical UNDO)
+or ``big locks'' (which reduce concurrency) for multi-page updates.
 \end{enumerate}
 \noindent{\bf An Example: Increment/Decrement}
@@ -1087,7 +1087,7 @@ trivial). Here we show how increment/decrement map onto \yad operations.
 First, we define the operation-specific part of the log record:
 \begin{small}
 \begin{verbatim}
 typedef struct { int amount; } inc_dec_t;
 \end{verbatim}
 \noindent {\normalsize Here is the increment operation; decrement is
 analogous:}
@@ -1097,13 +1097,14 @@ int operateIncrement(int xid, Page* p, lsn_t lsn,
 recordid rid, const void *d) {
 inc_dec_t * arg = (inc_dec_t*)d;
 int i;
-latchRecord(rid);
+latchRecord(p, rid);
 readRecord(xid, p, rid, &i); // read current value
 i += arg->amount;
 // write new value and update the LSN
 writeRecord(xid, p, lsn, rid, &i);
-unlatchRecord(rid);
+unlatchRecord(p, rid);
 return 0; // no error
 }
 \end{verbatim}
@@ -1114,12 +1115,13 @@ ops[OP_INCREMENT].implementation= &operateIncrement;
 ops[OP_INCREMENT].argumentSize = sizeof(inc_dec_t);
 // set the REDO to be the same as normal operation
-// Sometime is useful to have them differ.
+// Sometimes useful to have them differ
 ops[OP_INCREMENT].redoOperation = OP_INCREMENT;
 // set UNDO to be the inverse
 ops[OP_INCREMENT].undoOperation = OP_DECREMENT;
 \end{verbatim}
 {\normalsize Finally, here is the wrapper that uses the
 operation, which is identified via {\small\tt OP\_INCREMENT};
 applications use the wrapper rather than the operation, as it tends to
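The wrapper's body is elided here; a plausible sketch, assuming a
generic {\small\tt Tupdate()} dispatch routine (an assumed name) that
writes the log entry and invokes the registered implementation:
\begin{small}
\begin{verbatim}
int Tincrement(int xid, recordid rid, int amount) {
  inc_dec_t arg;             // marshal the argument
  arg.amount = amount;
  // log and apply OP_INCREMENT; Tupdate() is an
  // assumed name for the generic dispatch routine
  return Tupdate(xid, rid, &arg, OP_INCREMENT);
}
\end{verbatim}
\end{small}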
@@ -1146,13 +1148,16 @@ int Tincrement(int xid, recordid rid, int amount) {
 With some examination it is possible to show that this example meets
 the invariants. In addition, because the REDO code is used for normal
 operation, most bugs are easy to find with conventional testing
-strategies.
+strategies. However, as we will see in Section~\ref{OASYS}, even
+these invariants can be stretched by sophisticated developers.
 % covered this in future work...
 %As future work, there is some hope of verifying these
 %invariants statically; for example, it is easy to verify that pages
 %are only modified by operations, and it is also possible to verify
 %latching for our page layouts that support records.
 %% Furthermore, we plan to develop a number of tools that will
 %% automatically verify or test new operation implementations' behavior
 %% with respect to these constraints, and behavior during recovery. For
@@ -1161,8 +1166,6 @@ strategies.
 %% could be used to check operation behavior under various recovery
 %% conditions and thread schedules.
-However, as we will see in Section~\ref{OASYS}, even these invariants
-can be stretched by sophisticated developers.
 \subsection{Summary}
@@ -1320,18 +1323,18 @@ and simplify software design.
 The following sections describe the design and implementation of
 non-trivial functionality using \yad, and use Berkeley DB for
-comparison where appropriate. We chose Berkeley DB because, among
+comparison. We chose Berkeley DB because, among
 commonly used systems, it provides transactional storage that is most
 similar to \yad, and it was
-designed for high-performance, high-concurrency environments.
+designed for high performance and high concurrency.
 All benchmarks were run on an Intel Xeon 2.8 GHz with 1GB of RAM and a
-10K RPM SCSI drive, formatted with reiserfs\footnote{We found that the
+10K RPM SCSI drive, formatted with reiserfs.\footnote{We found that the
 relative performance of Berkeley DB and \yad is highly sensitive to
 filesystem choice, and we plan to investigate the reasons why the
 performance of \yad under ext3 is degraded. However, the results
 relating to the \yad optimizations are consistent across filesystem
-types.}. All reported numbers correspond to the mean of multiple runs
+types.} All results correspond to the mean of multiple runs
 with a 95\% confidence interval with a half-width of 5\%.
 We used Berkeley DB 4.2.52 as it existed in Debian Linux's testing
@@ -1340,13 +1343,8 @@ enabled. These flags were chosen to match
 Berkeley DB's configuration to \yad's as closely as possible. In cases where
 Berkeley DB implements a feature that is not provided by \yad, we
 enable the feature if it improves Berkeley DB's performance, but
-disable the feature if it degrades Berkeley DB's performance.
+disable it otherwise.
 For each of the tests, the two libraries provide the same transactional semantics.
-% With
-%the exception of \yad's optimized serialization mechanism in the
-%\oasys test (see Section \ref{OASYS}),
-%the two libraries provide the same set of transactional
-%semantics during each test.
 Optimizations to Berkeley DB that we performed included disabling the
 lock manager, though we still use ``Free Threaded'' handles for all
@@ -1411,10 +1409,11 @@ compare the performance of our optimized implementation, the
 straightforward implementation and Berkeley DB's hash implementation.
 The straightforward implementation is used by the other applications
 presented in this paper and is \yad's default hashtable
-implementation. We chose this implementation over the faster optimized
-hash table in order to this emphasize that it is easy to implement
-high-performance transactional data structures with \yad and because
-it is easy to understand.
+implementation.
+% We chose this implementation over the faster optimized
+%hash table in order to this emphasize that it is easy to implement
+%high-performance transactional data structures with \yad and because
+%it is easy to understand.
 We decided to implement a {\em linear} hash table~\cite{lht}. Linear
 hash tables are hash tables that are able to extend their bucket list
@@ -1445,7 +1444,7 @@ The simplest bucket map would simply use a fixed-length transactional
 array. However, since we want the size of the table to grow, we should
 not assume that it fits in a contiguous range of pages. Instead, we build
 on top of \yad's transactional ArrayList data structure (inspired by
-Java's structure of the same name).
+the Java class).
 The ArrayList provides the appearance of a large growable array by
 breaking the array into a tuple of contiguous page intervals that
@@ -1457,8 +1456,7 @@ For space efficiency, the array elements themselves are stored using
 the fixed-length record page layout. Thus, we use the header page to
 find the right interval, and then index into it to get the $(page,
 slot)$ address. Once we have this address, the REDO/UNDO entries are
-trivial: they simply log the before and after image of the that
-record.
+trivial: they simply log the before or after image of that record.
 %\rcs{This paragraph doesn't really belong}
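A sketch of the address computation described above; the
interval-table layout here is an illustrative assumption, not \yad's
actual header format:
\begin{small}
\begin{verbatim}
// one contiguous run of pages
typedef struct {
  int firstPage;   // first page of the interval
  int capacity;    // records held by the interval
} interval_t;

// map array index -> (page, slot) address
void arrayListLookup(interval_t *iv, int n,
                     int recsPerPage, int idx,
                     int *page, int *slot) {
  for (int i = 0; i < n; i++) {
    if (idx < iv[i].capacity) {
      *page = iv[i].firstPage + idx / recsPerPage;
      *slot = idx % recsPerPage;
      return;
    }
    idx -= iv[i].capacity;  // skip this interval
  }
  *page = *slot = -1;       // out of bounds
}
\end{verbatim}
\end{small}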
@@ -1485,20 +1483,13 @@ record.
 \subsection{Bucket List}
-%\eab{don't get this section, and it sounds really complicated, which is counterproductive at this point -- Is this better now? -- Rusty}
-%
-%\eab{some basic questions: 1) does the record described above contain
-%key/value pairs or a pointer to a linked list? Ideally it would be
-%one bucket with a next pointer at the end... 2) what about values that
-%are bigger than one bucket?, 3) add caption to figure.}
 \begin{figure}
 \hspace{.25in}
 \includegraphics[width=3.25in]{LHT2.pdf}
-\caption{\sf \label{fig:LHT}Structure of locality preserving ({\em page-oriented})
-linked lists. Hashtable bucket overflow lists tend to be of some small fixed
-length. This data structure allows \yad to aggressively maintain page locality
-for short lists, providing fast overflow bucket traversal for the hash table.}
+\caption{\sf\label{fig:LHT}Structure of locality preserving ({\em
+page-oriented}) linked lists. By keeping sub-lists within one page,
+\yad improves locality and simplifies most list operations to a single
+log entry.}
 \end{figure}
 Given the map, which locates the bucket, we need a transactional
@@ -1511,8 +1502,8 @@ However, in order to achieve good locality, we instead implement a
 {\em page-oriented} transactional linked list, shown in
 Figure~\ref{fig:LHT}. The basic idea is to place adjacent elements of
 the list on the same page: thus we use a list of lists. The main list
-links pages together, while the smaller lists reside with that
-page. \yad's slotted pages allows the smaller lists to support
+links pages together, while the smaller lists reside within one
+page. \yad's slotted pages allow the smaller lists to support
 variable-size values, and allow list reordering and value resizing
 with a single log entry (since everything is on one page).
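One plausible on-page representation of this list of lists; the
structures and field names are illustrative assumptions, not \yad's
actual layout:
\begin{small}
\begin{verbatim}
// main list: one header record per page
typedef struct {
  int nextPage;   // next page of the list, or -1
} pageListHeader_t;

// sub-list: entries sharing a page; slotted
// records let each value be variable-size
typedef struct {
  int nextSlot;   // next entry on this page, or -1
  int keySize;    // key bytes follow, then value
} listEntry_t;
\end{verbatim}
\end{small}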
@@ -1520,22 +1511,11 @@ In addition, all of the entries within a page may be traversed without
 unpinning and repinning the page in memory, providing very fast
 traversal over lists that have good locality. This optimization would
 not be possible if it were not for the low-level interfaces provided
-by the buffer manager. In particular, we need to specify which page
-we would like to allocate space from and we need to be able to
-read and write multiple records with a single call to pin/unpin. Due to
-this data structure's nice locality properties and good performance
-for short lists, it can also be used on its own.
+by the buffer manager. In particular, we need to control space
+allocation, and be able to read and write multiple records with a
+single call to pin/unpin. Due to this data structure's nice locality
+properties and good performance for short lists, it can also be used
+on its own.
-\begin{figure*}
-\includegraphics[%
-width=1\columnwidth]{bulk-load.pdf}
-\includegraphics[%
-width=1\columnwidth]{bulk-load-raw.pdf}
-\caption{\sf \label{fig:BULK_LOAD} This test measures the raw performance
-of the data structures provided by \yad and Berkeley DB. Since the
-test is run as a single transaction, overheads due to synchronous I/O
-and logging are minimized.}
-\end{figure*}
@@ -1548,14 +1528,14 @@ implementation, and the table can be extended lazily by
 transactionally removing items from one bucket and adding them to
 another.
-Given that the underlying data structures are transactional and a
+Given the underlying transactional data structures and a
 single lock around the hashtable, this is actually all that is needed
 to complete the linear hash table implementation. Unfortunately, as
 we mentioned in Section~\ref{nested-top-actions}, things become a bit
 more complex if we allow interleaved transactions. The solution for
 the default hashtable is simply to follow the recipe for Nested
 Top Actions, and only lock the whole table during structural changes.
-We explore a version with finer-grain locking below.
+We also explore a version with finer-grain locking below.
 %This prevents the
 %hashtable implementation from fully exploiting multiprocessor
 %systems,\footnote{\yad passes regression tests on multiprocessor
@@ -1615,9 +1595,10 @@ We explore a version with finer-grain locking below.
 %% course, nested top actions are not necessary for read only operations.
 This completes our description of \yad's default hashtable
-implementation. We would like to emphasize the fact that implementing
+implementation. We would like to emphasize that implementing
 transactional support and concurrency for this data structure is
-straightforward. The only complications are a) defining a logical UNDO, and b) dealing with fixed-length records.
+straightforward. The only complications are a) defining a logical
+UNDO, and b) dealing with fixed-length records.
 %, and (other than requiring the design of a logical
 %logging format, and the restrictions imposed by fixed length pages) is
@@ -1638,14 +1619,15 @@ version of nested top actions.
 Instead of using nested top actions, the optimized implementation
 applies updates in a carefully chosen order that minimizes the extent
-to which the on disk representation of the hash table can be
-corrupted (Figure~\ref{linkedList}). Before beginning updates, it
-writes an UNDO entry that will check and restore the consistency of
-the hashtable during recovery, and then invokes the inverse of the
-operation that needs to be undone. This recovery scheme does not
-require record-level UNDO information. Therefore, pre-images of
-records do not need to be written to log, saving log bandwidth and
-enhancing performance.
+to which the on disk representation of the hash table can be corrupted
+\eab{(Figure~\ref{linkedList})}. This is essentially ``soft updates''
+applied to a multi-page update~\cite{soft-updates}. Before beginning
+the update, it writes an UNDO entry that will check and restore the
+consistency of the hashtable during recovery, and then invokes the
+inverse of the operation that needs to be undone. This recovery
+scheme does not require record-level UNDO information, and thus avoids
+before-image log entries, which saves log bandwidth and improves
+performance.
 Also, since this implementation does not need to support variable-size
 entries, it stores the first entry of each bucket in the ArrayList
@@ -1663,9 +1645,19 @@ ordering.
 \subsection{Performance}
+\begin{figure}[t]
+\includegraphics[%
+width=1\columnwidth]{bulk-load.pdf}
+%\includegraphics[%
+% width=1\columnwidth]{bulk-load-raw.pdf}
+\caption{\sf\label{fig:BULK_LOAD} This test measures the raw performance
+of the data structures provided by \yad and Berkeley DB. Since the
+test is run as a single transaction, overheads due to synchronous I/O
+and logging are minimized.}
+\end{figure}
 We ran a number of benchmarks on the two hashtable implementations
 mentioned above, and used Berkeley DB for comparison.
 %In the future, we hope that improved
 %tool support for \yad will allow application developers to easily apply
 %sophisticated optimizations to their operations. Until then, application
@@ -1673,7 +1665,6 @@ mentioned above, and used Berkeley DB for comparison.
 %specialized data structures should achieve better performance than would
 %be possible by using existing systems that only provide general purpose
 %primitives.
-
 The first test (Figure~\ref{fig:BULK_LOAD}) measures the throughput of
 a single long-running
 transaction that loads a synthetic data set into the
@@ -1686,29 +1677,29 @@ optimized implementation is clearly faster. This is not surprising as
 it issues fewer buffer manager requests and writes fewer log entries
 than the straightforward implementation.
-\eab{missing} With the exception of the page oriented list, we see
-that \yad's other operation implementations also perform well in
-this test. The page-oriented list implementation is
-geared toward preserving the locality of short lists, and we see that
-it has quadratic performance in this test. This is because the list
-is traversed each time a new page must be allocated.
-%Note that page allocation is relatively infrequent since many entries
-%will typically fit on the same page. In the case of our linear
-%hashtable, bucket reorganization ensures that the average occupancy of
-%a bucket is less than one. Buckets that have recently had entries
-%added to them will tend to have occupancies greater than or equal to
-%one. As the average occupancy of these buckets drops over time, the
-%page oriented list should have the opportunity to allocate space on
-%pages that it already occupies.
-Since the linear hash table bounds the length of these lists,
-asymptotic behavior of the list is less important than the
-behavior with a bounded number of list entries. In a separate experiment
-not presented here, we compared the implementation of the
-page-oriented linked list to \yad's conventional linked-list
-implementation, and found that the page-oriented list is faster
-when used within the context of our hashtable implementation.
+%% \eab{remove?} With the exception of the page oriented list, we see
+%% that \yad's other operation implementations also perform well in
+%% this test. The page-oriented list implementation is
+%% geared toward preserving the locality of short lists, and we see that
+%% it has quadratic performance in this test. This is because the list
+%% is traversed each time a new page must be allocated.
+%% %Note that page allocation is relatively infrequent since many entries
+%% %will typically fit on the same page. In the case of our linear
+%% %hashtable, bucket reorganization ensures that the average occupancy of
+%% %a bucket is less than one. Buckets that have recently had entries
+%% %added to them will tend to have occupancies greater than or equal to
+%% %one. As the average occupancy of these buckets drops over time, the
+%% %page oriented list should have the opportunity to allocate space on
+%% %pages that it already occupies.
+%% Since the linear hash table bounds the length of these lists,
+%% asymptotic behavior of the list is less important than the
+%% behavior with a bounded number of list entries. In a separate experiment
+%% not presented here, we compared the implementation of the
+%% page-oriented linked list to \yad's conventional linked-list
+%% implementation, and found that the page-oriented list is faster
+%% when used within the context of our hashtable implementation.
 %The NTA (Nested Top Action) version of \yad's hash table is very
 %cleanly implemented by making use of existing \yad data structures,
@@ -1718,21 +1709,29 @@ when used within the context of our hashtable implementation.
 %{\em @todo need to explain why page-oriented list is slower in the
 %second chart, but provides better hashtable performance.}
-The second test (Figure~\ref{fig:TPS}) measures the two libraries' ability to exploit
-concurrent transactions to reduce logging overhead. Both systems
-can service concurrent calls to commit with a single
-synchronous I/O.~\footnote{The multi-threading benchmarks presented
+\begin{figure}[t]
+%\includegraphics[%
+% width=1\columnwidth]{tps-new.pdf}
+\includegraphics[%
+width=1\columnwidth]{tps-extended.pdf}
+\caption{\sf\label{fig:TPS} The logging mechanisms of \yad and Berkeley
+DB are able to combine multiple calls to commit() into a single disk
+force, increasing throughput as the number of concurrent transactions
+grows. We were unable to get Berkeley DB to work correctly with more
+than 50 threads (see text).}
+\end{figure}
+The second test (Figure~\ref{fig:TPS}) measures the two libraries'
+ability to exploit concurrent transactions to reduce logging overhead.
+Both systems can service concurrent calls to commit with a single
+synchronous I/O~\footnote{The multi-threading benchmarks presented
 here were performed using an ext3 file system, as high thread
-concurrency caused Berkeley DB and \yad to behave unpredictably
-when reiserfs was used. However, \yad's multithreaded throughput was
-significantly better than Berkeley DB's with both filesystems.}
-%Because different approaches to this
-%optimization make sense under different circumstances~\cite{findWorkOnThisOrRemoveTheSentence}, this may
-%be another aspect of transactional storage systems where
-%application control over a transactional storage policy is
-%desirable.
+concurrency caused Berkeley DB and \yad to behave unpredictably when
+reiserfs was used. However, \yad's multithreaded throughput was
+significantly better than Berkeley DB's with both filesystems.}. \yad
+scales very well with higher concurrency, delivering over 6000 (ACID)
+transactions per second. \yad had about double the throughput of
+Berkeley DB (up to 50 threads).
 %\footnote{Although our current implementation does not provide the hooks that
 %would be necessary to alter log scheduling policy, the logger
@@ -1743,49 +1742,34 @@ significantly better than Berkeley DB's with both filesystems.}
 %more of \yad's internal APIs. Our choice of C as an implementation
 %language complicates this task somewhat.}
-%\rcs{Is the graph for the next paragraph worth the space?}
-%\eab{I can combine them onto one graph I think (not 2).}
-%
-%The final test measures the maximum number of sustainable transactions
-%per second for the two libraries. In these cases, we generate a
-%uniform number of transactions per second by spawning a fixed number of
-%threads, and varying the number of requests each thread issues per
-%second, and report the cumulative density of the distribution of
-%response times for each case.
-%
-%\rcs{analysis / come up with a more sane graph format.}
 Finally, we developed a simple load generator which spawns a pool of threads that
 generate a fixed number of requests per second. We then measured
 response latency, and found that Berkeley DB and \yad behave
 similarly.
-In summary, there are a number of primatives that are necessary to
-implement custom, high concurrency and low level transactional data
-structures. In order to implement and optimize a hashtable we used a
-number of low level APIs that are not supported by other systems. We
-needed to customize page layouts to implement ArrayList. The page-oriented
-list addresses and allocates data with respect to pages in order to
-preserve locality. The hashtable implementation is built upon these two
-data structures, and needs to be able to generate custom log entries,
-define custom latching/locking semantics, and make use of, or
-implement a custom variant of nested top actions.
+In summary, there are a number of primitives that are necessary to
+implement custom, high-concurrency transactional data structures. In
+order to implement and optimize the hashtable we used a number of
+low-level APIs that are not supported by other systems. We needed to
+customize page layouts to implement ArrayList. The page-oriented list
+addresses and allocates data with respect to pages in order to
+preserve locality. The hashtable implementation is built upon these
+two data structures, and needs to generate custom log
+entries, define custom latching/locking semantics, and make use of, or
+even customize, nested top actions.
-The fact that our straightforward hashtable is competitive
-with Berkeley DB shows that
-straightforward implementations of specialized data structures can
-compete with comparable, highly-tuned, general-purpose implementations.
-Similarly, it seems as though it is not difficult to implement specialized
-data structures that can significantly outperform existing
-general purpose structures.
+The fact that our default hashtable is competitive with Berkeley DB
+shows that simple \yad implementations of transactional data structures
+can compete with comparable, highly tuned, general-purpose
+implementations. Similarly, this example shows that \yad's flexibility
+enables optimizations that can significantly outperform existing
+solutions.
 This finding suggests that it is appropriate for
 application developers to consider the development of custom
 transactional storage mechanisms when application performance is
 important. The next two sections are devoted to confirming the
 practicality of such mechanisms by applying them to applications
-that suffer from long-standing performance problems with layered
-transactional systems.
+that suffer from long-standing performance problems with traditional
+databases.
 %This section uses:
@@ -1799,18 +1783,7 @@ transactional systems.
 %\end{enumerate}
-\begin{figure*}
-\includegraphics[%
-width=1\columnwidth]{tps-new.pdf}
-\includegraphics[%
-width=1\columnwidth]{tps-extended.pdf}
-\caption{\sf \label{fig:TPS} The logging mechanisms of \yad and Berkeley
-DB are able to combine multiple calls to commit() into a single disk
-force, increasing throughput as the number of concurrent transactions
-grows. A problem with our testing environment prevented us from
-scaling Berkeley DB past 50 threads.
-}
-\end{figure*}
 \section{Object Serialization}
 \label{OASYS}
@@ -1855,7 +1828,7 @@ causes performance degradation. Most transactional layers
 into memory to service a write request to the page; if the buffer pool
 is too small, these operations trigger potentially random disk I/O.
 This removes the primary
-advantage of write ahead logging, which is to ensure application data
+advantage of write-ahead logging, which is to ensure application data
 durability with mostly sequential disk I/O.
 In summary, this system architecture (though commonly