\maketitle
%\subsection*{Abstract}
\rcs{Should we add a

that most strongly differentiates \yad from other, similar libraries.

\section{Experimental setup}
\label{sec:experimental_setup}

The following sections describe the design and implementation of
non-trivial functionality using \yad, and use Berkeley DB for
similar to \yad. Also, it is available both in open-source form, and as a
commercially maintained and supported program. Finally, it has been
designed for high-performance, high-concurrency environments.

All benchmarks were run on an Intel Xeon 2.8 GHz with 1GB of RAM and a
10K RPM SCSI drive, formatted with reiserfs\footnote{We found that
the relative performance of Berkeley DB and \yad is highly sensitive
to filesystem choice, and we plan to investigate the reasons why the
performance of \yad under ext3 is degraded. However, the results
relating to the \yad optimizations are consistent across filesystem
types.}.
All reported numbers
correspond to the mean of multiple runs and represent a 95\%
confidence interval with a standard deviation of +/- 5\%.

\mjd{Eric: Please reword the above to be accurate}

We used Berkeley DB 4.2.52 as it existed in Debian Linux's testing
branch during March of 2005, with the DB\_TXN\_SYNC and DB\_THREAD
flags enabled. These flags were chosen to match
Berkeley DB's configuration to \yad's as closely as possible. In cases where
Berkeley DB implements a feature that is not provided by \yad, we
enable the feature if it improves Berkeley DB's performance, but
disable the feature if it degrades Berkeley DB's performance. With
the exception of \yad's optimized serialization mechanism in the
\oasys test (see Section \ref{OASYS}),
the two libraries provide the same set of transactional
semantics during each test.

Optimizations to Berkeley DB that we performed included disabling the
lock manager, though we still use ``Free Threaded'' handles for all tests.
This yielded a significant increase in performance because it removed
the possibility of transaction deadlock, abort, and repetition.
However, after introducing this optimization, highly concurrent Berkeley
DB benchmarks became unstable, suggesting that we may be calling the
library incorrectly. We believe that fixing this problem would only improve
Berkeley DB's performance in our benchmarks, so we
disabled the lock manager for all tests. Without this optimization,
Berkeley DB's performance in Figure~\ref{fig:TPS} strictly decreased as
concurrency increased because of lock contention and deadlock resolution.

This section uses:

Object serialization performance is extremely important in modern web
application systems such as Enterprise Java Beans. Object
serialization is also a convenient way of adding persistent storage to
an existing application without developing an explicit file format or
dealing with low-level I/O interfaces.

A simple object serialization scheme would bulk-write and bulk-read
sets of application objects to an operating system file. These
schemes suffer from high read and write latency, and do not handle
small updates well. More sophisticated schemes store each object in a
separate, randomly accessible record, such as a database tuple or
a Berkeley DB hashtable entry. These schemes allow for fast single
object reads and writes, and are typically the solutions used by
application servers.

One drawback of many such schemes is that any update typically
requires a full serialization of the entire object. In many
application scenarios, this can be highly inefficient, as it may be
that only a single field of a complex object has been modified.

Furthermore, most of these schemes ``double cache'' object
data. Typically, the application maintains a set of in-memory
objects in their unserialized form, so they can be accessed with low latency.
The backing data store also
maintains a separate in-memory buffer pool with the serialized versions of
some objects, as a cache of the on-disk data representation.
Accesses to objects that are only present in this buffer
pool incur medium latency, as they must be unmarshalled (deserialized)
before the application may access them. There is often yet a third
copy of the serialized data in the filesystem's buffer cache.

%Finally, some objects may
%only reside on disk, and require a disk read.

%Since these applications are typically data-centric, it is important
%to make efficient use of system memory in order to reduce hardware
%costs.

For I/O bound applications, efficient use of in-memory caching is
well-known to be critical to performance. Note that for these schemes,
the memory consumed by the buffer pool is basically redundant, since
it just caches the translated form of the object so it can be read or
written to disk. However, naively restricting the memory consumed by
the buffer pool results in poor performance in existing transactional
storage systems, because an object update must
update the current state of the backing store, which typically
requires reading in the old copy of the page on which the object is
stored to update the object data.
%% A straightforward solution to this problem would be to bound
%% the amount of memory the application may consume by preventing it from
%% caching deserialized objects. This scheme conserves memory, but it
%% incurs the cost of an in-memory deserialization to read the object,
%% and an in-memory deserialization/serialization cycle to write to an
%% object.

%% Alternatively, the amount of memory consumed by the buffer pool could
%% be bounded to some small value, and the application could maintain a
%% large object cache. This scheme would incur no overhead for a read
%% request. However, it would incur the overhead of a disk-based
%% serialization in order to service a write request.\footnote{In
%% practice, the transactional backing store would probably fetch the
%% page that contains the object from disk, causing two disk I/O's.}

\subsection{\yad Optimizations}

\yad's architecture allows us to apply two interesting optimizations
to object serialization. First, since \yad supports
custom log entries, it is trivial to have it store diffs of objects to
the log instead of writing the entire object to the log during an update.
Such an optimization would be difficult to achieve with Berkeley DB,
but could be performed by a database server if the fields of the
objects were broken into database table columns.\footnote{It is unclear if
this optimization would outweigh the overheads associated with an SQL
based interface. Depending on the database server, it may be
necessary to issue a SQL update query that only updates a subset of a
tuple's fields in order to generate a diff-based log entry. Doing so
would preclude the use of prepared statements, or would require a large
number of prepared statements to be maintained by the DBMS. We plan to
investigate the overheads of SQL in this context in the future.}

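The diff-based logging idea can be made concrete with a small sketch. This is an illustrative Python model, not \yad's actual C log format: an update logs only the fields that changed, and the full new version is recoverable by applying the diff to the old version.

```python
# Hypothetical sketch of diff-based log records: log only changed fields
# instead of the whole serialized object.

def field_diff(old, new):
    """Return only the fields whose values changed between versions."""
    return {k: v for k, v in new.items() if old.get(k) != v}

def apply_diff(obj, diff):
    """Reconstruct the new version from the old version plus the diff."""
    updated = dict(obj)
    updated.update(diff)
    return updated

obj_v1 = {"id": 42, "name": "a", "payload": "x" * 1000}
obj_v2 = dict(obj_v1, name="b")        # one small field changed

log_entry = field_diff(obj_v1, obj_v2)  # logs ~1 field, not the ~1KB object
assert apply_diff(obj_v1, log_entry) == obj_v2
```

The payoff is proportional to how small the modified portion of the object is relative to its total size.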
% If IPC or
%the network is being used to communicate with the DBMS, then it is very
%likely that a separate prepared statement for each type of diff that the

The second optimization is a bit more sophisticated, but still easy to
implement in \yad. We do not believe that it would be possible to
achieve using existing relational database systems or with Berkeley
DB. This optimization allows us to drastically limit the size of the
\yad buffer cache, yet still achieve good performance.

\yad services a request to write to a record by pinning (and possibly
reading in) a page, generating a log entry, writing the
new record value to the page, and unpinning the page.
The basic idea of this optimization is to postpone expensive
operations that update the page file for objects that are frequently
modified, relying on some support from the application's object cache
to maintain the transactional semantics.

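The unoptimized write path just described can be sketched as follows; this is a minimal Python model with hypothetical stand-ins for \yad's C structures, not its real interface.

```python
# Sketch of the record-write path: pin the page, append a log entry
# (write-ahead), apply the update to the page, then unpin.

LOG = []  # append-only write-ahead log

class Page:
    def __init__(self):
        self.records = {}
        self.pinned = False

def write_record(page, rid, value, lsn):
    page.pinned = True                       # pin (and possibly read in) the page
    LOG.append(("UPDATE", lsn, rid, value))  # log entry goes to the log first
    page.records[rid] = value                # then update the in-memory page
    page.pinned = False                      # unpin

p = Page()
write_record(p, rid=0, value=b"hello", lsn=1)
assert p.records[0] == b"hello" and LOG[-1][0] == "UPDATE"
```

Note that the expensive steps are the page pin/read and the eventual page-file write; the optimization below defers exactly those steps.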
If \yad knows that the client will not ask to read the record, then
there is no real reason to update the version of the record in the
page file. In fact, if no undo or redo information needs to be
generated, there is no need to bring the page into memory in
order to service a write.

\yad provides a few mechanisms to handle undo records in the context
of object serialization. The first is to use a single transaction for
each object modification, avoiding the cost of generating or logging
any undo records. No other transactional system that we know of allows
this type of optimization. The second option is to assume that the
application will provide the necessary undo information along with the
update, which would generate an ``undiff'' log record for each update
operation, but would still avoid the need to read or update the page
file.

We have removed the need to use the on-disk version of the object to
generate log entries, but still need to guarantee that the application
will not attempt to read a stale record from the page file. We use
the cache to guarantee this. In order to service a write
request made by the application, the cache calls a special
``update()'' operation that only writes a log entry, but does not
update the page file. If the
cache must evict an object, it performs a special ``flush()''
operation. This method writes the object to the buffer pool (and
probably incurs the cost of a disk {\em read}), using an LSN recorded by the
most recent update() call that was associated with the object. Since
\yad implements no-force, it does not matter if the
version of the object in the page file is stale. The idea that the
current version is available outside of transactional storage,
typically in a cache, seems broadly useful.

The third option is to relax the atomicity requirements for a set of
object updates, and again avoid generating any undo records. This
assumes that the application cannot use abort, and is willing to
accept that a prefix of the logged updates will be applied to the page
file after recovery. These ``transactions'' would still be durable, as
commit() could force the log to disk. For the benchmarks below, we
opted for this approach, as it is the most aggressive and would be the
most difficult to implement in another storage system.

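The update()/flush() protocol described above can be sketched as a toy model. All structures here are hypothetical Python stand-ins for \yad's object cache, chosen only to show the control flow: update() appends to the log without touching the page file, and the write-back cost is paid once, at eviction.

```python
# Sketch of the update()/flush() optimization: frequent updates are cheap
# log appends; the page file is only updated when the object is evicted.

LOG = []        # append-only log: (lsn, oid, diff)
PAGE_FILE = {}  # oid -> (lsn, possibly stale serialized object)

class ObjectCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.live = {}       # oid -> current object state
        self.last_lsn = {}   # oid -> LSN of the most recent update()

    def update(self, oid, diff):
        lsn = len(LOG)
        LOG.append((lsn, oid, diff))   # log only; page file left stale
        self.live[oid].update(diff)
        self.last_lsn[oid] = lsn

    def flush(self, oid):
        # Write back using the LSN recorded by the most recent update().
        PAGE_FILE[oid] = (self.last_lsn.get(oid), dict(self.live[oid]))

    def put(self, oid, obj):
        if len(self.live) >= self.capacity:   # evict someone, flushing first
            victim = next(iter(self.live))
            self.flush(victim)
            del self.live[victim]
        self.live[oid] = obj

cache = ObjectCache(capacity=1)
cache.put(1, {"x": 0})
cache.update(1, {"x": 7})    # cheap: one log append, no page-file write
assert 1 not in PAGE_FILE    # page file is stale, but the cache is current
cache.put(2, {"y": 1})       # evicting object 1 forces its flush()
assert PAGE_FILE[1][1] == {"x": 7}
```

Because the cached copy is always current, the stale page-file version is never read, which is exactly the invariant the prose relies on.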
\subsection{Recovery and Log Truncation}

Nothing stops our current scheme from breaking this invariant.

We have two solutions to this problem. One solution is to
implement a cache eviction policy that respects the ordering of object
updates on a per-page basis.
However, this approach would impose an unnatural restriction on the
cache replacement policy, and would likely suffer from performance
impacts resulting from the (arbitrary) manner in which \yad allocates
objects to pages.

The second solution is to
force \yad to ignore the page LSN values when considering
special {\tt update()} log entries during the REDO phase of recovery. This
forces \yad to re-apply the diffs in the same order in which the application
generated them. This works as intended because we use an
idempotent diff format that will produce the correct result even if we
start with a copy of the object that is newer than the first diff that
we apply.

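A small illustration of why an idempotent (blind-write) diff format makes LSN-free redo safe: replaying the diffs in log order yields the same final state even when redo starts from a version newer than the first diff. The dict-based diffs are an illustrative stand-in for \yad's actual format.

```python
# Idempotent redo: each diff blindly overwrites fields, never reading the
# old value, so re-applying already-applied diffs is harmless.

diffs = [{"a": 1}, {"b": 2}, {"a": 3}]   # logged update() diffs, in order

def redo(state, log):
    for d in log:
        state.update(d)                  # blind write of the diffed fields
    return state

final_from_old = redo({}, diffs)
final_from_newer = redo({"a": 1, "b": 2}, diffs)  # page was already newer
assert final_from_old == final_from_newer == {"a": 3, "b": 2}
```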
To avoid needing to replay the entire log on recovery, we add a custom
checkpointing algorithm that interacts with the page cache.
To produce a
fuzzy checkpoint, we simply iterate over the object pool, calculating
the minimum LSN of the {\em first} call to update() on any object in
the pool (that has not yet called flush()).
We can then invoke a normal ARIES checkpoint with the restriction
that the log is not truncated past the minimum LSN encountered in the
object pool.\footnote{We do not yet enforce this checkpoint limitation.}
A background process that calls flush() for all objects in the cache
allows efficient log truncation without blocking any high-priority
operations.

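The truncation bound computed by this fuzzy checkpoint can be sketched in a few lines; the pool representation is a hypothetical simplification of \yad's object pool.

```python
# Sketch of the fuzzy-checkpoint bound: the log may not be truncated past
# the smallest first-update() LSN of any dirty (unflushed) cached object.

def truncation_bound(pool, log_end_lsn):
    """pool maps oid -> LSN of the object's *first* update() since it was
    cached, or None if the object has since been flushed (not dirty)."""
    dirty = [lsn for lsn in pool.values() if lsn is not None]
    return min(dirty) if dirty else log_end_lsn

pool = {1: 17, 2: None, 3: 4}   # object 3 was first updated at LSN 4
assert truncation_bound(pool, log_end_lsn=100) == 4

pool = {1: None, 2: None}       # everything flushed: truncate freely
assert truncation_bound(pool, log_end_lsn=100) == 100
```

Flushing objects in the background raises this bound, which is why the background flush() process enables aggressive truncation.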
\subsection{Evaluation}
\begin{figure*}
\includegraphics[%
width=1\columnwidth]{mem-pressure.pdf}
\includegraphics[%
width=1\columnwidth]{mem-pressure.pdf}
\caption{\label{fig:OASYS} \yad optimizations for object
serialization. The first graph shows the effectiveness of both the
diff-based log records and the update/flush optimization as a function
of the portion of each object that is modified. The second graph
disables the filesystem buffer cache (via O\_DIRECT) and shows the
benefits of the update/flush optimization when there is memory
pressure.}
\end{figure*}

We implemented a \yad plugin for \oasys, a C++ object serialization
library that includes various object serialization backends, including
one for Berkeley DB. We set up an experiment in which objects are
retrieved from a cache according to a hot-set distribution\footnote{In
an example hot-set distribution, 10\% of the objects (the hot set) are
selected 90\% of the time.} and then have certain fields modified. The
object cache size is set to twice the size of the hot set, and all
experiments were run with identical cache sizes and random seeds for
both Berkeley DB and the various \yad configurations.

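The hot-set access pattern can be sketched as follows; the parameters shown (10\% hot set chosen 90\% of the time) follow the footnote's example and are illustrative, not the exact benchmark driver.

```python
# Sketch of a hot-set distribution: with probability hot_prob, draw
# uniformly from the hot fraction of object ids; otherwise from the rest.

import random

def hot_set_choice(n_objects, hot_frac=0.1, hot_prob=0.9, rng=random):
    hot_n = max(1, int(n_objects * hot_frac))
    if rng.random() < hot_prob:
        return rng.randrange(hot_n)            # hot set: ids [0, hot_n)
    return rng.randrange(hot_n, n_objects)     # cold set: the remainder

rng = random.Random(42)                        # fixed seed, as in the setup
picks = [hot_set_choice(5000, rng=rng) for _ in range(10000)]
hot_hits = sum(1 for p in picks if p < 500)
assert 0.85 < hot_hits / len(picks) < 0.95     # roughly 90% hot accesses
```

Using the same seed across the Berkeley DB and \yad runs, as the text describes, makes the access sequences identical and the comparison fair.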
The first graph in Figure \ref{fig:OASYS} shows the time to perform
100,000 updates to the objects as we vary the fraction of the object
data that is modified in each update. In the most extreme case, when
only one integer field from a $\sim$1KB object is modified, the fully
optimized \yad shows a threefold speedup over Berkeley DB.

The \yad plugin makes use of the optimizations
described in this section, and was used to generate Figure~[TODO].
For comparison, we also implemented a non-optimized \yad plugin to
directly measure the effect of our optimizations.

complex, the simplicity of the implementation is encouraging.

\rcs{analyse OASYS data.}

test 1: small \oasys buffer cache (23 pages), O\_DIRECT turned on

The test used 5000 objects, a cache size of 20\% of the objects, and a
hot set size of 10\% of the objects. This ratio is necessary to
achieve the desired effect; with a smaller cache, hot objects are
evicted more often than intended. The test ran for 10,000 iterations.

This section uses:

\begin{enumerate}
and reliable.

\section{Conclusion}
\mjd{need to search and replace for ``lladd'' and ``oasys''}
\rcs{write conclusion section}
\begin{thebibliography}{99}