update general benchmark and sections

Mike Demmer 2005-03-25 05:35:23 +00:00
parent b390e1388e
commit 7060ce6c51


@ -35,7 +35,6 @@
\maketitle
%\subsection*{Abstract}
\rcs{Should we add a
@ -1265,6 +1264,7 @@ that most strongly differentiates \yad from other, similar libraries.
\section{Experimental setup}
\label{sec:experimental_setup}
The following sections describe the design and implementation of
non-trivial functionality using \yad, and use Berkeley DB for
@ -1274,27 +1274,40 @@ similar to \yad. Also, it is available both in open-source form, and as a
commercially maintained and supported program. Finally, it has been
designed for high-performance, high-concurrency environments.
All benchmarks were run on an Intel Xeon 2.8 GHz with 1GB of RAM and a
10K RPM SCSI drive, formatted with reiserfs\footnote{We found that
the relative performance of Berkeley DB and \yad is highly sensitive
to filesystem choice, and we plan to investigate the reasons why the
performance of \yad under ext3 is degraded. However, the results
relating to the \yad optimizations are consistent across filesystem
types.}.
All reported numbers are means over multiple runs; in each case the
95\% confidence interval lies within $\pm 5\%$ of the mean.
We used Berkeley DB 4.2.52 as it existed in Debian Linux's testing
branch during March of 2005, with the DB\_TXN\_SYNC and DB\_THREAD flags
enabled. These flags were chosen to match
Berkeley DB's configuration to \yad's as closely as possible. In cases where
Berkeley DB implements a feature that is not provided by \yad, we
enable the feature if it improves Berkeley DB's performance, and
disable it if it degrades performance. With
the exception of \yad's optimized serialization mechanism in the
\oasys test (see Section \ref{OASYS}),
the two libraries provide the same set of transactional
semantics during each test.
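For concreteness, the following sketch shows how an environment with
these flags can be opened through Berkeley DB's C API. The directory
name is illustrative, and our benchmark harness differs in its error
handling and remaining configuration; note that DB\_INIT\_LOCK is
deliberately omitted, since we disable the lock manager as described
below.
\begin{verbatim}
#include <db.h>

int main(void) {
  DB_ENV *env;
  DB_TXN *txn;

  db_env_create(&env, 0);
  /* DB_INIT_LOCK omitted: the lock manager is
     disabled, as described below. */
  env->open(env, "bench_dir" /* illustrative */,
            DB_CREATE | DB_RECOVER | DB_THREAD |
            DB_INIT_LOG | DB_INIT_MPOOL |
            DB_INIT_TXN, 0);

  env->txn_begin(env, NULL, &txn, 0);
  /* ... benchmark reads and writes ... */
  txn->commit(txn, DB_TXN_SYNC); /* sync commit */

  env->close(env, 0);
  return 0;
}
\end{verbatim}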
One optimization that we applied to Berkeley DB was disabling the
lock manager, though we still use ``Free Threaded'' handles for all tests.
This yielded a significant increase in performance because it removed
the possibility of transaction deadlock, abort, and repetition.
However, after introducing this optimization, highly concurrent Berkeley
DB benchmarks became unstable, suggesting that we may be calling the
library incorrectly. We believe that resolving this problem could only
improve Berkeley DB's performance in our benchmarks, so we
disabled the lock manager for all tests. Without this optimization,
Berkeley DB's performance in Figure~\ref{fig:TPS} strictly decreased as
concurrency increased, due to lock contention and deadlock resolution.
@ -1733,7 +1746,7 @@ This section uses:
Object serialization performance is extremely important in modern web
application systems such as Enterprise Java Beans. Object
serialization is also a convenient way of adding persistent storage to
an existing application without developing an explicit file format or
dealing with low-level I/O interfaces.
@ -1741,39 +1754,61 @@ A simple object serialization scheme would bulk-write and bulk-read
sets of application objects to an operating system file. These
schemes suffer from high read and write latency, and do not handle
small updates well. More sophisticated schemes store each object in a
separate, randomly accessible record, such as a database tuple or
a Berkeley DB hashtable entry. These schemes allow for fast single
object reads and writes, and are typically the solutions used by
application servers.
One drawback of many such schemes is that any update typically
requires a full serialization of the entire object. In many
application scenarios this is highly inefficient, as often only a
single field of a complex object has been modified.
Furthermore, most of these schemes ``double cache'' object
data. Typically, the application maintains a set of in-memory
objects in their unserialized form, so they can be accessed with low latency.
The backing data store also
maintains a separate in-memory buffer pool with the serialized versions of
some objects, as a cache of the on-disk data representation.
Accesses to objects that are only present in this buffer
pool incur medium latency, as they must be unmarshalled (deserialized)
before the application may access them. There is often yet a third
copy of the serialized data in the filesystem's buffer cache.
%Finally, some objects may
%only reside on disk, and require a disk read.
%Since these applications are typically data-centric, it is important
%to make efficient use of system memory in order to reduce hardware
%costs.
For I/O-bound applications, efficient use of in-memory caching is
well known to be critical to performance. Note that in these schemes
the memory consumed by the buffer pool is largely redundant, since it
merely caches the serialized form of each object so that it can be
read from or written to disk. However, naively restricting the memory
consumed by the buffer pool results in poor performance in existing
transactional storage systems, because an object update must modify
the current state of the backing store, which typically requires
reading in the old copy of the page on which the object is stored.
%% A straightforward solution to this problem would be to bound
%% the amount of memory the application may consume by preventing it from
%% caching deserialized objects. This scheme conserves memory, but it
%% incurs the cost of an in-memory deserialization to read the object,
%% and an in-memory deserialization/serialization cycle to write to an
%% object.
%% Alternatively, the amount of memory consumed by the buffer pool could
%% be bounded to some small value, and the application could maintain a
%% large object cache. This scheme would incur no overhead for a read
%% request. However, it would incur the overhead of a disk-based
%% serialization in order to service a write request.\footnote{In
%% practice, the transactional backing store would probably fetch the
%% page that contains the object from disk, causing two disk I/O's.}
\subsection{\yad Optimizations}
\yad's architecture allows us to apply two interesting optimizations
to object serialization. First, since \yad supports
@ -1781,13 +1816,16 @@ custom log entries, it is trivial to have it store diffs of objects to
the log instead of writing the entire object to the log during an update.
Such an optimization would be difficult to achieve with Berkeley DB,
but could be performed by a database server if the fields of the
objects were broken into database table columns.\footnote{It is unclear if
this optimization would outweigh the overheads associated with an
SQL-based interface. Depending on the database server, it may be
necessary to issue a SQL update query that only updates a subset of a
tuple's fields in order to generate a diff-based log entry. Doing so
would preclude the use of prepared statements, or would require a large
number of prepared statements to be maintained by the DBMS. We plan to
investigate the overheads of SQL in this context in the future.}
% If IPC or
%the network is being used to communicate with the DBMS, then it is very
%likely that a separate prepared statement for each type of diff that the
@ -1802,57 +1840,46 @@ number of prepared statements to be maintained by the DBMS.
The second optimization is a bit more sophisticated, but still easy to
implement in \yad. We do not believe that it would be possible to
achieve using existing relational database systems or with Berkeley
DB. This optimization allows us to drastically limit the size of the
\yad buffer cache, yet still achieve good performance.
\yad services a request to write to a record by pinning (and possibly
reading in) a page, generating a log entry, writing the
new record value to the page, and unpinning the page.
The basic idea of this optimization is to postpone the expensive
page-file updates for frequently modified objects, relying on support
from the application's object cache to maintain transactional
semantics.
If \yad knows that the application will not attempt to read the
version of the record stored in the page file (because the current
version resides in the object cache), then there is no real reason to
update the page file on every write. In fact, if no undo or redo
information needs to be generated, there is no need to bring the page
into memory at all in order to service a write.
To implement this, we added two custom \yad operations. The
{\tt update()} operation is called when an object is modified but
still resides in the object cache. It writes a log entry, but does
not update the page file. The fact that the modified object still
resides in the object cache guarantees that the now-stale record will
not be read from the page file. The {\tt flush()} operation is called
whenever a modified object is evicted from the cache. This operation
updates the object in the buffer pool (and therefore the page file),
likely incurring the cost of a disk {\em read} to pull in the page,
and a {\em write} to evict another page from the relatively small
buffer pool. Multiple modifications to an object thus incur only
inexpensive log additions, and are eventually coalesced into a single
update to the page file.
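The following sketch illustrates the resulting write and eviction
paths in the object cache. The types and helper functions
({\tt log\_update()}, {\tt write\_record()}, and so on) are
hypothetical stand-ins, not \yad's actual interface:
\begin{verbatim}
typedef long   lsn_t;    /* illustrative types */
typedef struct { int page, slot; } recordid;
typedef struct diff_t diff_t;

/* Hypothetical storage-library primitives: */
lsn_t log_update(int xid, recordid rid,
                 const diff_t *d);
void  write_record(int xid, recordid rid,
                   const void *buf, lsn_t lsn);
void  apply_diff(void *obj, const diff_t *d);
void *serialize(const void *obj);

typedef struct { recordid rid; void *data;
                 lsn_t lsn; int dirty; } cached_obj;

/* update(): called while the object is cached.
   Appends a diff to the log; the page file is
   left untouched. */
void cache_update(int xid, cached_obj *o,
                  const diff_t *d) {
  o->lsn = log_update(xid, o->rid, d);
  apply_diff(o->data, d); /* cache stays current */
  o->dirty = 1;
}

/* flush(): called on eviction. Writes the
   serialized object back to its page, stamped
   with the LSN of the most recent update(). */
void cache_flush(int xid, cached_obj *o) {
  if (o->dirty)
    write_record(xid, o->rid,
                 serialize(o->data), o->lsn);
}
\end{verbatim}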
\yad provides a few mechanisms to handle undo records in the context
of object serialization. The first is to use a single transaction for
each object modification, avoiding the cost of generating or logging
any undo records. No other transactional system that we know of allows
this type of optimization. The second option is to assume that the
application will provide the necessary undo information along with the
update, which would generate an ``undiff'' log record for each update
operation, but would still avoid the need to read or update the page
file.
The third option is to relax the atomicity requirements for a set of
object updates, and again avoid generating any undo records. This
assumes that the application cannot use abort, and is willing to
accept that a prefix of the logged updates will be applied to the page
file after recovery. These ``transactions'' would still be durable, as
commit() could force the log to disk. For the benchmarks below, we
opted for this approach, as it is the most aggressive and would be the
most difficult to implement in another storage system.
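A sketch of this usage pattern follows, reusing the hypothetical
{\tt cache\_update()} helper from above and assuming that
{\tt Tbegin()} and {\tt Tcommit()} bracket a transaction:
\begin{verbatim}
/* Option three: a batch of updates with no undo
   records. commit() forces the log; after a
   crash, a prefix of the logged updates is
   applied to the page file. */
void update_batch(int n, cached_obj **objs,
                  const diff_t **diffs) {
  int xid = Tbegin();
  for (int i = 0; i < n; i++)  /* no undo info */
    cache_update(xid, objs[i], diffs[i]);
  Tcommit(xid); /* force the log: durability */
}
\end{verbatim}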
\subsection{Recovery and Log Truncation}
@ -1865,33 +1892,69 @@ Nothing stops our current scheme from breaking this invariant.
We have two solutions to this problem. One solution is to
implement a cache eviction policy that respects the ordering of object
updates on a per-page basis.
However, this approach would impose an unnatural restriction on the
cache replacement policy, and would likely suffer from performance
impacts resulting from the (arbitrary) manner in which \yad allocates
objects to pages.
The second solution is to
force \yad to ignore the page LSN values when considering
special {\tt update()} log entries during the REDO phase of recovery. This
forces \yad to re-apply the diffs in the same order in which the application
generated them. This works as intended because we use an
idempotent diff format that will produce the correct result even if we
start with a copy of the object that is newer than the first diff that
we apply.
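For example, a diff can simply be a blind write of a byte range at a
fixed offset within the serialized object; each replay overwrites the
range with the same bytes, so re-applying the diffs in log order
yields the correct final state even when the starting copy already
reflects some of them. A sketch of such an (illustrative) record
format:
\begin{verbatim}
#include <string.h>

/* Idempotent diff: a blind write of new bytes
   at a fixed offset in the serialized object. */
typedef struct {
  size_t offset, length;
  unsigned char bytes[]; /* new contents */
} obj_diff;

/* Applying the same diff twice leaves the
   object in the same state. */
void redo_diff(unsigned char *obj,
               const obj_diff *d) {
  memcpy(obj + d->offset, d->bytes, d->length);
}
\end{verbatim}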
To avoid needing to replay the entire log on recovery, we add a custom
checkpointing algorithm that interacts with the page cache.
To produce a
fuzzy checkpoint, we simply iterate over the object pool, calculating
the minimum LSN of the {\em first} call to update() on any object in
the pool that has not yet been flushed.
We can then invoke a normal ARIES checkpoint with the restriction
that the log is not truncated past the minimum LSN encountered in the
object pool.\footnote{We do not yet enforce this checkpoint limitation.}
A background process that calls flush() for all objects in the cache
allows efficient log truncation without blocking any high-priority
operations.
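A sketch of this truncation-bound computation follows; the field and
type names are illustrative:
\begin{verbatim}
#include <limits.h>

typedef long lsn_t; /* illustrative LSN type */
typedef struct { int dirty;
                 lsn_t first_lsn; } cached_obj;

/* The log may only be truncated up to the
   minimum, over all dirty cached objects, of
   the LSN of the first update() call since the
   object entered the cache. */
lsn_t truncation_bound(const cached_obj *pool,
                       int n) {
  lsn_t min_lsn = LONG_MAX;
  for (int i = 0; i < n; i++)
    if (pool[i].dirty &&
        pool[i].first_lsn < min_lsn)
      min_lsn = pool[i].first_lsn;
  return min_lsn;
}
\end{verbatim}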
\subsection{Evaluation}
\begin{figure*}
\includegraphics[%
width=1\columnwidth]{mem-pressure.pdf}
\includegraphics[%
width=1\columnwidth]{mem-pressure.pdf}
\caption{\label{fig:OASYS} \yad optimizations for object
serialization. The first graph shows the effectiveness of both the
diff-based log records and the update/flush optimization as a function
of the portion of each object that is modified. The second graph
disables the filesystem buffer cache (via O\_DIRECT) and shows the
benefits of the update/flush optimization when there is memory
pressure.}
\end{figure*}
We implemented a \yad plugin for \oasys, a C++ object serialization
library that includes various object serialization backends, including
one for Berkeley DB. We set up an experiment in which objects are
retrieved from a cache according to a hot-set distribution\footnote{In
an example hot-set distribution, 10\% of the objects (the hot set) are
selected 90\% of the time.} and then have certain fields modified. The
object cache size is set to twice the size of the hot set, and all
experiments were run with identical cache sizes and random seeds for
both Berkeley DB and the various \yad configurations.
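Concretely, object identifiers for each access might be drawn as in
the following sketch; the parameter names are illustrative, and the
actual \oasys driver differs:
\begin{verbatim}
#include <stdlib.h>

/* Draw an object id from a hot-set
   distribution: a fraction hot_frac of the
   objects receives hot_prob of the accesses. */
int hot_set_draw(int n, double hot_frac,
                 double hot_prob) {
  int hot = (int)(hot_frac * n);
  if (drand48() < hot_prob)
    return rand() % hot;          /* hot set  */
  return hot + rand() % (n - hot); /* cold set */
}
\end{verbatim}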
The first graph in Figure~\ref{fig:OASYS} shows the time to perform
100,000 updates to the objects as we vary the fraction of the object
data that is modified in each update. In the most extreme case, when
only a single integer field of a $\sim$1KB object is modified, the
fully optimized \yad shows a threefold speedup over Berkeley DB.
The \yad plugin makes use of the optimizations
described in this section, and was used to generate Figure~\ref{fig:OASYS}.
For comparison, we also implemented a non-optimized \yad plugin to
directly measure the effect of our optimizations.
@ -1914,6 +1977,14 @@ complex, the simplicity of the implementation is encouraging.
\rcs{analyse OASYS data.}
The first test used a small \oasys buffer cache (23 pages) with
O\_DIRECT enabled. The test used 5000 objects, a cache size of 20\%
of the objects, and a hot-set size of 10\% of the objects; this ratio
is necessary to achieve the desired cache behavior, as a larger hot
set would cause hot objects to be evicted too frequently. Each run
performed 10,000 iterations.
This section uses:
\begin{enumerate}
@ -2168,6 +2239,9 @@ and reliable.
\section{Conclusion}
\mjd{need to search and replace for ``lladd'' and ``oasys''}
\rcs{write conclusion section}
\begin{thebibliography}{99}