update general benchmark and sections

Mike Demmer 2005-03-25 05:35:23 +00:00
parent b390e1388e
commit 7060ce6c51


@ -35,7 +35,6 @@
\maketitle
%\subsection*{Abstract}
\rcs{Should we add a
@ -1265,6 +1264,7 @@ that most strongly differentiates \yad from other, similar libraries.
\section{Experimental setup}
\label{sec:experimental_setup}
The following sections describe the design and implementation of
non-trivial functionality using \yad, and use Berkeley DB for
@ -1274,27 +1274,40 @@ similar to \yad. Also, it is available both in open-source form, and as a
commercially maintained and supported program. Finally, it has been
designed for high-performance, high-concurrency environments.
All benchmarks were run on an Intel Xeon 2.8 GHz with 1GB of RAM and a
10K RPM SCSI drive, formatted with reiserfs\footnote{We found that
the relative performance of Berkeley DB and \yad is highly sensitive
to filesystem choice, and we plan to investigate the reasons why the
performance of \yad under ext3 is degraded. However, the results
relating to the \yad optimizations are consistent across filesystem
types.}.
All reported numbers are means over multiple runs; in each case the
95\% confidence interval lies within $\pm 5\%$ of the mean.
We used Berkeley DB 4.2.52 as it existed in Debian Linux's testing
branch during March of 2005, with the DB\_TXN\_SYNC and DB\_THREAD flags
enabled. These flags were chosen to match
Berkeley DB's configuration to \yad's as closely as possible. In cases where
Berkeley DB implements a feature that is not provided by \yad, we
enable the feature if it improves Berkeley DB's performance, and
disable it if it degrades performance. With
the exception of \yad's optimized serialization mechanism in the
\oasys test (see Section \ref{OASYS}),
the two libraries provide the same set of transactional
semantics during each test.
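For concreteness, the following sketch shows how an environment with
these flags can be opened through Berkeley DB's C API. The directory
name is illustrative, and our benchmark harness differs in its error
handling and remaining configuration; note that DB\_INIT\_LOCK is
deliberately omitted, since we disable the lock manager as described
below.
\begin{verbatim}
#include <db.h>

int main(void) {
  DB_ENV *env;
  DB_TXN *txn;

  db_env_create(&env, 0);
  /* DB_INIT_LOCK omitted: the lock manager is
     disabled, as described below. */
  env->open(env, "bench_dir" /* illustrative */,
            DB_CREATE | DB_RECOVER | DB_THREAD |
            DB_INIT_LOG | DB_INIT_MPOOL |
            DB_INIT_TXN, 0);

  env->txn_begin(env, NULL, &txn, 0);
  /* ... benchmark reads and writes ... */
  txn->commit(txn, DB_TXN_SYNC); /* sync commit */

  env->close(env, 0);
  return 0;
}
\end{verbatim}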
One optimization that we applied to Berkeley DB was disabling the
lock manager, though we still use ``Free Threaded'' handles for all tests.
This yielded a significant increase in performance because it removed
the possibility of transaction deadlock, abort, and repetition.
However, after introducing this optimization, highly concurrent Berkeley
DB benchmarks became unstable, suggesting that we may be calling the
library incorrectly. We believe that resolving this problem could only
improve Berkeley DB's performance in our benchmarks, so we
disabled the lock manager for all tests. Without this optimization,
Berkeley DB's performance in Figure~\ref{fig:TPS} strictly decreased as
concurrency increased, due to lock contention and deadlock resolution.
@ -1733,7 +1746,7 @@ This section uses:
Object serialization performance is extremely important in modern web
application systems such as Enterprise Java Beans. Object
serialization is also a convenient way of adding persistent storage to
an existing application without developing an explicit file format or
dealing with low-level I/O interfaces.
@ -1741,39 +1754,61 @@ A simple object serialization scheme would bulk-write and bulk-read
sets of application objects to an operating system file. These
schemes suffer from high read and write latency, and do not handle
small updates well. More sophisticated schemes store each object in a
separate, randomly accessible record, such as a database tuple or
a Berkeley DB hashtable entry. These schemes allow for fast single
object reads and writes, and are typically the solutions used by
application servers.
One drawback of many such schemes is that any update typically
requires a full serialization of the entire object. In many
application scenarios this is highly inefficient, as often only a
single field of a complex object has been modified.
Furthermore, most of these schemes ``double cache'' object
data. Typically, the application maintains a set of in-memory
objects in their unserialized form, so they can be accessed with low latency.
The backing data store also
maintains a separate in-memory buffer pool with the serialized versions of
some objects, as a cache of the on-disk data representation.
Accesses to objects that are only present in this buffer
pool incur medium latency, as they must be unmarshalled (deserialized)
before the application may access them. There is often yet a third
copy of the serialized data in the filesystem's buffer cache.
%Finally, some objects may
%only reside on disk, and require a disk read.
%Since these applications are typically data-centric, it is important
%to make efficient use of system memory in order to reduce hardware
%costs.
For I/O-bound applications, efficient use of in-memory caching is
well known to be critical to performance. Note that in these schemes
the memory consumed by the buffer pool is largely redundant, since it
merely caches the serialized form of each object so that it can be
read from or written to disk. However, naively restricting the memory
consumed by the buffer pool results in poor performance in existing
transactional storage systems, because an object update must modify
the current state of the backing store, which typically requires
reading in the old copy of the page on which the object is stored.
%% A straightforward solution to this problem would be to bound
%% the amount of memory the application may consume by preventing it from
%% caching deserialized objects. This scheme conserves memory, but it
%% incurs the cost of an in-memory deserialization to read the object,
%% and an in-memory deserialization/serialization cycle to write to an
%% object.
%% Alternatively, the amount of memory consumed by the buffer pool could
%% be bounded to some small value, and the application could maintain a
%% large object cache. This scheme would incur no overhead for a read
%% request. However, it would incur the overhead of a disk-based
%% serialization in order to service a write request.\footnote{In
%% practice, the transactional backing store would probably fetch the
%% page that contains the object from disk, causing two disk I/O's.}
\subsection{\yad Optimizations}
\yad's architecture allows us to apply two interesting optimizations
to object serialization. First, since \yad supports
@ -1781,13 +1816,16 @@ custom log entries, it is trivial to have it store diffs of objects to
the log instead of writing the entire object to the log during an update.
Such an optimization would be difficult to achieve with Berkeley DB,
but could be performed by a database server if the fields of the
objects were broken into database table columns.\footnote{It is unclear if
this optimization would outweigh the overheads associated with an
SQL-based interface. Depending on the database server, it may be
necessary to issue a SQL update query that only updates a subset of a
tuple's fields in order to generate a diff-based log entry. Doing so
would preclude the use of prepared statements, or would require a large
number of prepared statements to be maintained by the DBMS. We plan to
investigate the overheads of SQL in this context in the future.}
% If IPC or
%the network is being used to communicate with the DBMS, then it is very
%likely that a separate prepared statement for each type of diff that the
@ -1802,57 +1840,46 @@ number of prepared statements to be maintained by the DBMS.
The second optimization is a bit more sophisticated, but still easy to
implement in \yad. We do not believe that it would be possible to
achieve using existing relational database systems or with Berkeley
DB. This optimization allows us to drastically limit the size of the
\yad buffer cache, yet still achieve good performance.
\yad services a request to write to a record by pinning (and possibly
reading in) a page, generating a log entry, writing the
new record value to the page, and unpinning the page.
The basic idea of this optimization is to postpone the expensive
page-file updates for frequently modified objects, relying on support
from the application's object cache to maintain transactional
semantics.
If \yad knows that the application will not attempt to read the
version of the record stored in the page file (because the current
version resides in the object cache), then there is no real reason to
update the page file on every write. In fact, if no undo or redo
information needs to be generated, there is no need to bring the page
into memory at all in order to service a write.
To implement this, we added two custom \yad operations. The
{\tt update()} operation is called when an object is modified but
still resides in the object cache. It writes a log entry, but does
not update the page file. The fact that the modified object still
resides in the object cache guarantees that the now-stale record will
not be read from the page file. The {\tt flush()} operation is called
whenever a modified object is evicted from the cache. This operation
updates the object in the buffer pool (and therefore the page file),
likely incurring the cost of a disk {\em read} to pull in the page,
and a {\em write} to evict another page from the relatively small
buffer pool. Multiple modifications to an object thus incur only
inexpensive log additions, and are eventually coalesced into a single
update to the page file.
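The following sketch illustrates the resulting write and eviction
paths in the object cache. The types and helper functions
({\tt log\_update()}, {\tt write\_record()}, and so on) are
hypothetical stand-ins, not \yad's actual interface:
\begin{verbatim}
typedef long   lsn_t;    /* illustrative types */
typedef struct { int page, slot; } recordid;
typedef struct diff_t diff_t;

/* Hypothetical storage-library primitives: */
lsn_t log_update(int xid, recordid rid,
                 const diff_t *d);
void  write_record(int xid, recordid rid,
                   const void *buf, lsn_t lsn);
void  apply_diff(void *obj, const diff_t *d);
void *serialize(const void *obj);

typedef struct { recordid rid; void *data;
                 lsn_t lsn; int dirty; } cached_obj;

/* update(): called while the object is cached.
   Appends a diff to the log; the page file is
   left untouched. */
void cache_update(int xid, cached_obj *o,
                  const diff_t *d) {
  o->lsn = log_update(xid, o->rid, d);
  apply_diff(o->data, d); /* cache stays current */
  o->dirty = 1;
}

/* flush(): called on eviction. Writes the
   serialized object back to its page, stamped
   with the LSN of the most recent update(). */
void cache_flush(int xid, cached_obj *o) {
  if (o->dirty)
    write_record(xid, o->rid,
                 serialize(o->data), o->lsn);
}
\end{verbatim}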
\yad provides a few mechanisms to handle undo records in the context
of object serialization. The first is to use a single transaction for
each object modification, avoiding the cost of generating or logging
any undo records. No other transactional system that we know of allows
this type of optimization. The second option is to assume that the
application will provide the necessary undo information along with the
update, which would generate an ``undiff'' log record for each update
operation, but would still avoid the need to read or update the page
file.
The third option is to relax the atomicity requirements for a set of
object updates, and again avoid generating any undo records. This
assumes that the application cannot use abort, and is willing to
accept that a prefix of the logged updates will be applied to the page
file after recovery. These ``transactions'' would still be durable, as
commit() could force the log to disk. For the benchmarks below, we
opted for this approach, as it is the most aggressive and would be the
most difficult to implement in another storage system.
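A sketch of this usage pattern follows, reusing the hypothetical
{\tt cache\_update()} helper from above and assuming that
{\tt Tbegin()} and {\tt Tcommit()} bracket a transaction:
\begin{verbatim}
/* Option three: a batch of updates with no undo
   records. commit() forces the log; after a
   crash, a prefix of the logged updates is
   applied to the page file. */
void update_batch(int n, cached_obj **objs,
                  const diff_t **diffs) {
  int xid = Tbegin();
  for (int i = 0; i < n; i++)  /* no undo info */
    cache_update(xid, objs[i], diffs[i]);
  Tcommit(xid); /* force the log: durability */
}
\end{verbatim}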
\subsection{Recovery and Log Truncation}
@ -1865,33 +1892,69 @@ Nothing stops our current scheme from breaking this invariant.
We have two solutions to this problem. One solution is to
implement a cache eviction policy that respects the ordering of object
updates on a per-page basis.
However, this approach would impose an unnatural restriction on the
cache replacement policy, and would likely suffer from performance
impacts resulting from the (arbitrary) manner in which \yad allocates
objects to pages.
The second solution is to
force \yad to ignore the page LSN values when considering
special {\tt update()} log entries during the REDO phase of recovery. This
forces \yad to re-apply the diffs in the same order in which the application
generated them. This works as intended because we use an
idempotent diff format that will produce the correct result even if we
start with a copy of the object that is newer than the first diff that
we apply.
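For example, a diff can simply be a blind write of a byte range at a
fixed offset within the serialized object; each replay overwrites the
range with the same bytes, so re-applying the diffs in log order
yields the correct final state even when the starting copy already
reflects some of them. A sketch of such an (illustrative) record
format:
\begin{verbatim}
#include <string.h>

/* Idempotent diff: a blind write of new bytes
   at a fixed offset in the serialized object. */
typedef struct {
  size_t offset, length;
  unsigned char bytes[]; /* new contents */
} obj_diff;

/* Applying the same diff twice leaves the
   object in the same state. */
void redo_diff(unsigned char *obj,
               const obj_diff *d) {
  memcpy(obj + d->offset, d->bytes, d->length);
}
\end{verbatim}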
To avoid needing to replay the entire log on recovery, we add a custom
checkpointing algorithm that interacts with the page cache.
To produce a
fuzzy checkpoint, we simply iterate over the object pool, calculating
the minimum LSN of the {\em first} call to update() on any object in
the pool that has not yet been flushed.
We can then invoke a normal ARIES checkpoint with the restriction
that the log is not truncated past the minimum LSN encountered in the
object pool.\footnote{We do not yet enforce this checkpoint limitation.}
A background process that calls flush() for all objects in the cache
allows efficient log truncation without blocking any high-priority
operations.
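A sketch of this truncation-bound computation follows; the field and
type names are illustrative:
\begin{verbatim}
#include <limits.h>

typedef long lsn_t; /* illustrative LSN type */
typedef struct { int dirty;
                 lsn_t first_lsn; } cached_obj;

/* The log may only be truncated up to the
   minimum, over all dirty cached objects, of
   the LSN of the first update() call since the
   object entered the cache. */
lsn_t truncation_bound(const cached_obj *pool,
                       int n) {
  lsn_t min_lsn = LONG_MAX;
  for (int i = 0; i < n; i++)
    if (pool[i].dirty &&
        pool[i].first_lsn < min_lsn)
      min_lsn = pool[i].first_lsn;
  return min_lsn;
}
\end{verbatim}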
\subsection{Evaluation}
\begin{figure*}
\includegraphics[%
width=1\columnwidth]{mem-pressure.pdf}
\includegraphics[%
width=1\columnwidth]{mem-pressure.pdf}
\caption{\label{fig:OASYS} \yad optimizations for object
serialization. The first graph shows the effectiveness of both the
diff-based log records and the update/flush optimization as a function
of the portion of each object that is modified. The second graph
disables the filesystem buffer cache (via O\_DIRECT) and shows the
benefits of the update/flush optimization when there is memory
pressure.}
\end{figure*}
We implemented a \yad plugin for \oasys, a C++ object serialization
library that includes various object serialization backends, including
one for Berkeley DB. We set up an experiment in which objects are
retrieved from a cache according to a hot-set distribution\footnote{In
an example hot-set distribution, 10\% of the objects (the hot set) are
selected 90\% of the time.} and then have certain fields modified. The
object cache size is set to twice the size of the hot set, and all
experiments were run with identical cache sizes and random seeds for
both Berkeley DB and the various \yad configurations.
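Concretely, object identifiers for each access might be drawn as in
the following sketch; the parameter names are illustrative, and the
actual \oasys driver differs:
\begin{verbatim}
#include <stdlib.h>

/* Draw an object id from a hot-set
   distribution: a fraction hot_frac of the
   objects receives hot_prob of the accesses. */
int hot_set_draw(int n, double hot_frac,
                 double hot_prob) {
  int hot = (int)(hot_frac * n);
  if (drand48() < hot_prob)
    return rand() % hot;          /* hot set  */
  return hot + rand() % (n - hot); /* cold set */
}
\end{verbatim}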
The first graph in Figure~\ref{fig:OASYS} shows the time to perform
100,000 updates to the objects as we vary the fraction of the object
data that is modified in each update. In the most extreme case, when
only a single integer field of a $\sim$1KB object is modified, the
fully optimized \yad shows a threefold speedup over Berkeley DB.
The \yad plugin makes use of the optimizations
described in this section, and was used to generate Figure~\ref{fig:OASYS}.
For comparison, we also implemented a non-optimized \yad plugin to
directly measure the effect of our optimizations.
@ -1914,6 +1977,14 @@ complex, the simplicity of the implementation is encouraging.
\rcs{analyse OASYS data.}
The first test used a small \oasys buffer cache (23 pages) with
O\_DIRECT enabled. The test used 5000 objects, a cache size of 20\%
of the objects, and a hot-set size of 10\% of the objects; this ratio
is necessary to achieve the desired cache behavior, as a larger hot
set would cause hot objects to be evicted too frequently. Each run
performed 10,000 iterations.
This section uses:
\begin{enumerate}
@ -2168,6 +2239,9 @@ and reliable.
\section{Conclusion}
\mjd{need to search and replace for ``lladd'' and ``oasys''}
\rcs{write conclusion section}
\begin{thebibliography}{99}