\section{Validation}
\subsection{Conventional workloads}
Existing database servers and transactional libraries are tuned to
support OLTP (Online Transaction Processing) workloads well. Roughly
speaking, the workload of these systems is dominated by short
transactions, and response time is important. We are confident that a
sophisticated system based upon our approach to transactional storage
would compete well in this area, as our algorithm is based upon ARIES,
the foundation of IBM's DB2 database. However, our current
implementation is geared toward simpler, specialized applications, so
we cannot verify this directly. Instead, we present a number of
microbenchmarks that compare our system against Berkeley DB, the most
popular transactional library. Berkeley DB is a mature, actively
maintained product. While it currently provides more functionality
than our implementation, we believe that our architecture could
support a broader range of features than Berkeley DB, which is a
monolithic system.
The first test measures the throughput of a single long-running
transaction that generates and loads a synthetic data set into the
library. For comparison, we provide throughput for a number of
different LLADD operations, for Berkeley DB's DB\_HASH hashtable
implementation, and for its lower-level, record-number-based
DB\_RECNO interface.

@todo fill in numbers here.
The second test measures the two libraries' ability to exploit
concurrent transactions to reduce logging overhead. Both systems
implement a simple optimization, commonly known as group commit, that
allows multiple calls to commit() to be serviced by a single
synchronous disk request.

@todo analysis
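
For concreteness, here is a sketch of the general shape of this
optimization, built from a mutex and a condition variable. The
helpers current\_log\_end() and force\_log\_to\_disk() are
hypothetical stand-ins for a log manager; this illustrates the
technique and is not either library's actual code.

\begin{verbatim}
#include <pthread.h>

static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  log_cond  = PTHREAD_COND_INITIALIZER;
static long flushed_lsn       = 0;  /* last LSN known to be on disk */
static int  flush_in_progress = 0;

extern long current_log_end(void);    /* hypothetical log manager */
extern void force_log_to_disk(long);  /* one synchronous write    */

/* Block until this transaction's commit record is durable.  One
   thread flushes on behalf of all current waiters, so many commits
   share a single synchronous disk request. */
void commit_wait(long my_lsn)
{
    pthread_mutex_lock(&log_mutex);
    while (flushed_lsn < my_lsn) {
        if (!flush_in_progress) {
            flush_in_progress = 1;
            long target = current_log_end(); /* covers every waiter */
            pthread_mutex_unlock(&log_mutex);
            force_log_to_disk(target);       /* one fsync()         */
            pthread_mutex_lock(&log_mutex);
            flushed_lsn = target;
            flush_in_progress = 0;
            pthread_cond_broadcast(&log_cond);
        } else {
            pthread_cond_wait(&log_cond, &log_mutex); /* piggyback */
        }
    }
    pthread_mutex_unlock(&log_mutex);
}
\end{verbatim}
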
The final test measures the maximum number of sustainable
transactions per second for the two libraries. In these cases, we
generate a uniform rate of transactions per second by spawning a
fixed number of threads and varying the number of requests each
thread issues per second, and we report the cumulative distribution
of response times for each case.

@todo analysis / come up with a more sane graph format.
\subsection{Object Serialization}
Object serialization performance is extremely important in modern web
service systems such as EJB. Object serialization is also a
convenient way of adding persistent storage to an existing
application without developing an explicit file format or dealing
with low-level I/O interfaces.
A simple object serialization scheme would bulk-write and bulk-read
sets of application objects to an operating system file. Such schemes
suffer from high read and write latency, and do not handle small
updates well. More sophisticated schemes store each object in a
separate, randomly accessible record, such as a database tuple or a
Berkeley DB hashtable entry. These schemes allow fast reads and
writes of single objects, and are typically the solutions used by
application services.
Unfortunately, most of these schemes ``double buffer'' application
data. Typically, the application maintains a set of in-memory objects
that may be accessed with low latency. The backing data store
maintains a separate buffer pool that contains serialized versions of
the objects in memory, and corresponds to the on-disk representation
of the data. Accesses to objects that are only present in the buffer
pool incur ``medium'' latency, as they must be deserialized before
the application may access them. Finally, some objects may reside
only on disk, and may only be accessed with high latency.
Since these applications are typically data-centric, it is important
to make efficient use of system memory in order to reduce hardware
costs. A straightforward solution would be to bound the amount of
memory the application may consume by preventing it from caching
deserialized objects. This scheme conserves memory, but it incurs the
cost of an in-memory deserialization to read an object, and of an
in-memory deserialization/serialization cycle to write to an object.
Alternatively, the amount of memory consumed by the buffer pool could
be bounded to some small value, and the application could maintain a
large object cache. This scheme incurs no overhead for a read
request, but it incurs the overhead of a disk-based serialization in
order to service a write request.\footnote{In practice, the
transactional backing store would probably fetch the page that
contains the object from disk, causing two disk I/Os to be issued.}
LLADD's architecture allows us to apply two interesting optimizations
to such object serialization schemes. First, since LLADD supports
custom log entries, it is trivial to have it store diffs of objects
in the log instead of writing the entire object to the log during an
update. Such an optimization would be difficult to achieve with
Berkeley DB, but could be performed by a database server if the
fields of the objects were broken into database table columns. It is
unclear whether the benefit of this optimization would outweigh the
overheads associated with an SQL-based interface.

% @todo WRITE SQL OASYS BENCHMARK!!
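
As a rough sketch, such a log entry can carry only the byte range
that changed, along with enough information to re-apply it. The type
and function names below are illustrative; they are not LLADD's exact
custom log entry interface.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Payload of a hypothetical diff-based update entry. */
typedef struct {
    size_t offset;   /* first modified byte within the record */
    size_t length;   /* number of modified bytes              */
    char   bytes[];  /* new contents of the modified region   */
} object_diff;

/* REDO: overwrite just the changed region of the stored record.
   The matching UNDO entry has the same shape, but carries the old
   bytes.  Both are idempotent; applying the same overwrite twice
   yields the same result. */
void redo_object_diff(void *record, const object_diff *d)
{
    memcpy((char *)record + d->offset, d->bytes, d->length);
}
\end{verbatim}
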
The second optimization is a bit more sophisticated, but is still
easy to implement in LLADD. We do not believe that it could be
achieved with existing relational database systems or with Berkeley
DB.
LLADD services a request to write to a record by pinning (and
possibly reading in) the applicable page, generating a log entry,
writing the new value of the record to the in-memory page, and
unpinning the page.
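
A sketch of this write path, with illustrative names standing in for
LLADD's internal interfaces:

\begin{verbatim}
/* Hypothetical types and helpers standing in for LLADD internals. */
typedef struct Page Page;
typedef struct { long page; int slot; } recordid;
typedef long lsn_t;

extern Page *pin_page(long pageid);           /* may read from disk */
extern void  unpin_page(Page *p);
extern lsn_t log_update(int xid, recordid rid, const void *val);
extern void  write_to_page(Page *p, recordid rid,
                           const void *val, lsn_t lsn);

/* Service a write request: pin, log, apply, unpin. */
void write_record(int xid, recordid rid, const void *val)
{
    Page *p   = pin_page(rid.page);        /* possibly a disk read   */
    lsn_t lsn = log_update(xid, rid, val); /* generate log entry     */
    write_to_page(p, rid, val, lsn);       /* update in-memory page  */
    unpin_page(p);                         /* page may be evicted    */
}
\end{verbatim}
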
If LLADD knows that the client will not ask to read the record, then
there is no real reason to update the version of the record in the
page file. In fact, if a diff does not need to be generated, there is
no need to have the page in memory at all. We can think of two
plausible situations in which such a diff would be unnecessary.
First, the application may not be interested in transaction
atomicity. In this case, by writing no-op undo records instead of
real undo records, LLADD can guarantee that some prefix of the log
will be applied to the page file after recovery. The redo information
is already available: the object is in the application's cache.
``Transactions'' could still be durable, as commit() could be used to
force the log to disk.
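
In a hypothetical operation-registration interface, this amounts to
registering an undo function that deliberately does nothing:

\begin{verbatim}
typedef long lsn_t;

/* Sketch: a no-op undo.  If it is registered for our update
   entries, recovery never rolls those updates back, so some prefix
   of the log is always applied to the page file.  (Illustrative;
   not LLADD's exact operation-registration API.) */
void noop_undo(void *record, const void *undo_payload, lsn_t lsn)
{
    (void)record; (void)undo_payload; (void)lsn; /* intentionally empty */
}
\end{verbatim}
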
Second, the application could provide the undo record for LLADD. This
could be implemented in a straightforward manner by adding special
accessor methods to the object that generate undo information as the
object is updated in memory.
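
For example, a setter can capture the old value as it updates the
in-memory object. The field names and buffer layout here are
hypothetical:

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* An application object with a small, application-managed undo
   buffer (layout is illustrative). */
typedef struct {
    int    balance;
    char   undo[64];   /* old bytes, later handed to LLADD as UNDO */
    size_t undo_len;
} account;

void account_set_balance(account *a, int new_balance)
{
    /* Save the old value before overwriting it. */
    memcpy(a->undo, &a->balance, sizeof a->balance);
    a->undo_len = sizeof a->balance;
    a->balance  = new_balance;
}
\end{verbatim}
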
We have removed the need to use the on-disk version of the object to
generate log entries, but we still need to guarantee that the
application will not attempt to read a stale record from the page
file. This problem also has a simple solution. In order to service a
write request made by the application, the cache calls a special
``update()'' method, which only writes a log entry. If the cache must
evict an object, it calls a special ``flush()'' method, which writes
the object to the buffer pool (and probably incurs the cost of disk
I/O), using an LSN recorded by the most recent update() call
associated with the object. Since LLADD implements no-force, it does
not matter to recovery if the version of the object in the page file
is stale.
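
A sketch of the cache's side of this protocol; the names and
signatures are illustrative, not LLADD's actual interfaces:

\begin{verbatim}
#include <stddef.h>

typedef long lsn_t;
typedef struct { long page; int slot; } recordid;

typedef struct {
    recordid rid;
    void    *mem;        /* deserialized, in-memory object    */
    lsn_t    flush_lsn;  /* LSN returned by the last update() */
} cached_object;

extern lsn_t  log_object_diff(int xid, recordid rid,
                              const void *diff, size_t len);
extern size_t serialize(const void *obj, void *buf, size_t buflen);
extern void   write_record_with_lsn(recordid rid, const void *buf,
                                    size_t len, lsn_t lsn);

/* Write request: log a diff; do NOT touch the page file. */
void cache_update(int xid, cached_object *o,
                  const void *diff, size_t len)
{
    o->flush_lsn = log_object_diff(xid, o->rid, diff, len);
}

/* Eviction: only now is the object written to the buffer pool,
   stamped with the LSN saved by the most recent update(). */
void cache_flush(cached_object *o)
{
    char buf[4096];
    size_t len = serialize(o->mem, buf, sizeof buf);
    write_record_with_lsn(o->rid, buf, len, o->flush_lsn);
}
\end{verbatim}
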
An observant reader may have noticed a subtle problem with this
scheme. More than one object may reside on a page, and we do not
constrain the order in which the cache calls flush() to evict
objects. Recall that the LSN stored on a page implies that all
updates {\em up to} and including that LSN have been applied to the
page. Nothing stops our current scheme from breaking this invariant.
We see two potential solutions to this problem. One is to implement a
cache eviction policy that respects the ordering of object updates on
a per-page basis; it could be implemented using one or more priority
queues. Instead of interfering with the eviction policy of the cache
(and in keeping with the theme of this paper), we sought a solution
that leverages LLADD's interfaces.
We can force LLADD to ignore page LSN values when it encounters our
special update() log entries during the REDO phase of recovery. This
forces LLADD to re-apply the diffs in the same order in which the
application generated them. The scheme works as intended because we
use an idempotent diff format that produces the correct result even
if we start with a copy of the object that is newer than the first
diff that we apply.
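
Schematically, the REDO dispatch exempts these entries from the usual
comparison against the page LSN (names are illustrative):

\begin{verbatim}
typedef long lsn_t;
typedef struct Page Page;
typedef struct { int type; lsn_t lsn; /* payload... */ } log_entry;

enum { ORDINARY_UPDATE, OBJECT_DIFF };

extern lsn_t page_lsn(const Page *p);
extern void  apply_redo(const log_entry *e, Page *p);

/* Ordinary entries are skipped if the page already reflects them;
   idempotent object diffs are re-applied unconditionally, in log
   order. */
void redo_entry(const log_entry *e, Page *p)
{
    if (e->type == OBJECT_DIFF || e->lsn > page_lsn(p))
        apply_redo(e, p);
}
\end{verbatim}
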
The only remaining detail is to implement a custom checkpointing
algorithm that understands the object cache. In order to produce a
fuzzy checkpoint, we simply iterate over the object pool, calculating
the minimum LSN of the objects in the pool.\footnote{This LSN is
distinct from the one used by flush(); it is the LSN of the object's
{\em first} call to update() after the object was added to the
cache.} At this point, we can invoke a normal ARIES checkpoint, with
the restriction that the log is not truncated past the minimum LSN
encountered in the object pool.\footnote{Because LLADD does not yet
implement checkpointing, we have not implemented this checkpointing
scheme.}
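
A sketch of the LSN calculation; the object pool layout is
hypothetical:

\begin{verbatim}
#include <stddef.h>
#include <limits.h>

typedef long lsn_t;

typedef struct {
    int   dirty;            /* has unflushed update() entries      */
    lsn_t first_update_lsn; /* first update() since entering cache */
} pool_entry;

/* Fuzzy checkpoint, step one: the log may not be truncated past
   the smallest first-update LSN of any dirty cached object. */
lsn_t object_pool_min_lsn(const pool_entry *pool, size_t n)
{
    lsn_t min = LONG_MAX;
    for (size_t i = 0; i < n; i++)
        if (pool[i].dirty && pool[i].first_update_lsn < min)
            min = pool[i].first_update_lsn;
    return min;  /* then take a normal ARIES checkpoint */
}
\end{verbatim}
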
We implemented an LLADD plugin for OASYS, a C++ object serialization
library. The plugin makes use of all of the optimizations mentioned
in this section, and was used to generate Figure~[TODO]. Ignoring the
checkpointing scheme and a small change needed in the recovery
algorithm, the operations required for these two optimizations amount
to roughly 150 lines of C code, including whitespace, comments, and
boilerplate function registrations. While the reasoning required to
ensure the correctness of this code was complex, the simplicity of
the implementation is encouraging.

@todo analyse OASYS data.
\subsection{Transitive closure}

@todo implement transitive closu....

\begin{enumerate}
\item {\bf Comparison of transactional primitives (best case for each operator)}