\section{Validation}
\subsection{Conventional workloads}
Existing database servers and transactional libraries are tuned to
support OLTP (Online Transaction Processing) workloads well. Roughly
speaking, the workload of these systems is dominated by short
transactions, and response time is important. We are confident that a
sophisticated system based upon our approach to transactional storage
would compete well in this area, as our algorithm is based upon ARIES,
the foundation of IBM's DB2 database. However, our current
implementation is geared toward simpler, specialized applications, so
we cannot verify this directly. Instead, we present a number of
microbenchmarks that compare our system against Berkeley DB, the most
popular transactional library. Berkeley DB is a mature, actively
maintained product. While it currently provides more functionality
than our implementation, we believe that our architecture could
support a broader range of features than Berkeley DB, which is a
monolithic system.
The first test measures the throughput of a single long-running
transaction that generates and loads a synthetic data set into the
library. For comparison, we provide throughput for a number of
different LLADD operations, for Berkeley DB's DB\_HASH hashtable
implementation, and for its lower-level, record-number-based
DB\_RECNO interface.

@todo fill in numbers here.
The second test measures the two libraries' ability to exploit
concurrent transactions to reduce logging overhead. Both systems
implement a simple optimization, commonly known as group commit, that
allows multiple calls to commit() to be serviced by a single
synchronous disk request.

@todo analysis
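
For concreteness, here is a sketch of the general shape of this
optimization, built from a mutex and a condition variable. The
helpers current\_log\_end() and force\_log\_to\_disk() are
hypothetical stand-ins for a log manager; this illustrates the
technique and is not either library's actual code.

\begin{verbatim}
#include <pthread.h>

static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  log_cond  = PTHREAD_COND_INITIALIZER;
static long flushed_lsn       = 0;  /* last LSN known to be on disk */
static int  flush_in_progress = 0;

extern long current_log_end(void);    /* hypothetical log manager */
extern void force_log_to_disk(long);  /* one synchronous write    */

/* Block until this transaction's commit record is durable.  One
   thread flushes on behalf of all current waiters, so many commits
   share a single synchronous disk request. */
void commit_wait(long my_lsn)
{
    pthread_mutex_lock(&log_mutex);
    while (flushed_lsn < my_lsn) {
        if (!flush_in_progress) {
            flush_in_progress = 1;
            long target = current_log_end(); /* covers every waiter */
            pthread_mutex_unlock(&log_mutex);
            force_log_to_disk(target);       /* one fsync()         */
            pthread_mutex_lock(&log_mutex);
            flushed_lsn = target;
            flush_in_progress = 0;
            pthread_cond_broadcast(&log_cond);
        } else {
            pthread_cond_wait(&log_cond, &log_mutex); /* piggyback */
        }
    }
    pthread_mutex_unlock(&log_mutex);
}
\end{verbatim}
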
The final test measures the maximum number of sustainable
transactions per second for the two libraries. In these cases, we
generate a uniform rate of transactions per second by spawning a
fixed number of threads and varying the number of requests each
thread issues per second, and we report the cumulative distribution
of response times for each case.

@todo analysis / come up with a more sane graph format.
\subsection{Object Serialization}
Object serialization performance is extremely important in modern web
service systems such as EJB. Object serialization is also a
convenient way of adding persistent storage to an existing
application without developing an explicit file format or dealing
with low-level I/O interfaces.
A simple object serialization scheme would bulk-write and bulk-read
sets of application objects to an operating system file. Such schemes
suffer from high read and write latency, and do not handle small
updates well. More sophisticated schemes store each object in a
separate, randomly accessible record, such as a database tuple or a
Berkeley DB hashtable entry. These schemes allow fast reads and
writes of single objects, and are typically the solutions used by
application services.
Unfortunately, most of these schemes ``double buffer'' application
data. Typically, the application maintains a set of in-memory objects
that may be accessed with low latency. The backing data store
maintains a separate buffer pool that contains serialized versions of
the objects in memory, and corresponds to the on-disk representation
of the data. Accesses to objects that are only present in the buffer
pool incur ``medium'' latency, as they must be deserialized before
the application may access them. Finally, some objects may reside
only on disk, and may only be accessed with high latency.
Since these applications are typically data-centric, it is important
to make efficient use of system memory in order to reduce hardware
costs. A straightforward solution would be to bound the amount of
memory the application may consume by preventing it from caching
deserialized objects. This scheme conserves memory, but it incurs the
cost of an in-memory deserialization to read an object, and of an
in-memory deserialization/serialization cycle to write to an object.
Alternatively, the amount of memory consumed by the buffer pool could
be bounded to some small value, and the application could maintain a
large object cache. This scheme incurs no overhead for a read
request, but it incurs the overhead of a disk-based serialization in
order to service a write request.\footnote{In practice, the
transactional backing store would probably fetch the page that
contains the object from disk, causing two disk I/Os to be issued.}
LLADD's architecture allows us to apply two interesting optimizations
to such object serialization schemes. First, since LLADD supports
custom log entries, it is trivial to have it store diffs of objects
in the log instead of writing the entire object to the log during an
update. Such an optimization would be difficult to achieve with
Berkeley DB, but could be performed by a database server if the
fields of the objects were broken into database table columns. It is
unclear whether the benefit of this optimization would outweigh the
overheads associated with an SQL-based interface.

% @todo WRITE SQL OASYS BENCHMARK!!
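
As a rough sketch, such a log entry can carry only the byte range
that changed, along with enough information to re-apply it. The type
and function names below are illustrative; they are not LLADD's exact
custom log entry interface.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Payload of a hypothetical diff-based update entry. */
typedef struct {
    size_t offset;   /* first modified byte within the record */
    size_t length;   /* number of modified bytes              */
    char   bytes[];  /* new contents of the modified region   */
} object_diff;

/* REDO: overwrite just the changed region of the stored record.
   The matching UNDO entry has the same shape, but carries the old
   bytes.  Both are idempotent; applying the same overwrite twice
   yields the same result. */
void redo_object_diff(void *record, const object_diff *d)
{
    memcpy((char *)record + d->offset, d->bytes, d->length);
}
\end{verbatim}
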
The second optimization is a bit more sophisticated, but is still
easy to implement in LLADD. We do not believe that it could be
achieved with existing relational database systems or with Berkeley
DB.
LLADD services a request to write to a record by pinning (and
possibly reading in) the applicable page, generating a log entry,
writing the new value of the record to the in-memory page, and
unpinning the page.
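
A sketch of this write path, with illustrative names standing in for
LLADD's internal interfaces:

\begin{verbatim}
/* Hypothetical types and helpers standing in for LLADD internals. */
typedef struct Page Page;
typedef struct { long page; int slot; } recordid;
typedef long lsn_t;

extern Page *pin_page(long pageid);           /* may read from disk */
extern void  unpin_page(Page *p);
extern lsn_t log_update(int xid, recordid rid, const void *val);
extern void  write_to_page(Page *p, recordid rid,
                           const void *val, lsn_t lsn);

/* Service a write request: pin, log, apply, unpin. */
void write_record(int xid, recordid rid, const void *val)
{
    Page *p   = pin_page(rid.page);        /* possibly a disk read   */
    lsn_t lsn = log_update(xid, rid, val); /* generate log entry     */
    write_to_page(p, rid, val, lsn);       /* update in-memory page  */
    unpin_page(p);                         /* page may be evicted    */
}
\end{verbatim}
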
If LLADD knows that the client will not ask to read the record, then
there is no real reason to update the version of the record in the
page file. In fact, if a diff does not need to be generated, there is
no need to have the page in memory at all. We can think of two
plausible situations in which such a diff would be unnecessary.
First, the application may not be interested in transaction
atomicity. In this case, by writing no-op undo records instead of
real undo records, LLADD can guarantee that some prefix of the log
will be applied to the page file after recovery. The redo information
is already available: the object is in the application's cache.
``Transactions'' could still be durable, as commit() could be used to
force the log to disk.
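
In a hypothetical operation-registration interface, this amounts to
registering an undo function that deliberately does nothing:

\begin{verbatim}
typedef long lsn_t;

/* Sketch: a no-op undo.  If it is registered for our update
   entries, recovery never rolls those updates back, so some prefix
   of the log is always applied to the page file.  (Illustrative;
   not LLADD's exact operation-registration API.) */
void noop_undo(void *record, const void *undo_payload, lsn_t lsn)
{
    (void)record; (void)undo_payload; (void)lsn; /* intentionally empty */
}
\end{verbatim}
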
Second, the application could provide the undo record for LLADD. This
could be implemented in a straightforward manner by adding special
accessor methods to the object that generate undo information as the
object is updated in memory.
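
For example, a setter can capture the old value as it updates the
in-memory object. The field names and buffer layout here are
hypothetical:

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* An application object with a small, application-managed undo
   buffer (layout is illustrative). */
typedef struct {
    int    balance;
    char   undo[64];   /* old bytes, later handed to LLADD as UNDO */
    size_t undo_len;
} account;

void account_set_balance(account *a, int new_balance)
{
    /* Save the old value before overwriting it. */
    memcpy(a->undo, &a->balance, sizeof a->balance);
    a->undo_len = sizeof a->balance;
    a->balance  = new_balance;
}
\end{verbatim}
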
We have removed the need to use the on-disk version of the object to
generate log entries, but we still need to guarantee that the
application will not attempt to read a stale record from the page
file. This problem also has a simple solution. In order to service a
write request made by the application, the cache calls a special
``update()'' method, which only writes a log entry. If the cache must
evict an object, it calls a special ``flush()'' method, which writes
the object to the buffer pool (and probably incurs the cost of disk
I/O), using an LSN recorded by the most recent update() call
associated with the object. Since LLADD implements no-force, it does
not matter to recovery if the version of the object in the page file
is stale.
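
A sketch of the cache's side of this protocol; the names and
signatures are illustrative, not LLADD's actual interfaces:

\begin{verbatim}
#include <stddef.h>

typedef long lsn_t;
typedef struct { long page; int slot; } recordid;

typedef struct {
    recordid rid;
    void    *mem;        /* deserialized, in-memory object    */
    lsn_t    flush_lsn;  /* LSN returned by the last update() */
} cached_object;

extern lsn_t  log_object_diff(int xid, recordid rid,
                              const void *diff, size_t len);
extern size_t serialize(const void *obj, void *buf, size_t buflen);
extern void   write_record_with_lsn(recordid rid, const void *buf,
                                    size_t len, lsn_t lsn);

/* Write request: log a diff; do NOT touch the page file. */
void cache_update(int xid, cached_object *o,
                  const void *diff, size_t len)
{
    o->flush_lsn = log_object_diff(xid, o->rid, diff, len);
}

/* Eviction: only now is the object written to the buffer pool,
   stamped with the LSN saved by the most recent update(). */
void cache_flush(cached_object *o)
{
    char buf[4096];
    size_t len = serialize(o->mem, buf, sizeof buf);
    write_record_with_lsn(o->rid, buf, len, o->flush_lsn);
}
\end{verbatim}
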
An observant reader may have noticed a subtle problem with this
scheme. More than one object may reside on a page, and we do not
constrain the order in which the cache calls flush() to evict
objects. Recall that the LSN stored on a page implies that all
updates {\em up to} and including that LSN have been applied to the
page. Nothing stops our current scheme from breaking this invariant.
We see two potential solutions to this problem. One is to implement a
cache eviction policy that respects the ordering of object updates on
a per-page basis; it could be implemented using one or more priority
queues. Instead of interfering with the eviction policy of the cache
(and in keeping with the theme of this paper), we sought a solution
that leverages LLADD's interfaces.
We can force LLADD to ignore page LSN values when it encounters our
special update() log entries during the REDO phase of recovery. This
forces LLADD to re-apply the diffs in the same order in which the
application generated them. The scheme works as intended because we
use an idempotent diff format that produces the correct result even
if we start with a copy of the object that is newer than the first
diff that we apply.
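
Schematically, the REDO dispatch exempts these entries from the usual
comparison against the page LSN (names are illustrative):

\begin{verbatim}
typedef long lsn_t;
typedef struct Page Page;
typedef struct { int type; lsn_t lsn; /* payload... */ } log_entry;

enum { ORDINARY_UPDATE, OBJECT_DIFF };

extern lsn_t page_lsn(const Page *p);
extern void  apply_redo(const log_entry *e, Page *p);

/* Ordinary entries are skipped if the page already reflects them;
   idempotent object diffs are re-applied unconditionally, in log
   order. */
void redo_entry(const log_entry *e, Page *p)
{
    if (e->type == OBJECT_DIFF || e->lsn > page_lsn(p))
        apply_redo(e, p);
}
\end{verbatim}
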
The only remaining detail is to implement a custom checkpointing
algorithm that understands the object cache. In order to produce a
fuzzy checkpoint, we simply iterate over the object pool, calculating
the minimum LSN of the objects in the pool.\footnote{This LSN is
distinct from the one used by flush(); it is the LSN of the object's
{\em first} call to update() after the object was added to the
cache.} At this point, we can invoke a normal ARIES checkpoint, with
the restriction that the log is not truncated past the minimum LSN
encountered in the object pool.\footnote{Because LLADD does not yet
implement checkpointing, we have not implemented this checkpointing
scheme.}
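
A sketch of the LSN calculation; the object pool layout is
hypothetical:

\begin{verbatim}
#include <stddef.h>
#include <limits.h>

typedef long lsn_t;

typedef struct {
    int   dirty;            /* has unflushed update() entries      */
    lsn_t first_update_lsn; /* first update() since entering cache */
} pool_entry;

/* Fuzzy checkpoint, step one: the log may not be truncated past
   the smallest first-update LSN of any dirty cached object. */
lsn_t object_pool_min_lsn(const pool_entry *pool, size_t n)
{
    lsn_t min = LONG_MAX;
    for (size_t i = 0; i < n; i++)
        if (pool[i].dirty && pool[i].first_update_lsn < min)
            min = pool[i].first_update_lsn;
    return min;  /* then take a normal ARIES checkpoint */
}
\end{verbatim}
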
We implemented an LLADD plugin for OASYS, a C++ object serialization
library. The plugin makes use of all of the optimizations mentioned
in this section, and was used to generate Figure~[TODO]. Ignoring the
checkpointing scheme and a small change needed in the recovery
algorithm, the operations required for these two optimizations amount
to roughly 150 lines of C code, including whitespace, comments, and
boilerplate function registrations. While the reasoning required to
ensure the correctness of this code was complex, the simplicity of
the implementation is encouraging.

@todo analyse OASYS data.
\subsection{Transitive closure}

@todo implement transitive closu....

\begin{enumerate}
\item {\bf Comparison of transactional primitives (best case for each operator)}