Eric Brewer 2006-04-24 23:22:46 +00:00
parent c0d143529c
commit 67a0295a6b


@@ -78,10 +78,8 @@ write-ahead-logging algorithms. Our partial implementation of these
 ideas already provides specialized (and cleaner) semantics to applications.
 We evaluate the performance of a traditional transactional storage
-system based on \yad, and show that it performs comparably to existing
-systems.
-We present examples that make use of custom access methods, modifed
+system based on \yad, and show that it performs favorably relative to existing
+systems. We present examples that make use of custom access methods, modifed
 buffer manager semantics, direct log file manipulation, and LSN-free
 pages that facilitate zero-copy optimizations, and discuss the
 composability of these extensions. Many of these optimizations are
@@ -548,7 +546,7 @@ Thus, the single-page transactions of \yad work as follows. An {\em
 operation} consists of both a redo and an undo function, both of which
 take one argument. An update is always the redo function applied to
 the page (there is no ``do'' function), and it always ensures that the
-redo log entry (with its LSN and argument) reach the disk before
+redo log entry (with its LSN and argument) reaches the disk before
 commit. Similarly, an undo log entry, with its LSN and argument,
 always reaches the disk before a page is stolen. ARIES works
 essentially the same way, but hard-codes recommended page
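The single-page update rule in this hunk can be sketched as a small in-memory model. It assumes a hypothetical `set` operation and made-up `Log`, `Page`, `update`, `commit`, `steal`, and `redo_pass` names, not \yad's actual API. An update logs an entry carrying its LSN and single argument and then applies the redo function; commit forces the log; a page may be stolen only after the log is forced up to the page's LSN; and recovery reapplies an entry only when the page LSN shows it is missing.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int = 0                                   # LSN of the newest update reflected in the page
    data: dict = field(default_factory=dict)


@dataclass
class LogEntry:
    lsn: int
    op: str                                        # operation name
    arg: tuple                                     # the one argument shared by redo and undo


# Hypothetical operation: arg = (key, old_value, new_value).
def set_redo(page, arg):
    key, _old, new = arg
    page.data[key] = new


def set_undo(page, arg):
    key, old, _new = arg
    if old is None:
        page.data.pop(key, None)
    else:
        page.data[key] = old


OPERATIONS = {"set": (set_redo, set_undo)}


class Log:
    def __init__(self):
        self.entries = []                          # log tail, oldest first
        self.durable_lsn = 0                       # highest LSN that has reached disk

    def append(self, op, arg):
        entry = LogEntry(len(self.entries) + 1, op, arg)
        self.entries.append(entry)
        return entry

    def force(self, lsn):
        # Stand-in for a synchronous flush of the log up to `lsn`.
        self.durable_lsn = max(self.durable_lsn, lsn)


def update(log, page, op, arg):
    # No separate "do" step: log the entry, then apply its redo function to the page.
    entry = log.append(op, arg)
    OPERATIONS[op][0](page, arg)
    page.lsn = entry.lsn
    return entry


def commit(log):
    # Redo entries (LSN and argument) must be durable before commit returns.
    log.force(len(log.entries))


def steal(log, page):
    # Before a dirty page is written back, the log entries it reflects, which
    # carry the arguments its undo functions need, must be durable.
    log.force(page.lsn)
    return Page(page.lsn, dict(page.data))         # the copy written to disk


def redo_pass(log, page):
    # Recovery reapplies a durable entry only if the page LSN shows it is missing.
    for entry in log.entries:
        if entry.lsn <= log.durable_lsn and page.lsn < entry.lsn:
            OPERATIONS[entry.op][0](page, entry.arg)
            page.lsn = entry.lsn


# Usage: steal the page after one update, update again, commit, then crash.
log, page = Log(), Page()
update(log, page, "set", ("a", None, 1))
on_disk = steal(log, page)
update(log, page, "set", ("a", 1, 2))
commit(log)
redo_pass(log, on_disk)                            # rebuild from the stolen copy
```

The page LSN is what makes the redo pass safe to repeat: running it a second time finds every entry already reflected and changes nothing.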
@@ -607,8 +605,6 @@ assigned a new LSN so the page LSN will be different. Also, each undo
 is also written to the log.
 }
-\eab{describe recovery?}
 This section very briefly described how a simplified
 write-ahead-logging algorithm might work, and glossed over many
 details. Like ARIES, \yad actually implements recovery in three
@@ -707,7 +703,7 @@ each data structure until the end of the transaction. Releasing the
 lock after the modification, but before the end of the transaction,
 increases concurrency. However, it means that follow-on transactions that use
 that data may need to abort if a current transaction aborts ({\em
-cascading aborts}. These issues are studied in great detail in terms of optimistic concurrency control~\cite{optimisticConcurrencyControl, optimisticConcurrenctPerformance}.
+cascading aborts}). These issues are studied in great detail in terms of optimistic concurrency control~\cite{optimisticConcurrencyControl, optimisticConcurrenctPerformance}.
 Unfortunately, the long locks held by total isolation cause bottlenecks when applied to key
 data structures.
@@ -736,10 +732,11 @@ implements nested top actions. The extension may be used as follows:
 nested top action'' right before the mutex is released.
 \end{enumerate}
-If the transaction that encloses the operation aborts, the logical
+\noindent If the transaction thata encloses the operation aborts, the logical
 undo will {\em compensate} for its effects, leaving the structural
-changes intact. Note that this recipe does not ensure transactional
-consistency and is largely orthogonol to the use of a lock manager.
+changes intact.
+% Note that this recipe does not ensure iso transactional
+%consistency and is largely orthogonol to the use of a lock manager.
 We have found that it is easy to protect operations that make
 structural changes to data structures with this recipe.
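The nested top action recipe can be modeled as follows. This sketch assumes an in-memory per-transaction undo log and hypothetical `TxnLog`, `begin_nta`, `end_nta`, and `abort` helpers; it is not \yad's implementation. Inside the mutex, structural changes are logged with physical undo; ending the nested top action records a logical undo, and abort applies that logical undo and skips the physical undos, so the structural change survives while the operation's effect is compensated.

```python
import threading

mutex = threading.Lock()                 # protects the shared structure


class TxnLog:
    """Per-transaction undo log (in memory, for illustration only)."""

    def __init__(self):
        self.records = []                # (kind, payload), oldest first

    def physical(self, undo):
        # Undo that restores the exact prior state of what was changed.
        self.records.append(("physical", undo))

    def begin_nta(self):
        self.records.append(("begin_nta", None))
        return len(self.records) - 1

    def end_nta(self, begin_index, logical_undo):
        # Once the nested top action completes, abort uses the logical undo
        # instead of the physical undos logged since begin_index.
        self.records.append(("end_nta", (begin_index, logical_undo)))


def abort(log):
    i = len(log.records) - 1
    while i >= 0:
        kind, payload = log.records[i]
        if kind == "physical":
            payload()
            i -= 1
        elif kind == "end_nta":
            begin_index, logical_undo = payload
            logical_undo()               # compensate for the operation's effects
            i = begin_index - 1          # skip the physical undos of the structural changes
        else:
            i -= 1


# Usage: transaction 1 inserts "k1", and the insert grows the structure.
table = {"buckets": 1, "items": {}}
t1 = TxnLog()
with mutex:
    begin = t1.begin_nta()
    old_buckets = table["buckets"]
    table["buckets"] = 2                                        # structural change
    t1.physical(lambda: table.__setitem__("buckets", old_buckets))
    table["items"]["k1"] = "v1"
    t1.physical(lambda: table["items"].pop("k1"))
    t1.end_nta(begin, lambda: table["items"].pop("k1", None))   # logical undo: delete k1
    # The nested top action ends right before the mutex is released.

with mutex:
    table["items"]["k2"] = "v2"          # another transaction relies on buckets == 2

abort(t1)                                # removes k1 but keeps the new bucket and k2
```

Had the physical undo for `buckets` run, it would have shrunk the structure underneath the second transaction's key; the logical undo avoids exactly that.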
@@ -769,7 +766,7 @@ We say that such operations perform ``blind writes.''
 If all
 operations that modify a page have this property, then we can remove
 the LSN field, and have recovery conservatively assume that it is
-dealing with a version of the page that is at least as old on the one
+dealing with a version of the page that is at least as old as the one
 on disk.
 \eat{
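Why blind writes make the page LSN unnecessary can be shown with a small sketch. It assumes a page is a byte array and that each logged operation overwrites a byte range without reading it; the log contents are made up for illustration. Recovery replays every logged write in order with no LSN check, starting from a point no newer than the on-disk version. Because no write depends on prior contents, every version of the page that could have reached disk recovers to the same final state.

```python
def blind_write(page: bytearray, offset: int, data: bytes) -> None:
    # A blind write overwrites a byte range without reading the page first,
    # so its effect does not depend on the page's prior contents.
    page[offset:offset + len(data)] = data


def recover(disk_page: bytes, log) -> bytearray:
    # No LSN on the page: replay every logged write, in log order.
    page = bytearray(disk_page)
    for offset, data in log:
        blind_write(page, offset, data)
    return page


LOG = [(0, b"AA"), (2, b"BB"), (0, b"CC")]   # hypothetical (offset, bytes) records

# Every version of the page that could have reached disk during the run.
versions = [bytes(4)]
current = bytearray(4)
for offset, data in LOG:
    blind_write(current, offset, data)
    versions.append(bytes(current))

recovered = [recover(v, LOG) for v in versions]
```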
@@ -964,7 +961,7 @@ multiple machines and file systems.
 \begin{figure}
 \includegraphics[%
 width=1\columnwidth]{figs/structure.pdf}
-\caption{\sf\label{fig:structure} The portions of \yad that new operations directly interact with.}
+\caption{\sf\label{fig:structure} The portions of \yad that interact with new operations directly.}
 \end{figure}
 \yad allows application developers to easily add new operations to the
 system. Many of the customizations described below can be implemented
@@ -981,7 +978,7 @@ a new set of log interfaces is to decide upon an interface that these log
 interfaces will export to callers outside of \yad.
 The externally visible interface is implemented by wrapper functions
-and read only access methods. The wrapper function modifies the state
+and read-only access methods. The wrapper function modifies the state
 of the page file by packaging the information that will be needed for
 undo and redo into a data format of its choosing. This data structure
 is passed into Tupdate(). Tupdate() copies the data to the log, and
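A wrapper function and its REDO/UNDO pair might look like the following sketch. `Store`, `tupdate`, `set_slot`, `get_slot`, and the `SET_ARG` record layout are hypothetical stand-ins and do not reproduce \yad's Tupdate() signature. The wrapper only reads the page: it packs the old and new values into its own format and hands them to `tupdate`, which copies the bytes to the log and then runs REDO, the only place the page is modified. The wrapper holds a latch so the old value it reads cannot change before REDO runs.

```python
import struct
import threading

# Hypothetical log-argument format for a "set slot" operation: slot, old, new.
SET_ARG = struct.Struct("<Iii")


def set_redo(page, arg):
    slot, _old, new = SET_ARG.unpack(arg)
    page[slot] = new


def set_undo(page, arg):
    slot, old, _new = SET_ARG.unpack(arg)
    page[slot] = old


class Store:
    def __init__(self, num_slots):
        self.pages = {0: [0] * num_slots}
        self.log = []                            # (xid, page_id, op, arg bytes)
        self.ops = {"set": (set_redo, set_undo)}
        self.latch = threading.Lock()

    def tupdate(self, xid, page_id, arg, op):
        # Stand-in for Tupdate(): copy the argument to the log, then run REDO.
        self.log.append((xid, page_id, op, bytes(arg)))
        self.ops[op][0](self.pages[page_id], arg)


def set_slot(store, xid, page_id, slot, value):
    # Wrapper function: packages undo and redo information in its own format.
    # The latch keeps the old value read here consistent with what REDO sees.
    with store.latch:
        old = store.pages[page_id][slot]
        store.tupdate(xid, page_id, SET_ARG.pack(slot, old, value), "set")


def get_slot(store, page_id, slot):
    # Read-only access method: never modifies the page.
    return store.pages[page_id][slot]


store = Store(4)
set_slot(store, xid=1, page_id=0, slot=2, value=42)
```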
@@ -998,13 +995,12 @@ implementation must obey a few more invariants:
 \begin{itemize}
 \item Pages should only be updated inside REDO and UNDO functions.
-\item Page updates atomically update page LSN's by pinning the page.
+\item Page updates atomically update the page's LSN by pinning the page.
 \item If the data seen by a wrapper function must match data seen
 during REDO, then the wrapper should use a latch to protect against
 concurrent attempts to update the sensitive data (and against
 concurrent attempts to allocate log entries that update the data).
-\item Nested top actions (and logical undo), or ``big locks'' (which
-reduce concurrency) should be used to implement multi-page updates. (Section~\ref{sec:nta})
+\item Nested top actions (and logical undo), or ``big locks'' (total isolation but lower concurrency) should be used to implement multi-page updates. (Section~\ref{sec:nta})
 \end{itemize}
 \subsection{Linear hash table}
@@ -1049,23 +1045,23 @@ The hand-tuned hashtable also uses a linear hash
 function. However, it is monolithic and uses carefully ordered writes to
 reduce runtime overheads such as log bandwidth. Berkeley DB's
 hashtable is a popular, commonly deployed implementation, and serves
-as a baseline for our experiements.
+as a baseline for our experiments.
 Both of our hashtables outperform Berkeley DB on a workload that
-bulk loads the tables by repeatedly inserting (key, value) pairs.
-We do not claim that our partial implementation of \yad
-generally outperforms, or is a robust alternative
-to Berkeley DB. Instead, this test shows that \yad is comparable to
-existing systems, and that its modular design does not introduce gross
-inefficiencies at runtime.
+bulk loads the tables by repeatedly inserting (key, value) pairs,
+although we do not wish to imply this is always the case.
+%We do not claim that our partial implementation of \yad
+%generally outperforms, or is a robust alternative
+%to Berkeley DB. Instead, this test shows that \yad is comparable to
+%existing systems, and that its modular design does not introduce gross
+%inefficiencies at runtime.
 The comparison between the \yad implementations is more
 enlightening. The performance of the simple hash table shows that
 straightfoward datastructure implementations composed from
 simpler structures can perform as well as the implementations included
 in existing monolithic systems. The hand-tuned
 implementation shows that \yad allows application developers to
-optimize the primitives they build their applications upon.
+optimize key primitives.
 % I cut this because berkeley db supports custom data structures....
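The linear hash function mentioned in this hunk picks a bucket using two moduli and a split pointer. The sketch below is textbook linear hashing (Litwin), held in memory with no logging; it is not the code of either \yad hashtable, and `initial_buckets` and `max_load` are illustrative parameters. The table grows by splitting one bucket at a time at the split pointer, so only the keys in that bucket move.

```python
class LinearHashTable:
    """In-memory linear hashing (Litwin); illustration only, no logging."""

    def __init__(self, initial_buckets=2, max_load=2):
        self.initial = initial_buckets
        self.level = 0                   # completed doubling rounds
        self.split = 0                   # next bucket to split in this round
        self.max_load = max_load         # mean entries per bucket that triggers a split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key):
        h = hash(key)
        size = self.initial << self.level
        b = h % size
        if b < self.split:               # bucket already split: use the next, finer function
            b = h % (size << 1)
        return b

    def get(self, key, default=None):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return default

    def insert(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > self.max_load * len(self.buckets):
            self._split_one()

    def _split_one(self):
        # Split exactly one bucket (the one at the split pointer), not the whole table.
        size = self.initial << self.level
        victim = self.split
        entries = self.buckets[victim]
        self.buckets[victim] = []
        self.buckets.append([])          # new bucket index: victim + size
        self.split += 1
        if self.split == size:           # round complete: table size has doubled
            self.level += 1
            self.split = 0
        for k, v in entries:             # each entry lands in victim or victim + size
            self.buckets[self._address(k)].append((k, v))


ht = LinearHashTable()
for i in range(100):
    ht.insert(i, i * i)
```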
@@ -1130,8 +1126,8 @@ modules that implement persistant storage, and includes plugins
 for Berkeley DB and MySQL.
 This section will describe how the \yad
-\oasys plugin reduces the runtime serialization/deserialization cpu
-overhead of write intensive workloads, while using half as much system
+\oasys plugin reduces the runtime serialization/deserialization CPU
+overhead of write-intensive workloads, while using half as much system
 memory as the other two systems.
 We present three variants of the \yad plugin here. The first treats \yad like
@@ -1149,7 +1145,7 @@ CPU utilization, and it also allows us to drastically decrease the
 size of the page file. In turn this allows us to increase the size of
 the application's cache of live objects.
-We implemented the \yad buffer pool optimization by adding two new
+We implemented the \yad buffer-pool optimization by adding two new
 operations, update(), which only updates the log, and flush(), which
 updates the page file.
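The update()/flush() split in this hunk can be modeled as follows. `ObjectStore`, JSON serialization, and whole-object log records are assumptions made for illustration; the plugin's actual formats are not shown in this commit. `update()` appends to the log and leaves the page file alone, `flush()` writes the cached object's current state to the page file, and recovery starts from the page file and replays the log so newer versions win.

```python
import json


class ObjectStore:
    """Toy model of the update()/flush() split; names are hypothetical."""

    def __init__(self):
        self.cache = {}        # application's cache of live objects
        self.log = []          # (oid, serialized object), oldest first
        self.page_file = {}    # oid -> serialized object

    def update(self, oid, obj):
        # Writes only to the log; the page file is not touched.
        self.cache[oid] = obj
        self.log.append((oid, json.dumps(obj, sort_keys=True)))

    def flush(self, oid):
        # Writes the cached object's current state to the page file.
        self.page_file[oid] = json.dumps(self.cache[oid], sort_keys=True)

    def recover(self):
        # Start from the page file, then replay the log so newer versions win.
        state = {oid: json.loads(s) for oid, s in self.page_file.items()}
        for oid, s in self.log:
            state[oid] = json.loads(s)
        return state


store = ObjectStore()
store.update(1, {"x": 1})
store.update(1, {"x": 2})
page_file_before_flush = dict(store.page_file)
store.flush(1)
```

Repeated updates to a hot object touch only the log; the page file is written when the object is flushed.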
@@ -1194,7 +1190,7 @@ manager optimizations. However, it only writes the changed portions of
 objects to the log. Because of \yad's support for custom log entry
 formats, this optimization is straightforward.
-In addition to the buffer pool optimizations, \yad provides several
+In addition to the buffer-pool optimizations, \yad provides several
 options to handle UNDO records in the context
 of object serialization. The first is to use a single transaction for
 each object modification, avoiding the cost of generating or logging
@@ -1309,7 +1305,7 @@ cluster hash table, we have not yet implemented networking primitives for logica
 Therefore, we implemented a single node log reordering scheme that increases request locality
 during the traversal of a random graph. The graph traversal system
 takes a sequence of (read) requests, and partitions them using some
-function. It then proceses each partition in isolation from the
+function. It then processes each partition in isolation from the
 others. We considered two partitioning functions. The first divides the page file
 into equally sized contiguous regions, which increases locality. The second takes the hash
 of the page's offset in the file, which enables load balancing.
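The two partitioning functions can be sketched as follows. The request list, the page size, and the use of CRC-32 as the hash are assumptions; the hunk does not say which hash \yad uses. Requests are grouped by partition, keeping arrival order within each group, and the groups are then processed one at a time.

```python
import zlib


def contiguous_partition(page, num_pages, parts):
    # Equally sized contiguous regions of the page file (increases locality).
    region = -(-num_pages // parts)          # ceiling division
    return page // region


def hashed_partition(page, page_size, parts):
    # Hash of the page's byte offset in the file (spreads load evenly).
    # CRC-32 is an arbitrary choice of hash for this sketch.
    offset = page * page_size
    return zlib.crc32(offset.to_bytes(8, "little")) % parts


def reorder(requests, partition):
    # Group read requests by partition, preserving arrival order within a group,
    # so each partition can be processed in isolation from the others.
    groups = {}
    for page in requests:
        groups.setdefault(partition(page), []).append(page)
    return [page for key in sorted(groups) for page in groups[key]]


requests = [7, 0, 5, 2, 6, 1]
by_region = reorder(requests, lambda p: contiguous_partition(p, num_pages=8, parts=2))
by_hash = reorder(requests, lambda p: hashed_partition(p, page_size=4096, parts=2))
```

With contiguous regions, requests for nearby pages end up adjacent; the hashed variant gives up that locality in exchange for spreading pages evenly across partitions.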
@@ -1337,9 +1333,9 @@ one edge from each node has good locality while the others generally
 have poor locality.
 The second experiment explicitly measures the effect of graph locality
-on our optimization. (Figure~\ref{fig:hotGraph}) It extends the idea
+on our optimization (Figure~\ref{fig:hotGraph}). It extends the idea
 of a hot set to graph generation. Each node has a distinct hot set
-which includes the 10\% of the nodes that are closest to it in ring
+that includes the 10\% of the nodes that are closest to it in ring
 order. The remaining nodes are in the cold set. We use random edges
 instead of ring edges for this test. This does not ensure graph
 connectivity, but we used the same random seeds for the two systems.