diff --git a/doc/paper3/LLADD.tex b/doc/paper3/LLADD.tex index 64d24d3..1536b84 100644 --- a/doc/paper3/LLADD.tex +++ b/doc/paper3/LLADD.tex @@ -52,7 +52,7 @@ %make title bold and 14 pt font (Latex default is non-bold, 16 pt) -\title{\Large \bf \yad: System for Adaptable, Transactional Storage} +\title{\Large \bf \yad: Flexible Transactional Storage} %for single author (just remove % characters) \author{ @@ -210,6 +210,7 @@ customized to implement many existing (and some new) write-ahead logging variants. We present implementations of some of these variants and benchmark them against popular real-world systems. We conclude with a survey of related and future work. + An (early) open-source implementation of the ideas presented here is available (see Section~\ref{sec:avail}). @@ -256,11 +257,11 @@ top of it. A conceptual mapping based on the relational model might translate a relation into a set of keyed tuples. If the database were going to be used for short, write-intensive and high-concurrency transactions -(OLTP), the physical model would probably translate sets of tuples +(e.g. banking), the physical model would probably translate sets of tuples into an on-disk B-tree. In contrast, if the database needed to -support long-running, read-only aggregation queries (OLAP) over high-dimensional data, a physical model that stores the data in a sparse +support long-running, read-only aggregation queries over high-dimensional data (e.g. data warehousing), a physical model that stores the data in a sparse array format would be more appropriate~\cite{OLAP,molap}. Although both -OLTP and OLAP databases are based upon the relational model they make +kinds of databases are based upon the relational model they make use of different physical models in order to serve different classes of applications efficiently. @@ -269,7 +270,7 @@ efficiently support the wide range of conceptual mappings that are in use today. 
In addition to sets, objects, and XML, such a model would need to cover search engines, version-control systems, work-flow applications, and scientific computing, as examples. Similarly, a -recent database paper argues that the "one size fits all" approach of +recent database paper argues that the ``one size fits all'' approach of DBMSs no longer works~\cite{oneSizeFitsAll}. Instead of attempting to create such a unified model after decades of @@ -279,7 +280,7 @@ efficiently. This makes it easy for system designers to implement most data models that the underlying hardware can support, or to abandon the database approach entirely, and forgo %structured physical models and abstract conceptual mappings. -a top down model. +a top-down model. \subsection{The Systems View} \label{sec:systems} @@ -336,13 +337,12 @@ sections. As with other systems, \yads transactions have a multi-level structure. Multi-layered transactions were originally proposed as a -concurrency control strategy for database servers that support high -level, application specific extensions~\cite{multiLayeredSystems}. +concurrency control strategy for database servers that support high-level, application-specific extensions~\cite{multiLayeredSystems}. In \yad, the lower level of an operation provides atomic updates to regions of the disk. These updates do not have to deal with concurrency, but must update the page file atomically, even if the system crashes. -Higher level operations span multiple pages by +Higher-level operations span multiple pages by atomically applying sets of operations to the page file, recording their actions in the log and coping with concurrency issues. The loose coupling of these layers lets \yads users compose and reuse @@ -448,7 +448,7 @@ is that \yad allows user-defined operations, while ARIES defines a set of operations that support relational database systems. An {\em operation} consists of an undo and a redo function. 
Each time an operation is invoked, a corresponding log entry is generated. We -describe operations in more detail in Section~\ref{sec:operations} +describe operations in more detail in Section~\ref{sec:operations}. %\subsection{Multi-page Transactions} @@ -697,7 +697,7 @@ code they service, both implement deadlock avoidance, and both are transparent to higher layers. General-purpose database lock managers provide none of these features, supporting the idea that special-purpose lock managers are a useful abstraction. Locking -schemes that interact well with object oriented programming +schemes that interact well with object-oriented programming schemes~\cite{sharedAbstractTypes} and exception handling~\cite{omtt} extend these ideas to larger systems. @@ -750,7 +750,7 @@ use of state stored in the page. As described above, \yad operations may make use of page contents to compute the updated value, and \yad ensures that each operation is applied exactly once in the right order. The recovery scheme described -in this section does not guarantee that such operations will be +in this section does not guarantee that operations will be applied exactly once, or even that they will be presented with a self-consistent version of a page during recovery. @@ -810,8 +810,8 @@ blobs}. If a large object is stored in pages that contain LSNs, then it is not In contrast, modern file systems allow applications to perform a DMA copy of the data into memory, allowing the CPU to be used for more productive purposes. Furthermore, modern operating systems allow -network services to use DMA and network adaptor hardware to read data -from disk, and send it over a network socket without passing it +network services to use DMA and network-interface cards to read data +from disk, and send it over the network without passing it through the CPU. Again, this frees the CPU, allowing it to perform other tasks. 
@@ -915,7 +915,7 @@ Overwritten sectors are shaded.} \end{figure} Figure~\ref{fig:torn} describes a page that is torn during crash, and the actions performed by redo that repair it. Assume that the initial version -of the page, with LSN $0$, is on disk, and the disk is in the process +of the page, with LSN $0$, is on disk, and the OS is in the process of writing out the version with LSN $2$ when the system crashes. When recovery reads the page from disk, it may encounter any combination of sectors from these two versions. @@ -1059,8 +1059,8 @@ function~\cite{lht}, allowing it to increase capacity incrementally. It is based on a number of modular subcomponents. Notably, the physical location of each bucket is stored in a growable array of fixed-length entries. The bucket lists can be provided by either of -\yads linked list implementations. One provides fixed-length entries, -yielding a hash table with fixed-length keys and values. The list +\yads two linked list implementations. One provides fixed-length entries, +yielding a hash table with fixed-length keys and values. The second list (and therefore hash table) used in our experiments provides variable-length entries. The hand-tuned hash table is also built on \yad and also uses a linear hash @@ -1111,7 +1111,7 @@ second,%\endnote{The concurrency test was run without lock managers, and the % transactions obeyed the A, C, and D properties. Since each % transaction performed exactly one hash table write and no reads, they also % obeyed I (isolation) in a trivial sense.} -and provided roughly + and provided roughly double Berkeley DB's throughput (up to 50 threads). Although not shown here, we found that the latencies of Berkeley DB and \yad were similar. @@ -1129,7 +1129,7 @@ similar. 
clip, width=1\columnwidth]{figs/mem-pressure.pdf}} \caption{\label{fig:OASYS} -The effect of \yad object persistence optimizations under low and high memory pressure.} +The effect of \yad object-persistence optimizations under low and high memory pressure.} \vspace{-12pt} \end{figure*} @@ -1250,8 +1250,7 @@ Figure~\ref{fig:OASYS} presents the performance of the three \yad variants, and the \oasys plugins implemented on top of other systems. In this test, none of the systems were memory bound. As we can see, \yad performs better than the baseline systems, which is -not surprising, since it is not providing the A property of ACID -transactions. +not surprising, since it exploits the weaker durability requirements. In non-memory bound systems, the optimizations nearly double \yads performance by reducing the CPU overhead of marshalling and @@ -1301,7 +1300,7 @@ has poor locality.} \end{figure} We are interested in enabling \yad to manipulate sequences of -application requests. By translating these requests into the logical +application requests. By translating these requests into logical operations (such as those used for logical undo), we can manipulate and optimize such requests. Because logical operations generally correspond to application-level operations, application developers can easily determine whether @@ -1329,10 +1328,10 @@ the growable array implementation that is used as our linear hash table's bucket list. The first experiment (Figure~\ref{fig:oo7}) is loosely based on the OO7 database benchmark~\cite{oo7}. We -hard-code the out-degree of each node, and use a directed graph. Like OO7, we +hard-code the out-degree of each node and use a directed graph. Like OO7, we construct graphs by first connecting nodes together into a ring. -We then randomly add edges until the desired -out-degree is obtained. This structure ensures graph connectivity. +We then randomly add edges until obtaining the desired +out-degree. This structure ensures graph connectivity. 
Nodes are laid out in ring order on disk so at least one edge from each node is local. @@ -1411,15 +1410,13 @@ Streaming applications face many of the problems that RISC databases could address. However, it is unclear whether a single interface or conceptual mapping would meet their needs. Based on experiences with their system, the authors of StreamBase argue that ``one size fits -all'' interfaces are no longer appropriate. Instead, they argue that -the manual composition of a small number of relatively straightforward -primitives leads to cleaner, more scalable -systems~\cite{oneSizeFitsAll}. This is in contrast to the RISC +all'' database engines are no longer appropriate. Instead, they argue that +the market will ``fracture into a collection of independent ... engines''~\cite{oneSizeFitsAll}. This is in contrast to the RISC approach, which attempts to build a database in terms of interchangeable parts. We agree with the motivations behind RISC databases and StreamBase, -and believe they complement each other (and \yad) well. However, or +and believe they complement each other and \yad well. However, our goal differs from these systems; we want to support applications that are a poor fit for database systems. However, as \yad matures we we hope that it will enable a wide range of transactional systems, @@ -1507,9 +1504,8 @@ atomic updates.) Camelot provides two logging modes: redo only (no-steal, no-force) and undo/redo (steal, no-force). It uses facilities of Mach to provide recoverable virtual memory. It supports Avalon, which uses Camelot to provide a -higher-level (C++) programming model. Camelot provides a lower-level -C interface that allows other programming models to be -implemented. It provides a limited form of closed nested transactions +higher-level (C++) programming model; Camelot provides a lower-level +C interface that enables other programming models as well.
It provides a limited form of closed nested transactions where parents are suspended while children are active. Camelot also provides mechanisms for distributed transactions and transactional RPC. Although Camelot does allow applications to provide their own lock @@ -1526,7 +1522,7 @@ distributed transaction. For example, X/Open DTP provides a standard networking protocol that allows multiple transactional systems to be controlled by a single transaction manager~\cite{dtp}. Enterprise Java Beans is a standard for developing transactional -middle ware on top of heterogeneous storage. Its +middleware on top of heterogeneous storage. Its transactions may not be nested. This simplifies its semantics, and leads to many, short transactions, improving concurrency. However, flat transactions are somewhat rigid, and lead to @@ -1546,22 +1542,22 @@ hard-code log format or recovery algorithms, and supports a number of interesting optimizations such as distributed logging~\cite{recoveryInQuickSilver}. The QuickSilver project found that transactions meet the demands of most -applications, provided that long running transactions do not exhaust +applications, provided that long-running transactions do not exhaust system resources, and that flexible concurrency control policies are available. In QuickSilver, nested transactions would be most useful when a series of program invocations form a larger logical unit~\cite{experienceWithQuickSilver}. -Clouds is an object oriented, distributed transactional operating -system. It made use of shared abstract +Clouds is an object-oriented, distributed transactional operating +system. It uses shared abstract types~\cite{sharedAbstractTypes} to provide concurrency control -between the objects in the system~\cite{clouds}. With the aid of +among the objects in the system~\cite{clouds}. With the aid of per-method atomicity specifications, it provides higher concurrency than QuickSilver, but is not designed for legacy applications. 
\subsection{Data Structure Frameworks} -As mentioned in Section~\ref{sec:systems}, Berkeley DB is a system +As mentioned in Sections~\ref{sec:systems} and~\ref{experiments}, Berkeley DB is a system quite similar to \yad, and gives application programmers raw access to transactional data structures such as a single-node B-Tree and hash table~\cite{libtp}. @@ -1595,7 +1591,7 @@ Record-oriented allocation, such as in VMS Record Management Services~\cite{vms} Write-optimized file systems lay files out in the order they were written rather than in logically sequential order~\cite{lfs}. -Schemes to improve locality between small +Schemes to improve locality among small objects exist as well. Relational databases allow users to specify the order in which tuples will be laid out, and often leave portions of pages unallocated to reduce fragmentation as new records are allocated. @@ -1639,7 +1635,7 @@ shrunk as functionality has moved into extensions. We expect this trend to continue as development progresses. A resource manager is a common pattern in system software design, and -manages dependencies and ordering constraints between sets of +manages dependencies and ordering constraints among sets of components. Over time, we hope to shrink \yads core to the point where it is simply a resource manager that coordinates interchangeable implementations of the other components. 
diff --git a/doc/paper3/figs/bulk-load.pdf b/doc/paper3/figs/bulk-load.pdf
index f748e62..4319073 100644
Binary files a/doc/paper3/figs/bulk-load.pdf and b/doc/paper3/figs/bulk-load.pdf differ
diff --git a/doc/paper3/figs/mem-pressure.pdf b/doc/paper3/figs/mem-pressure.pdf
index 08039e3..0b9c7a9 100644
Binary files a/doc/paper3/figs/mem-pressure.pdf and b/doc/paper3/figs/mem-pressure.pdf differ
diff --git a/doc/paper3/figs/object-diff.pdf b/doc/paper3/figs/object-diff.pdf
index e7a3827..53c9438 100644
Binary files a/doc/paper3/figs/object-diff.pdf and b/doc/paper3/figs/object-diff.pdf differ
diff --git a/doc/paper3/figs/oo7.pdf b/doc/paper3/figs/oo7.pdf
index 0f16c55..87f54e2 100644
Binary files a/doc/paper3/figs/oo7.pdf and b/doc/paper3/figs/oo7.pdf differ
diff --git a/doc/paper3/figs/tps-extended.pdf b/doc/paper3/figs/tps-extended.pdf
index e7d7b00..0912f1c 100644
Binary files a/doc/paper3/figs/tps-extended.pdf and b/doc/paper3/figs/tps-extended.pdf differ
diff --git a/doc/paper3/figs/trans-closure-hotset.pdf b/doc/paper3/figs/trans-closure-hotset.pdf
index 40ff5c3..919408d 100644
Binary files a/doc/paper3/figs/trans-closure-hotset.pdf and b/doc/paper3/figs/trans-closure-hotset.pdf differ