Updated section 8 (mostly wording and shortening)

Sears Russell 2006-08-18 21:40:47 +00:00
parent 330d1dc4d1
commit cdcdba1099


@ -1418,9 +1418,6 @@ algorithm's outperforms the naive traversal.
\section{Related Work} \section{Related Work}
\label{related-work} \label{related-work}
\eab{moved text here from section 2 to make it smaller and less technical}
\subsection{Database Variations} \subsection{Database Variations}
\label{sec:otherDBs} \label{sec:otherDBs}
@ -1428,14 +1425,14 @@ This section discusses transaction systems with goals
similar to ours. Although these projects were successful in many similar to ours. Although these projects were successful in many
respects, they fundamentally aimed to extend the range of their respects, they fundamentally aimed to extend the range of their
abstract data model, which in the end still has limited overall range. abstract data model, which in the end still has limited overall range.
In contrast, \yad follows a bottom-up approach that enables can In contrast, \yad follows a bottom-up approach that can support (in
implement (in theory) any of these abstract models and their extensions. theory) any of these abstract models and their extensions.
\subsubsection{Extensible databases} \subsubsection{Extensible databases}
Genesis~\cite{genesis}, an early database toolkit was explicitly Genesis is an early database toolkit that was explicitly
structured in terms of the physical data models and conceptual structured in terms of the physical data models and conceptual
mappings described above. mappings described above~\cite{genesis}.
It is designed to allow database implementors to easily swap out It is designed to allow database implementors to easily swap out
implementations of the various components defined by its framework. implementations of the various components defined by its framework.
Like subsequent systems (including \yad), it allows its users to Like subsequent systems (including \yad), it allows its users to
@ -1461,9 +1458,8 @@ a database toolkit, new types are defined when the database server is
compiled. In today's object-relational database systems, new types compiled. In today's object-relational database systems, new types
are defined at runtime. Each approach has its advantages. However, are defined at runtime. Each approach has its advantages. However,
both types of systems aim to extend a high-level data model with new both types of systems aim to extend a high-level data model with new
abstract data types, and are thus limited in the range of new abstract data types. This is of limited use to applications that are
applications they support, which remain essentially queries over sets. not naturally structured in terms of queries over sets.
\subsubsection{Modular databases} \subsubsection{Modular databases}
@ -1522,17 +1518,13 @@ Special-purpose languages for transaction processing allow programmers
to express transactional operations naturally. However, programs to express transactional operations naturally. However, programs
written in these languages are generally limited to a particular written in these languages are generally limited to a particular
concurrency model and transactional storage system. Therefore, these concurrency model and transactional storage system. Therefore, these
systems are complementary to \yad; they provide a specialized systems are complementary to our work; \yad provides a substrate that makes
high-level interface that hard-codes a particular programming model
and specialized storage infrastructure. In contrast, \yad is a
general-purpose storage infrastructure that avoids hardcoding
programming model assumptions. \yad provides a substrate that makes
it easier to implement transactional programming models. it easier to implement transactional programming models.
\subsubsection{Nested Transactions} \subsubsection{Nested Transactions}
{\em Nested transactions} form trees of transactions, where children {\em Nested transactions} form trees of transactions, where children
were spawned by their parents. They can be used to increase are spawned by their parents. They can be used to increase
concurrency, provide partial rollback, and improve fault tolerance. concurrency, provide partial rollback, and improve fault tolerance.
{\em Linear} nesting occurs when transactions are nested to arbitrary {\em Linear} nesting occurs when transactions are nested to arbitrary
depths, but have at most one child. In {\em closed} nesting, child depths, but have at most one child. In {\em closed} nesting, child
@ -1543,15 +1535,18 @@ transactions are not rolled back if the parent aborts.
Closed nesting aids in intra-transaction concurrency and fault Closed nesting aids in intra-transaction concurrency and fault
tolerance. Increased fault tolerance is achieved by isolating each tolerance. Increased fault tolerance is achieved by isolating each
child transaction from the others, and automatically retrying failed child transaction from the others, and automatically retrying failed
transactions. This technique is similar to the one used by MapReduce, transactions. This technique is similar to the one used by MapReduce
which isolates subtasks by restricting the data that each unit of work to provide exactly-once execution on very large computing
may read and write, and which provides atomicity by ensuring clusters~\cite{mapReduce}.
exactly-once execution of each unit of work~\cite{mapReduce}.
\yads nested top actions, and support for custom lock managers also %which isolates subtasks by restricting the data that each unit of work
%may read and write, and which provides atomicity by ensuring
%exactly-once execution of each unit of work~\cite{mapReduce}.
\yads nested top actions, and support for custom lock managers
allow for inter-transaction concurrency. In some respect, nested top allow for inter-transaction concurrency. In some respect, nested top
actions implement a form of open, linear nesting. Actions performed actions implement a form of open, linear nesting. Actions performed
inside the nested top are not rolled back when the parent aborts. inside the nested top action are not rolled back when the parent aborts.
However, the logical undo gives the programmer the option to However, the logical undo gives the programmer the option to
compensate for the nested top action in aborted transactions. We expect compensate for the nested top action in aborted transactions. We expect
that nested transactions that nested transactions
@ -1559,18 +1554,6 @@ could be implemented as a layer on top of \yad.
\subsubsection{Distributed Programming Models} \subsubsection{Distributed Programming Models}
%\rcs{ I think Argus makes use of shadow copies for durability, and for
%in-memory transactions~\cite{argusImplementation}. A tree of shadow
%copies exists, and is handled as follows (I think): All transaction
%locks are commit duration, per object. There are read locks and write
%locks, and it uses strict 2PL. Each transaction is a tree of
%``subactions'' that can get R/W locks according to the 2PL rules. Two
%subactions in the same action cannot get a write lock on the same
%object because each one gets its own copy of the object to write to.
%If a subaction or transaction abort their local copy is simply
%discarded. At commit, the local copy replaces the global copy.}
%System R was one of the first relational database implementations, and %System R was one of the first relational database implementations, and
%defined a clean separation between its query processor and its storage %defined a clean separation between its query processor and its storage
%subsystem. In fact, it supported a simple navigational interface to %subsystem. In fact, it supported a simple navigational interface to
@ -1587,161 +1570,75 @@ rolled back and retried due to node failure.
Argus is a language for reliable distributed applications. An Argus Argus is a language for reliable distributed applications. An Argus
program consists of guardians, which are essentially objects that program consists of guardians, which are essentially objects that
encapsulate persistent and atomic data. Persistent data allows encapsulate persistent and atomic data. Accesses to atomic data are
concurrent operations to be implemented, while accesses to atomic data serializable; persistent data is not protected by the lock manager,
are serializable~\cite{argus}. Typically, the data structure that is being and is used to implement concurrent data structures~\cite{argus}.
implemented is stored in persistent storage, but is augmented with Typically, the data structure is stored in persistent storage, but is augmented with
extra information in atomic storage. This extra data tracks the extra information in atomic storage. This extra data tracks the
status of each item stored in the structure. Conceptually, in a hash status of each item stored in the structure. Conceptually, atomic
table, atomic storage would contain the values ``Not present'', storage used by a hashtable would contain the values ``Not present'',
``Committed'' or ``Aborted; Old Value = x'' for each key in (or ``Committed'' or ``Aborted; Old Value = x'' for each key in (or
missing from) the hash. Before accessing the hash, the operation missing from) the hash. Before accessing the hash, the operation
implementation would consult the appropriate piece of atomic data, and implementation would consult the appropriate piece of atomic data, and
update the persistent storage if necessary. Because the atomic data is update the persistent storage if necessary. Because the atomic data is
protected by a lock manager, attempts to update the hashtable are serializable. protected by a lock manager, attempts to update the hashtable are serializable.
Therefore, clever use of atomic storage can be used to provide logical locking~\rcs{Double check this} Therefore, clever use of atomic storage can be used to provide logical locking.
Note that implementation of efficient data structures using this Note that operations that implement concurrent data structures using
method forces each operation implementation to track a great deal of this method must track a great deal of extra state. Efficiently
extra state (they suggest implementing a log structure to support a tracking such state is not straightforward. For example, the Argus
concurrent hash table), and to set policies regarding the granularity hashtable implementation made use of its own log structure to
with which the data structures should be written to efficiently track the status of each key that had been touched by an
disk~\cite{argusImplementation}. \yad avoids these problems by active transaction. Also, the hashtable is responsible for setting
forcing operation implementors to provide logical undos, and by policies regarding when, and with what granularity it would be written
leaving lock management to higher-level code. We argue that logical back to disk~\cite{argusImplementation}. \yad operations avoid this
undos are easily provided in most circumstances, while higher-level complexity by providing logical undos, and by leaving lock management
lock management decouples data structure implementations from to higher-level code. This also separates write-back and concurrency
application concurrency models. control policies from data structure implementations.
%The Argus designers assumed that only a few core concurrent %The Argus designers assumed that only a few core concurrent
%transactional data structures would be implemented, and that higher %transactional data structures would be implemented, and that higher
%level code would make use of these structures. Also, Argus assumed %level code would make use of these structures. Also, Argus assumed
%that transactions should be serializable. %that transactions should be serializable.
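The Argus hashtable scheme in the paragraph above (a plain table in persistent storage, plus a per-key status in atomic storage that is consulted before every access) can be illustrated with a short C sketch. This is a hypothetical, single-threaded model, not Argus code (Argus is its own language) and not a \yad API: the names `argus_hash`, `ah_put`, `ah_put_then_abort`, `ah_get` and `NKEYS` are invented for illustration, keys are assumed to be small integers used as direct indexes, locking of the atomic data is omitted, and an abort is modelled by explicitly recording "Aborted; Old Value = x" rather than by a language runtime rolling back atomic objects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: keys are small integers used as direct indexes. */
#define NKEYS 16

/* Per-key status kept in "atomic" (transactional) storage. */
typedef enum { NOT_PRESENT, COMMITTED, ABORTED } key_status;

typedef struct {
    key_status status;
    bool old_present; /* meaningful only when status == ABORTED */
    int old_value;    /* the "Old Value = x" of an aborted write */
} atomic_entry;

/* "Persistent" storage: a plain table, updated in place, never rolled back. */
typedef struct {
    bool present[NKEYS];
    int value[NKEYS];
} persistent_table;

typedef struct {
    persistent_table table;
    atomic_entry atomic[NKEYS];
} argus_hash;

/* Consult the atomic data first; undo an aborted write left in the table. */
static void ah_repair(argus_hash *h, int key) {
    assert(key >= 0 && key < NKEYS);
    atomic_entry *a = &h->atomic[key];
    if (a->status == ABORTED) {
        h->table.present[key] = a->old_present;
        h->table.value[key] = a->old_value;
        a->status = a->old_present ? COMMITTED : NOT_PRESENT;
    }
}

/* A write performed by a transaction that commits. */
void ah_put(argus_hash *h, int key, int value) {
    ah_repair(h, key);
    h->table.present[key] = true;
    h->table.value[key] = value;
    h->atomic[key].status = COMMITTED;
}

/* A write performed by a transaction that later aborts: the table keeps the
 * dirty value, and atomic storage records "Aborted; Old Value = x".
 * Assumption: the abort is modelled explicitly, not by a runtime. */
void ah_put_then_abort(argus_hash *h, int key, int value) {
    ah_repair(h, key);
    atomic_entry *a = &h->atomic[key];
    a->old_present = h->table.present[key];
    a->old_value = h->table.value[key];
    h->table.present[key] = true;
    h->table.value[key] = value;
    a->status = ABORTED;
}

/* Lookup: repair first, then trust only committed entries. */
bool ah_get(argus_hash *h, int key, int *out) {
    ah_repair(h, key);
    if (h->atomic[key].status != COMMITTED) {
        return false;
    }
    *out = h->table.value[key];
    return true;
}
```

Every operation has to run the repair step before touching the table. This is the per-operation bookkeeping that the paragraph contrasts with providing logical undos and leaving lock management to higher-level code.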
Camelot, a successor to Argus made a number of important Camelot made a number of important
contributions, both in system design, and in algorithms for contributions, both in system design, and in algorithms for
distributed transactions~\cite{camelot}. It left locking to application level code, distributed transactions~\cite{camelot}. It leaves locking to application level code,
and updated data in place. (Argus used shadow copies to provide and updates data in place. (Argus uses shadow copies to provide
atomic updates.) Camelot provided two logging modes: Redo only atomic updates.) Camelot provides two logging modes: Redo only
(no-Steal,no-Force) and Undo/Redo (Steal, no-Force). It was (no-Steal,no-Force) and Undo/Redo (Steal, no-Force). It uses
implemented using Mach, and provided recoverable virtual memory. It facilities of Mach to provide recoverable virtual memory. It
was decoupled from Avalon, which used Camelot to provide a is decoupled from Avalon, which uses Camelot to provide a
higher-level (C++) programming model. Camelot provided a lower-level higher-level (C++) programming model. Camelot provides a lower-level
C interface that allowed other programming models to be C interface that allows other programming models to be
implemented. It provided a limited form of closed nested transactions implemented. It provides a limited form of closed nested transactions
where parents are suspended while children are active. Camelot also where parents are suspended while children are active. Camelot also
provided mechanisms for distributed transactions and transactional provides mechanisms for distributed transactions and transactional
RPC. However, concurrent operations in Camelot were similar to those RPC. While Camelot does allow applications to provide their own lock
in Argus since Camelot did not provide logical undo. Camelot's focus managers, implementation strategies for concurrent operations
was upon support for distributed transactions, therefore, it hardcoded in Camelot are similar to those
in Argus since Camelot does not provide logical undo. Camelot focuses
on distributed transactions, and hardcodes
assumptions regarding the structure of nested transactions, consensus assumptions regarding the structure of nested transactions, consensus
algorithms, communication mechanisms, and so on. In contrast, \yads algorithms, communication mechanisms, and so on. In contrast, \yads
goal is to efficiently support a wide range of such mechanisms. goal is to efficiently support a wide range of such mechanisms without
providing any built-in support for distributed transactions.
More recent transactional programming schemes allow for more multiple More recent transactional programming schemes allow for multiple
transaction implementations to cooperate as part of the same transaction implementations to cooperate as part of the same
distributed transaction. For example, X/Open DTP provides a standard distributed transaction. For example, X/Open DTP provides a standard
networking protocol that allows multiple transactional systems to be networking protocol that allows multiple transactional systems to be
controlled by a single transaction manager~\cite{something}. controlled by a single transaction manager~\cite{something}.
Enterprise Java Beans is a standard for developing transactional Enterprise Java Beans is a standard for developing transactional
middleware that may make use of heterogenous storage. Its middleware on top of heterogenous storage. Its
transactions may not be nested~\cite{something}. This simplifies its transactions may not be nested~\cite{something}. This simplifies its
semantics somewhat, and leads to many short transactions, which semantics somewhat, and leads to many short transactions,
improves concurrency. However, it is somewhat rigid, and may lead to improving concurrency. However, flat transactions are somewhat rigid, and lead to
situations where committed transactions have to be manually rolled situations where committed transactions have to be manually rolled
back by other transactions after the fact~\cite{ejbCritique}. Open back by other transactions after the fact~\cite{ejbCritique}. The Open
Multithreaded Transactions provide a model for nested transactions Multithreaded Transactions model is based on nested transactions,
that incorporates exception handling, and allows parents to execute incorporates exception handling, and allows parents to execute
concurrently with their children. concurrently with their children~\cite{omtt}.
%Argus transactions use shadow copies to provide atomic updates.
%Instead of making use of logical undo, concurrent guardians make use
%of two types of persistant state. One type behaves transactionally,
%and will be rolled back at abort, while the other type can be
%atomically written to disk, but is not automatically modified at
%commit or abort. The transactional portions of the state can be
%provided by built-in atomic types, or by another guardian.
%A transactional Argus hashtable could consist of a simple,
%non-transactional, hashtable that is written back to disk atomically
%each time it is updated and a set of transactional flags that are
%automatically updated each time a transaction accesses the table,
%commits or aborts. During a lookup, the hashtable would consult these
%flags to determine the status of the key in question. To minimize the
%amount of data written to disk, one could use a log to emulate
%explicit per-key flags, and partition the hashtable and logfile into
%multiple atomically updated regions~\cite{argusImplementation}.
%While this approach does allow the layout and implementation of the
%data structure to be completely independent from the mechanisms used
%for transactional updates, it forces the operation implementor to
%provide a module that explicitly tracks the relationship between
%object states and transactions. Some of this information is required
%for locking, making it easier to provide a logical lock mananger.
%However, taking that approach couples the data structure
%implementation to the application's concurrency model.
%The Argus also work provides high-level models for atomicity,
%reconfiguration, and other issues faced by developers of transactional
%systems. These models do not depend on the low-level Argus
%implementation, and may be useful to applications built on top of
%\yad.~\rcs{citations here?}
%Camelot is a distributed transaction processing system. It provides
%two physical logging modes; redo only (no-Steal, no-Force), and
%redo-undo (Steal, no-Force), but does not contain provisions for
%logical logging or compensations. It supports nested transactions,
%which makes it possible to implement concurrent data structures in a
%style similar to concurrent guardians in Argus.
%Therefore, commit duration locks are required to protect data
%structures from concurrent transactions, \rcs{This sentence is
%problematic for two reasons: (1) Camelot allowed hybrid atomicity and
%other schemes in addition to 2PL. (2) According to \cite{camelot}, pg
%433 ``Logical locks, implemented within servers, and support for
%hybrid atomicity provide the possibilty of high concurrency.'' I
%think this is a mistake in their paper; logical locking isn't very
%helpful when ``This [Camelot's Nested Transaction] model states that
%if one transaction modifies a region, the region cannot be modified by
%another transacion unless that transaction is an active descendant of
%original transaction or the original transaction compeletes... If
%comodification does occur, no guarantees concerning data integrity are
%given'' (Camelot + Avalon book, pg 117)'' I think the same mistake is
%repeated in the RVM paper, when they discuss multi-threaded code.
%Also, see the discussion on Argus; you could do concurrency that way
%on Camelot...} limiting the applicability of Camelot to
%high-concurrency applications or its scalability to multi-processor
%systems.
%Camelot makes use of a nested transaction model that allows
%concurrency within a single transaction. In Camelot, nested
%transactions can run in parallel and make use of locks acquired by the
%transaction that spawned them. Parent transactions are suspended
%until children transactions complete, and children are protected from
%each other using locks, or other similar methods. We beleive that
%\yads support for logical undo would allow it to support such
%transactions with more concurrency than Camelot allowed. Camelot is
%an early example of a C library that provides transactional semantics
%over custom data types. Also, it introduced a number of features,
%such as distributed logging and commit semantics, and transactional
%RPC that we plan to integrate into \yad as we add support for
%multi-node transactions. Avalon, which was built on top of Camelot is
%a persistent version of C++ that introduced the idea of persistent
%programming language types.
%Both Argus and Camelot make use of {\em closed} nested transactions.
%In this context, ``closed'' means that subtransactions must abort if
%their parents abort. In contrast, \yads nested transactions provide a
%limited form of {\em open} nested transactions, in that they are able
%to commit even if their parents abort. Currently, \yad limits each
%transaction (or nested top action) to have a single child (although
%these may be nested to arbitrary depths). This limitation is sometimes
%called {\em linear nesting}. Schemes to naturally integrate linear
%and open nesting of transactions with modern languages such as Java
%have recently been been proposed~\cite{nestedTransactionPoster}.
%\rcs{More information on nested transactions is available in this book
%(which I haven't looked at yet)\cite{nestedTransactionBook}.}
\subsection{Berkeley DB} \subsection{Berkeley DB}