cleanup
This commit is contained in:
parent
bb2713ba5e
commit
8bf2cb65ef
1 changed file with 22 additions and 24 deletions
@@ -366,9 +366,9 @@ issues in more detail.
 The lower level of a \yad operation provides atomic
 updates to regions of the disk. These updates do not have to deal
 with concurrency, but the portion of the page file that they read and
-write must be atomically updated, even if the system crashes.
+write must be updated atomically, even if the system crashes.
 
-The higher level provides operations that span multiple pages by
+The higher-level provides operations that span multiple pages by
 atomically applying sets of operations to the page file and coping
 with concurrency issues. Surprisingly, the implementations of these
 two layers are only loosely coupled.
|
@@ -379,7 +379,7 @@ locks and discusses the alternatives \yad provides to application developers.
 \subsection{Atomic page file operations}
 
 Transactional storage algorithms work because they are able to
-atomically update portions of durable storage. These small atomic
+update atomically portions of durable storage. These small atomic
 updates are used to bootstrap transactions that are too large to be
 applied atomically. In particular, write-ahead logging (and therefore
 \yad) relies on the ability to atomically write entries to the log
|
@@ -405,8 +405,8 @@ shortening recovery time.
 
 For simplicity, this section ignores mechanisms that detect
 and restore torn pages, and assumes that page writes are atomic.
-While the techniques described in this section rely on the ability to
-atomically update disk pages, this restriction is relaxed by other
+Although the techniques described in this section rely on the ability to
+update disk pages atomically, this restriction is relaxed by other
 recovery mechanisms.
 
 
|
@@ -450,7 +450,7 @@ limiting each transaction to a single operation, and by forcing the
 page that each operation updates to disk in order. If we ignore torn
 pages and failed sectors, this does not
 require any sort of logging, but is quite inefficient in practice, as
-it foces the disk to perform a potentially random write each time the
+it forces the disk to perform a potentially random write each time the
 page file is updated. The rest of this section describes how recovery
 can be extended, first to efficiently support multiple operations per
 transaction, and then to allow more than one transaction to modify the
|
@@ -461,9 +461,9 @@ same data before committing.
 Recovery relies upon the fact that each log entry is assigned a {\em
 Log Sequence Number (LSN)}. The LSN is monotonically increasing and
 unique. The LSN of the log entry that was most recently applied to
-each page is stored with the page, allowing recovery to selectively
+each page is stored with the page, which allows recovery to selectively
 replay log entries. This only works if log entries change exactly one
-page, and if they are applied to the page atomically.
+page and if they are applied to the page atomically.
 
 Recovery occurs in three phases: Analysis, Redo, and Undo.
 ``Analysis'' is beyond the scope of this paper. ``Redo'' plays the
|
@@ -491,7 +491,7 @@ Note that CLRs only cause Undo to skip log entries. Redo will apply
 log entries protected by the CLR, guaranteeing that those updates are
 applied to the page file.
 
-There are many other schemes for page level recovery that we could
+There are many other schemes for page-level recovery that we could
 have chosen. The scheme described above has two particularly nice
 properties. First, pages that were modified by active transactions
 may be {\em stolen}; they may be written to disk before a transaction
|
@@ -565,9 +565,9 @@ aborts.
 
 The key idea is to distinguish between the {\em logical operations} of a
 data structure, such as inserting a key, and the {\em physical operations}
-such as splitting tree nodes or or rebalancing a tree. The physical
+such as splitting tree nodes or rebalancing a tree. The physical
 operations do not need to be undone if the containing logical operation
-(insert) aborts. \diff{We record such operations using {\em logical
+(e.g. {\em insert}) aborts. \diff{We record such operations using {\em logical
 logging} and {\em physical logging}, respectively.}
 
 \diff{Each nested top action performs a single logical operation by
|
@@ -581,7 +581,7 @@ even after other transactions manipulate the data structure. If the
 nested transaction does not complete, physical UNDO can safely roll
 back the changes. Therefore, nested transactions can always be rolled
 back as long as the physical updates are protected from other
-transactions and complete nested transactions perserve the integrity
+transactions and complete nested transactions preserve the integrity
 of the structures they manipulate.}
 
 This leads to a mechanical approach that converts non-reentrant
|
@@ -636,8 +636,8 @@ higher-level constructs such as unique key requirements. \yad
 supports this by distinguishing between {\em latches} and {\em locks}.
 Latches are provided using operating system mutexes, and are held for
 short periods of time. \yads default data structures use latches in a
-way that avoids deadlock. This section will describe the latching
-protocols that \yad makes use of, and describes two custom lock
+way that avoids deadlock. This section will describe \yads latching
+protocols and describe two custom lock
 managers that \yads allocation routines use to implement layout
 policies and provide deadlock avoidance. Applications that want
 conventional transactional isolation (serializability) can make
|
@@ -650,22 +650,20 @@ reentrant data structure library. It is the application's
 responsibility to provide locking, whether it be via a database-style
 lock manager, or an application-specific locking protocol. Note that
 locking schemes may be layered. For example, when \yad allocates a
-record, it first calls a region allocator that allocates contiguous
+record, it first calls a region allocator, which allocates contiguous
 sets of pages, and then it allocates a record on one of those pages.
 
 The record allocator and the region allocator each contain custom lock
 management. If transaction A frees some storage, transaction B reuses
 the storage and commits, and then transaction A aborts, then the
-storage would be double allocated. The region allocator (which is
-infrequently called, and not concerned with locality) records the id
+storage would be double allocated. The region allocator, which allocates large chunks infrequently, records the id
 of the transaction that created a region of freespace, and does not
 coalesce or reuse any storage associated with an active transaction.
 
-On the other hand, the record allocator is called frequently, and is
-concerned with locality. Therefore, it associates a set of pages with
+In contrast, the record allocator is called frequently and must enable locality. Therefore, it associates a set of pages with
 each transaction, and keeps track of deallocation events, making sure
 that space on a page is never over-reserved. Providing each
-transaction with a seperate pool of freespace should increase
+transaction with a separate pool of freespace should increase
 concurrency and locality. This allocation strategy was inspired by
 Hoard, a malloc implementation for SMP machines~\cite{hoard}.
 
|
@@ -861,7 +859,7 @@ persistent storage must be either:
 \end{enumerate}
 
 Modern drives provide these properties at a sector level: Each sector
-is atomically updated, or it fails a checksum when read, triggering an
+is updated atomically, or it fails a checksum when read, triggering an
 error. If a sector is found to be corrupt, then media recovery can be
 used to restore the sector from the most recent backup.
 
|
@@ -1070,8 +1068,8 @@ obtaining reasonable performance in such a system under \yad is
 straightforward. We then compare our simple, straightforward
 implementation to our hand-tuned version and Berkeley DB's implementation.
 
-The simple hash table uses nested top actions to atomically update its
-internal structure. It uses a {\em linear} hash function~\cite{lht}, allowing
+The simple hash table uses nested top actions to update its
+internal structure atomically. It uses a {\em linear} hash function~\cite{lht}, allowing
 it to incrementally grow its buffer list. It is based on a number of
 modular subcomponents. Notably, its bucket list is a growable array
 of fixed length entries (a linkset, in the terms of the physical
|
@@ -1381,7 +1379,7 @@ constructs graphs by first connecting nodes together into a ring.
 It then randomly adds edges between the nodes until the desired
 out-degree is obtained. This structure ensures graph connectivity.
 If the nodes are laid out in ring order on disk then it also ensures that
-one edge from each node has good locality while the others generally
+one edge from each node has good locality, while the others generally
 have poor locality.
 
 The second experiment explicitly measures the effect of graph locality