Lecture notes and the PhD thesis related to stasis.

Gregory Burd 2012-05-23 14:30:23 +01:00
parent 3a559a3b3c
commit a564e8c220
2 changed files with 257 additions and 0 deletions

BIN
doc/EECS-2010-2.pdf Normal file

Binary file not shown.

doc/stasis-262a-lecture.txt Normal file

@ -0,0 +1,257 @@
Stasis Lecture Notes
Outline:
(0) What is Stasis?
Storage manager; one level below the RSS, MySQL storage engine, BDB, etc...
- Transactions that are agnostic to data layout.
Provides mechanisms without policy
- WAL recovery mechanisms:
- ARIES style
- Shadow Page style (for blobs, log-structured indices)
LFS is a log structured file system. Stasis can support
log structured things (have implemented a log structured
index)
- LSN-free (to store data in native formats)
- Concurrency
- Multiple app threads
- CPU / IO concurrency
- I/O amortization (eg: group commit, write-back cache)
- Data layout tools (page formats)
- Allocation
Rest of this lecture: Applying ARIES primitives to your own systems
- Plug: If you want to do anything in this space for your
project, let me know; Stasis encodes these ideas!
(1) Programming models for concurrency + error handling
A: Record broken invariants, unwind stack.
NTA: Needed for concurrency! What is "concurrency" here?
(without lock manager)
Consistency, Isolation
- Isolation: App/system specific! -> Policy; punt
- Consistency: Some is app/system specific (eg: referential
integrity, objects have valid state)
Some is inherent to the storage manager
Seems to be only a few ways to deal with error conditions.
One common approach: each action that breaks an invariant
should be pushed onto a stack. On error, pop things off the
stack, repairing each invariant in order.
Aside: This is why C++ does not have a "finally" block.
Design pattern there is RAII (Resource Acquisition Is
Initialization). C++ programs stack allocate things like locks:
{ Lock l("foo");
// now I hold the lock
}
// lock released when stack frame exits
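The stack approach above (push a repair action for each broken invariant, pop them on error) can be sketched in Python. This is only an illustration: `UndoStack`, `record`, `unwind`, and this `move` are hypothetical names, not Stasis APIs, and plain sets stand in for the trees.

```python
class UndoStack:
    """Holds one repair action per invariant that is currently broken."""

    def __init__(self):
        self._repairs = []

    def record(self, repair):
        # Push the action that would restore the invariant just broken.
        self._repairs.append(repair)

    def unwind(self):
        # Pop repairs newest-first, so the latest breakage is fixed first.
        while self._repairs:
            self._repairs.pop()()


def move(item, src, dst, fail_after_remove=False):
    """Move item from set src to set dst; on error, leave both unchanged."""
    undo = UndoStack()
    try:
        src.remove(item)                    # item is now in neither set
        undo.record(lambda: src.add(item))
        if fail_after_remove:
            # Hypothetical failure injected for illustration.
            raise MemoryError("simulated allocation failure")
        dst.add(item)
        undo.record(lambda: dst.discard(item))
    except Exception:
        undo.unwind()
        raise
```

Locking is left out on purpose; the point is only that the error path is one generic unwind loop rather than a hand-written catch block per step.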
Nested Top Actions let transactional data structures protect
themselves against concurrent aborting transactions.
(Prerequisite for recovery)
Concurrent code (that [tries to] handle out-of-memory errors);
work through it w/o error handling first.
move(item,treeA,treeB) {
try {
lock(item)
try {
lock(treeA)
//mess with tree pointers, allocation, etc...
} catch (e) {
//fix up tree structure somehow.
throw(e)
} finally {
unlock(treeA)
}
try {
lock(treeB)
//...
} catch (e) {
//fix up treeB structure
unlock(treeB)
try {
lock(treeA)
// put item back into treeA
} catch(e) {
// make sure this can't happen
} finally {
unlock(treeA)
}
throw(e)
}
unlock(treeB)
} finally {
unlock(item)
}
}
(1.5) Quick ARIES review
Undo traverses linked list.
CLR: Recovery generates long regions of the log that are
no-ops. Need to prevent these from being executed more
than once, even if recovery crashes.
< write example on board >
Nested Top Action: Same mechanism, different idea (gives concurrency)
[ Tree example ]
Pseudo code:
xid = begin_transaction();
lock(tree_mutex);
nta = BeginNestedTopAction(xid, "tree insert", tree, item);
// update tree entries as normal
// crash inside here does physical undo
EndNestedTopAction(nta);
unlock(tree_mutex);
// more stuff happens
// crash / abort here does logical undo
end_transaction(xid);
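A toy Python model of the pseudo code above, assuming a single thread and a sorted list standing in for the tree. `Txn`, `log_physical`, and `tree_insert` are hypothetical names; the real Stasis calls are only the ones named in the pseudo code. The sketch shows the one idea that matters: ending the nested top action logs a logical undo entry that tells abort to skip the physical entries behind it.

```python
import bisect


class Txn:
    """Toy transaction: an append-only undo log over a shared sorted list."""

    def __init__(self, tree):
        self.tree = tree
        self.log = []

    def begin_nta(self):
        # Remember where this nested top action's physical entries start.
        return len(self.log)

    def log_physical(self):
        # Before-image of the whole structure (a real system logs page bytes).
        self.log.append(("physical", list(self.tree), None))

    def end_nta(self, start, item):
        # One logical entry; `start` lets undo jump over the physical ones.
        self.log.append(("logical", item, start))

    def abort(self):
        i = len(self.log) - 1
        while i >= 0:
            kind, payload, skip_to = self.log[i]
            if kind == "physical":
                self.tree[:] = payload     # physical undo: restore old contents
                i -= 1
            else:
                self.tree.remove(payload)  # logical undo: inverse operation
                i = skip_to - 1            # skip the NTA's physical entries
        self.log.clear()


def tree_insert(txn, item, abort_inside=False):
    nta = txn.begin_nta()
    txn.log_physical()
    bisect.insort(txn.tree, item)          # "update tree entries as normal"
    if abort_inside:
        txn.abort()                        # inside the NTA: physical undo
        return
    txn.end_nta(nta, item)
```

If a second transaction inserts into the same list after the first one's nested top action has ended, aborting the first removes only its own item. A physical undo at that point would have restored the old before-image and wiped out the second transaction's insert.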
B: Make copy + atomic swap
safe writes: rename() is atomic for a reason.
write_new_copy()
sync()
rename(new_version, old_version)
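The safe-write steps can be sketched in Python with standard library calls. The helper name `safe_write` is hypothetical, and this version does not fsync the containing directory, which a production implementation would likely also want.

```python
import os
import tempfile


def safe_write(path, data):
    """Replace the file at path with data (bytes), all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new copy in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # new copy is on disk before the swap
        os.replace(tmp_path, path)    # atomic rename over the old version
    except BaseException:
        # Clean up the temporary copy if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens `path` at any moment sees either the complete old contents or the complete new contents, never a mixture.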
Shadow pages: Same trick.
Functional programming: input to f() is immutable, output of f() is
immutable
Tradeoffs?
Complexity
Update in place: data structure must support update in place;
non-trivial for many apps
Copy + swap: data must fit in RAM, or algorithm must be space efficient
Performance:
What % of object being updated? (copy+swap writes whole object every
time)
Synchronization overhead (difficult to parallelize update in place)
Update in place suffers from fragmentation / seeks
This is where Stasis comes in.
System developer has control over on-disk representation of data
-> app-specific storage algorithms
Can switch between update in place, copy + swap, and more exotic
recovery mechanisms
Also, buffer manager, log manager, etc can be replaced / modified
to suit specific apps.
Example:
ROSE: Motivation: database replication environment, avoid all
disk seeks, use compression for performance
Draw LSM-Tree on board, mention compression, recovery techniques.
(2) One way to think about ARIES:
Given atomic updates to the page file, provide durable transactions.
But disk writes aren't atomic! Torn page handling:
At least three approaches to ensure atomic writes:
(1) canary bits. Each disk sector (512 bytes) in a page contains a bit
that is flipped each time the page is written back. If the
bits don't match, the page is torn
(2) CRCs: Checksum the page on writeback, store checksum in page.
(This finds silent data corruption, which is commonplace in
modern hard drives)
(3) double write buffer: Keep a log of all I/O operations sent to
disk. Replay it at recovery. (Q: what's the overhead of this?)
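A minimal Python sketch of approach (1), under two assumptions that are not from the notes: pages are 8 sectors long, and the last byte of every sector is reserved for the canary. `stamp` and `is_torn` are hypothetical helper names.

```python
SECTOR_SIZE = 512
SECTORS_PER_PAGE = 8  # assumption: 4 KB pages


def _canary_offsets():
    # Assumption: the last byte of each sector is reserved for the canary.
    return [(s + 1) * SECTOR_SIZE - 1 for s in range(SECTORS_PER_PAGE)]


def stamp(page, canary):
    """Return a copy of page with every sector's canary set to 0 or 1."""
    out = bytearray(page)
    for off in _canary_offsets():
        out[off] = canary
    return bytes(out)


def is_torn(page):
    """A page is torn if its sectors disagree about the canary bit."""
    return len({page[off] & 1 for off in _canary_offsets()}) > 1
```

The writer flips the canary on each writeback, so a page made of some new sectors and some old sectors carries both values and is flagged.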
Common "silent" drive failure modes:
(0) Arbitrary subset of the page's sectors reach disk.
(1) Wrong bits are sent to drive, checksummed, written correctly
(2) Correct bits sent to drive, checksummed, written correctly,
*but to the wrong track*
Q: Do any of the approaches above cope with all of these?
A: no.
Q: Can we fix them up so we know when data is corrupted?
A: add page number to crc, double write buffer
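A sketch of the checksum fix in Python, assuming a 4 KB page whose first four bytes hold the CRC. `seal` and `verify` are hypothetical names, and the layout is an illustration rather than the Stasis page format. Folding the page number into the checksum means a page that was written correctly but to the wrong location fails verification when it is read back from that location.

```python
import struct
import zlib

PAGE_SIZE = 4096
HEADER = struct.Struct("<I")  # assumption: 4-byte CRC at the front of the page


def _crc(page_no, payload):
    # Checksum covers the page number as well as the contents.
    return zlib.crc32(struct.pack("<Q", page_no) + payload)


def seal(page_no, payload):
    """Build the on-disk image for payload destined for page page_no."""
    if len(payload) != PAGE_SIZE - HEADER.size:
        raise ValueError("payload must fill the page exactly")
    return HEADER.pack(_crc(page_no, payload)) + payload


def verify(page_no, page):
    """True if page is intact and really belongs at page_no."""
    (stored,) = HEADER.unpack_from(page)
    return stored == _crc(page_no, page[HEADER.size:])
```

This only detects the damage. Repairing the page still needs another copy of the data, such as the double write buffer or the log.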
(3) Extending ARIES recovery
Plenty of sources of atomic redo are available.
- FS metadata
- SQL databases, BDB, etc.
If we have an LSN for each atomic object, then redo need only be
deterministic:
f(x) = f(x)
If not, we need a special property (a bit more than idempotency)
idempotency: f(x) = f(f(x))
LSN-free updates:
blind writes: f(x) = f(x')
We get this with hard drives (modulo silent data corruption, need for media
recovery...)!
Can think of each bit (or byte) on a page as a separate, versioned entity.
During REDO, need to make sure that each byte is the newest version in the
log.
- if byte is not updated in REDO log, then it must contain the correct
value before recovery starts -> OK
- if it is, then it will eventually be overwritten with newest log entry
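The byte-level argument above can be checked with a small Python sketch. The log format (a list of `(offset, bytes)` blind writes) and the names `redo` and `EXAMPLE_LOG` are assumptions made for illustration. REDO replays every entry in log order without consulting an LSN.

```python
def redo(page, log):
    """Replay every logged blind write, oldest first, with no LSN check."""
    for offset, data in log:
        page[offset:offset + len(data)] = data
    return page


# Hypothetical log: three blind writes to an 8-byte page, two of which overlap.
EXAMPLE_LOG = [(0, b"AAAA"), (2, b"BB"), (6, b"CC")]
```

Whichever subset of these writes reached disk before the crash, replaying the whole log produces the same page: bytes the log touches end up holding the newest logged value, and bytes it never touches were already correct.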
Q: What about torn pages?
- works, but doesn't handle silent data corruption
Q: What about slotted pages? (Where slot contents can be reshuffled at any
time?)
- need full physical redo for reshuffling