Add new docs/corfurl/notes/README.md stuff

and also:
Add CORFU papers section
Merge corfurl.md and CONCEPTS.md
Add one more CORFU-related paper
Delete prototype/corfurl/docs/CONCEPTS.md
2014-03-01 20:33:13 +09:00 · 2014-03-01 20:33:13 +09:00 · c9764bf5f6
commit c9764bf5f6
parent 8b105672b1
5 changed files with 240 additions and 1 deletions
--- a/prototype/corfurl/README.md
+++ b/prototype/corfurl/README.md
@ -0,0 +1,17 @@
 This is a repo that has other stuff that Greg Burd was noodling
 around with wrt distributed indexing.  I haven't bothered weeding
 any of it out, sorry!
 The corfurl code is in the 'src' and 'include' directories.  In
 addition, there are docs here:
 https://github.com/basho/corfurl/blob/master/docs/corfurl.md
 This is a README-style collection of CORFU-related papers,
 building instructions, and testing instructions.
 https://github.com/basho/corfurl/tree/master/docs/corfurl/notes
 https://github.com/basho/corfurl/tree/master/docs/corfurl/notes#two-clients-try-to-write-the-exact-same-data-at-the-same-time-to-the-same-lpn
 The above are some notes about testing problems & solutions that
 I was/am/?? hoping might find their way into a paper someday.
--- a/prototype/corfurl/docs/corfurl.md
+++ b/prototype/corfurl/docs/corfurl.md
@ -1,3 +1,88 @@
 ## CORFU papers
 I recommend the "5 pages" paper below first, to give a flavor of
 what CORFU is about.  When Scott first read the CORFU paper
 back in 2011 (and the Hyder paper), he thought it was insanity.
 He recommends waiting before judging quite so hastily.  :-)
 After that, perhaps take a step back and skim over the
 Hyder paper.  Hyder started before CORFU, but since CORFU, the
 Hyder folks at Microsoft have rewritten Hyder to use CORFU as
 the shared log underneath it.  But the Hyder paper has lots of
 interesting bits about how you'd go about creating a distributed
 DB where the transaction log *is* the DB.
 ### "CORFU: A Distributed Shared Log"
 MAHESH BALAKRISHNAN, DAHLIA MALKHI, JOHN D. DAVIS, and VIJAYAN
 PRABHAKARAN, Microsoft Research Silicon Valley, MICHAEL WEI,
 University of California, San Diego, TED WOBBER, Microsoft Research
 Silicon Valley
 Long version of introduction to CORFU (~30 pages)
 http://www.snookles.com/scottmp/corfu/corfu.a10-balakrishnan.pdf
 ### "CORFU: A Shared Log Design for Flash Clusters"
 Same authors as above
 Short version of introduction to CORFU paper above (~12 pages)
 http://www.snookles.com/scottmp/corfu/corfu-shared-log-design.nsdi12-final30.pdf
 ### "From Paxos to CORFU: A Flash-Speed Shared Log"
 Same authors as above
 5 pages, a short summary of CORFU basics and some trial applications
 that have been implemented on top of it.
 http://www.snookles.com/scottmp/corfu/paxos-to-corfu.malki-acmstyle.pdf
 ### "Beyond Block I/O: Implementing a Distributed Shared Log in Hardware"
 Wei, Davis, Wobber, Balakrishnan, Malkhi
 Summary report of implementing the CORFU server-side in
 FPGA-style hardware. (~11 pages)
 http://www.snookles.com/scottmp/corfu/beyond-block-io.CameraReady.pdf
 ### "Tango: Distributed Data Structures over a Shared Log"
 Balakrishnan, Malkhi, Wobber, Wu, Prabhakaran, Wei, Davis, Rao, Zou, Zuck
 Describes a framework for developing data structures that reside
 persistently within a CORFU log: the log *is* the database/data
 structure store.
 http://www.snookles.com/scottmp/corfu/Tango.pdf
 ### "Dynamically Scalable, Fault-Tolerant Coordination on a Shared Logging Service"
 Wei, Balakrishnan, Davis, Malkhi, Prabhakaran, Wobber
 The ZooKeeper inter-server communication is replaced with CORFU.
 Faster, fewer lines of code than ZK, and more features than the
 original ZK code base.
 http://www.snookles.com/scottmp/corfu/zookeeper-techreport.pdf
 ### "Hyder – A Transactional Record Manager for Shared Flash"
 Bernstein, Reid, Das
 Describes a distributed log-based DB system where the txn log is
 treated quite oddly: a "txn intent" record is written to a
 shared common log.  All participants read the shared log in
 parallel and make commit/abort decisions in parallel, based on
 what conflicts (or not) that they see in the log.  Scott's first
 reading was "No way, wacky" ... and he has since changed his mind.
 http://www.snookles.com/scottmp/corfu/CIDR2011Proceedings.pdf
 pages 9-20
 ## Fiddling with PULSE
--- a/prototype/corfurl/docs/corfurl/notes/2014-02-27.chain-repair-need-write-twice.mscgen
+++ b/prototype/corfurl/docs/corfurl/notes/2014-02-27.chain-repair-need-write-twice.mscgen
@ -0,0 +1,35 @@
 msc {
    client1, FLU1, FLU2, client2, client3;
    client1 box client3  [label="Epoch #1: chain = FLU1 -> FLU2"];
    client1 -> FLU1      [label="{write,epoch1,<<Page YYY>>}"];
    client1 <- FLU1      [label="ok"];
    client1 box client1  [label="Client crash", textcolour="red"];
    FLU1 box FLU1        [label="FLU crash", textcolour="red"];
    client1 box client3  [label="Epoch #2: chain = FLU2"];
    client2 -> FLU2      [label="{write,epoch2,<<Page ZZZ>>}"];
    client2 <- FLU2      [label="ok"];
    client3 box client3  [label="Read repair starts", textbgcolour="aqua"];
    client3 -> FLU2      [label="{read,epoch2}"];
    client3 <- FLU2      [label="{ok,<<Page ZZZ>>}"];
    client3 -> FLU1      [label="{write,epoch2,<<Page ZZZ>>}"];
    FLU1 box FLU1        [label="What do we do here?  Our current value is <<Page YYY>>.", textcolour="red"] ;
    FLU1 box FLU1        [label="If we do not accept the repair value, then we are effectively UNREPAIRABLE.", textcolour="red"] ;
    FLU1 box FLU1        [label="If we do accept the repair value, then we are mutating an already-written value.", textcolour="red"] ;
    FLU1 -> client3      [label="I'm sorry, Dave, I cannot do that."];
    FLU1 box FLU1        [label = "In theory, while repair is still happening, nobody will ever ask FLU1 for its value.", textcolour="black"] ;
    client3 -> FLU1      [label="{write,epoch2,<<Page ZZZ>>,repair,witnesses=[FLU2]}",  textbgcolour="silver"];
    FLU1 box FLU1        [label="Start an async process to ask the witness list to corroborate this repair."];
    FLU1 -> FLU2         [label="{read,epoch2}", textbgcolour="aqua"];
    FLU1 <- FLU2         [label="{ok,<<Page ZZZ>>}", textbgcolour="aqua"];
    FLU1 box FLU1        [label="Overwrite local storage with repair page.",  textbgcolour="silver"];
    client3 <- FLU1      [label="Async proc replies: ok",  textbgcolour="silver"];
 }
--- a/prototype/corfurl/docs/corfurl/notes/README.md
+++ b/prototype/corfurl/docs/corfurl/notes/README.md
@ -20,4 +20,73 @@ substantially to make it clearer what is happening.
 Also for commit 087c2605ab.
 I believe that I have a fix for the silver-colored
-`error-overwritten`, but the correctness of it remains to be seen.
+`error-overwritten` ... and it was indeed added to the code soon
 afterward, but it turns out that it doesn't solve the entire problem
 of "two clients try to write the exact same data at the same time to
 the same LPN".
 ## "Two Clients Try to Write the Exact Same Data at the Same Time to the Same LPN"
 This situation is something that CORFU cannot protect against, IMO.
 I have been struggling for a while to find a way for CORFU
 clients to know *always* when there is a conflict with another
 writer.  It usually works: the basic nature of write-once registers is
 very powerful.  However, in the case where two clients are trying to
 write the same page data to the same LPN, it looks impossible to
 resolve.
 How do you tell the difference between:
 1. A race between a client A writing page P at address LPN and
   read-repair fixing P.  P *is* A's data and no other's, so this race
   doesn't confuse anyone.
 1. A race between a client A writing page P at address LPN and client
   B writing the exact same page data P at the same LPN.
   A's page P = B's page P, but clients A & B don't know that.
   If CORFU tells both A & B that they were successful, A & B assume
   that the CORFU log has two new pages appended to it, but in truth
   only one new page was appended.
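
The second case can be made concrete with a small sketch. This is a toy model in Python, not corfurl code: the class `WriteOnceFLU`, the helper `client_write`, and the rule "on `error_overwritten`, read the page back and treat matching bytes as success" are all assumptions made for illustration. Epochs, seals, and real concurrency are left out.

```python
class WriteOnceFLU:
    """Toy flash unit: each LPN may be written exactly once (hypothetical model)."""

    def __init__(self):
        self.pages = {}

    def write(self, lpn, page):
        if lpn in self.pages:
            return "error_overwritten"
        self.pages[lpn] = page
        return "ok"

    def read(self, lpn):
        if lpn in self.pages:
            return ("ok", self.pages[lpn])
        return "error_unwritten"


def client_write(chain, lpn, page):
    """Write `page` down the chain (assumed rule, not corfurl's actual logic).

    On error_overwritten, read the stored page back. If the bytes match ours,
    assume read-repair copied our own write and keep going.
    """
    for flu in chain:
        if flu.write(lpn, page) == "error_overwritten":
            # The LPN is already written here, so read() returns ("ok", stored).
            _, stored = flu.read(lpn)
            if stored != page:
                return "error_overwritten"
    return "ok"


flu1, flu2 = WriteOnceFLU(), WriteOnceFLU()
chain = [flu1, flu2]

result_a = client_write(chain, 7, b"same bytes")       # client A
result_b = client_write(chain, 7, b"same bytes")       # client B, identical page
result_c = client_write(chain, 7, b"different bytes")  # a genuinely conflicting page

print(result_a, result_b, result_c)  # ok ok error_overwritten
print(len(flu1.pages))               # 1
```

Under these assumptions both A and B are told `ok`, yet only one page exists at LPN 7. The write-once check catches the differing page but has nothing to compare when the bytes are identical.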
 If we try to solve this by always avoiding the same LPN address
 conflict, we are deluding ourselves.  If we assume that the sequencer
 is 100% correct in that it never assigns the same LPN twice, and if we
 assume that a client must never write a block without an assignment
 from the sequencer, then the problem is solved.  But the solution has a
 _heavy_ price: the log is only available when the sequencer is
 available, and only when there is never more than one sequencer running at a
 time.
 The CORFU base system promises correct operation, even if:
 * Zero sequencers are running, and clients might choose the same LPN
  to write to.
 * Two or more sequencers are running, and different sequencers
  assign the same LPN to two different clients.
 But CORFU's "correct" behavior does not include detecting the same
 page at the same LPN.  The papers don't specifically say it, alas.
 But IMO it's impossible to guarantee, so all docs ought to explicitly
 say that it's impossible and that clients must not assume it.
 See also
 * two-clients-race.1.png
 ## A scenario of chain repair & write-once registers
 See:
 * 2014-02-27.chain-repair-need-write-twice.png
 ... for a scenario where write-once registers that are truly only
 write-once-ever-for-the-rest-of-the-future are "inconvenient" when it
 comes to chain repair.  Client 3 is attempting to do chain repair ops,
 bringing FLU1 back into sync with FLU2.
 The diagram proposes one possible idea for making overwriting a
 write-once register a bit safer: ask another node in the chain to
 verify that the page you've been asked to repair is exactly the same
 as that other FLU's page.
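
The witness idea can be sketched roughly as follows. This is a simplified Python model, not corfurl code: `RepairableFLU` and `repair_write` are hypothetical names, the check runs synchronously rather than in an async process, and epochs are omitted.

```python
class RepairableFLU:
    """Toy write-once flash unit with a witness-checked repair path (hypothetical)."""

    def __init__(self):
        self.pages = {}

    def read(self, lpn):
        if lpn in self.pages:
            return ("ok", self.pages[lpn])
        return "error_unwritten"

    def write(self, lpn, page):
        # Ordinary writes stay strictly write-once.
        if lpn in self.pages:
            return "error_overwritten"
        self.pages[lpn] = page
        return "ok"

    def repair_write(self, lpn, page, witnesses):
        """Overwrite only if every witness FLU already stores exactly `page`."""
        current = self.pages.get(lpn)
        if current is None or current == page:
            self.pages[lpn] = page
            return "ok"
        if not witnesses:
            return "error_overwritten"
        for witness in witnesses:
            if witness.read(lpn) != ("ok", page):
                return "error_overwritten"
        self.pages[lpn] = page
        return "ok"


flu1, flu2 = RepairableFLU(), RepairableFLU()
flu1.write(1, b"YYY")  # epoch 1: written by client1 before the crashes
flu2.write(1, b"ZZZ")  # epoch 2: written by client2 while FLU1 was down

plain = flu1.write(1, b"ZZZ")                    # plain repair write is refused
bogus = flu1.repair_write(1, b"XXX", [flu2])     # witness does not corroborate
repaired = flu1.repair_write(1, b"ZZZ", [flu2])  # witness holds the same page

print(plain, bogus, repaired)  # error_overwritten error_overwritten ok
print(flu1.read(1))            # ('ok', b'ZZZ')
```

The design choice here is that the overwrite is accepted only on corroboration by another chain member, so a stray or malicious "repair" with an arbitrary page is still rejected.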
--- a/prototype/corfurl/docs/corfurl/notes/two-clients-race.1.mscgen
+++ b/prototype/corfurl/docs/corfurl/notes/two-clients-race.1.mscgen
@ -0,0 +1,33 @@
 msc {
    client1, FLU1, FLU2, client2, client3;
    client1 -> FLU1      [label="{write,epoch1,<<Not unique page>>}"];
    client1 <- FLU1      [label="ok"];
    client3 -> FLU2      [label="{seal,epoch1}"];
    client3 <- FLU2      [label="{ok,...}"];
    client3 -> FLU1      [label="{seal,epoch1}"];
    client3 <- FLU1      [label="{ok,...}"];
    client2 -> FLU1      [label="{write,epoch1,<<Not unique page>>}"];
    client2 <- FLU1      [label="error_epoch"];
    client2 abox client2 [label="Ok, get the new epoch info....", textbgcolour="silver"];
    client2 -> FLU1      [label="{write,epoch2,<<Not unique page>>}"];
    client2 <- FLU1      [label="error_overwritten"];
    client1 -> FLU2      [label="{write,epoch1,<<Not unique page>>}"];
    client1 <- FLU2      [label="error_epoch"];
    client1 abox client1 [label="Ok, hrm.", textbgcolour="silver"];
    client3 abox client3 [ label = "Start read repair", textbgcolour="aqua"] ;
    client3 -> FLU1      [label="{read,epoch2}"];
    client3 <- FLU1      [label="{ok,<<Not unique page>>}"];
    client3 -> FLU2      [label="{write,epoch2,<<Not unique page>>}"];
    client3 <- FLU2      [label="ok"];
    client3 abox client3 [ label = "End read repair", textbgcolour="aqua"] ;
    client3 abox client3 [ label = "We saw <<Not unique page>>", textbgcolour="silver"] ;
    client1 -> FLU2      [label="{write,epoch2,<<Not unique page>>}"];
    client1 <- FLU2      [label="error_overwritten"];
 }