From c9764bf5f6d6f5a250f1f1d35177d2cd911677bf Mon Sep 17 00:00:00 2001
From: Scott Lystig Fritchie <slfritchie@snookles.com>
Date: Sat, 1 Mar 2014 20:33:13 +0900
Subject: [PATCH] Add new docs/corfurl/notes/README.md stuff

and also:

Add CORFU papers section
Merge corfurl.md and CONCEPTS.md
Add one more CORFU-related paper
Delete prototype/corfurl/docs/CONCEPTS.md
---
 prototype/corfurl/README.md                   | 17 ++++
 prototype/corfurl/docs/corfurl.md             | 85 +++++++++++++++++++
 ...02-27.chain-repair-need-write-twice.mscgen | 35 ++++++++
 .../corfurl/docs/corfurl/notes/README.md      | 71 +++++++++++++++-
 .../corfurl/notes/two-clients-race.1.mscgen   | 33 +++++++
 5 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 prototype/corfurl/README.md
 create mode 100644 prototype/corfurl/docs/corfurl/notes/2014-02-27.chain-repair-need-write-twice.mscgen
 create mode 100644 prototype/corfurl/docs/corfurl/notes/two-clients-race.1.mscgen

diff --git a/prototype/corfurl/README.md b/prototype/corfurl/README.md
new file mode 100644
index 0000000..95f10aa
--- /dev/null
+++ b/prototype/corfurl/README.md
@@ -0,0 +1,17 @@
+This is a repo that has other stuff that Greg Burd was noodling
+around with wrt distributed indexing.  I haven't bothered weeding
+any of it out, sorry!
+
+The corfurl code is in the 'src' and 'include' directories.  In
+addition, there are docs here:
+
+https://github.com/basho/corfurl/blob/master/docs/corfurl.md
+
+This is a README-style collection of CORFU-related papers,
+building instructions, and testing instructions.
+
+https://github.com/basho/corfurl/tree/master/docs/corfurl/notes
+https://github.com/basho/corfurl/tree/master/docs/corfurl/notes#two-clients-try-to-write-the-exact-same-data-at-the-same-time-to-the-same-lpn
+
+The above are some notes about testing problems & solutions that
+I was/am/?? hoping might find their way into a paper someday.
diff --git a/prototype/corfurl/docs/corfurl.md b/prototype/corfurl/docs/corfurl.md
index fd02134..08960dc 100644
--- a/prototype/corfurl/docs/corfurl.md
+++ b/prototype/corfurl/docs/corfurl.md
@@ -1,3 +1,88 @@
+## CORFU papers
+
+I recommend the "5 pages" paper below first, to give a flavor of
+what CORFU is about.  When Scott first read the CORFU paper
+back in 2011 (and the Hyder paper), he thought it was insanity.
+He recommends waiting before judging quite so hastily.  :-)
+
+After that, perhaps take a step back and skim over the
+Hyder paper.  Hyder started before CORFU, but since CORFU, the
+Hyder folks at Microsoft have rewritten Hyder to use CORFU as
+the shared log underneath it.  But the Hyder paper has lots of
+interesting bits about how you'd go about creating a distributed
+DB where the transaction log *is* the DB.
+
+### "CORFU: A Distributed Shared Log"
+
+MAHESH BALAKRISHNAN, DAHLIA MALKHI, JOHN D. DAVIS, and VIJAYAN
+PRABHAKARAN, Microsoft Research Silicon Valley, MICHAEL WEI,
+University of California, San Diego, TED WOBBER, Microsoft Research
+Silicon Valley
+
+Long version of introduction to CORFU (~30 pages)
+http://www.snookles.com/scottmp/corfu/corfu.a10-balakrishnan.pdf
+
+### "CORFU: A Shared Log Design for Flash Clusters"
+
+Same authors as above
+
+Short version of introduction to CORFU paper above (~12 pages)
+
+http://www.snookles.com/scottmp/corfu/corfu-shared-log-design.nsdi12-final30.pdf
+
+### "From Paxos to CORFU: A Flash-Speed Shared Log"
+
+Same authors as above
+
+5 pages, a short summary of CORFU basics and some trial applications
+that have been implemented on top of it.
+
+http://www.snookles.com/scottmp/corfu/paxos-to-corfu.malki-acmstyle.pdf
+
+### "Beyond Block I/O: Implementing a Distributed Shared Log in Hardware"
+
+Wei, Davis, Wobber, Balakrishnan, Malkhi
+
+Summary report of implementing the CORFU server-side in
+FPGA-style hardware. (~11 pages)
+
+http://www.snookles.com/scottmp/corfu/beyond-block-io.CameraReady.pdf
+
+### "Tango: Distributed Data Structures over a Shared Log"
+
+Balakrishnan, Malkhi, Wobber, Wu, Prabhakaran, Wei, Davis, Rao, Zou, Zuck
+
+Describes a framework for developing data structures that reside
+persistently within a CORFU log: the log *is* the database/data
+structure store.
+
+http://www.snookles.com/scottmp/corfu/Tango.pdf
+
+### "Dynamically Scalable, Fault-Tolerant Coordination on a Shared Logging Service"
+
+Wei, Balakrishnan, Davis, Malkhi, Prabhakaran, Wobber
+
+The ZooKeeper inter-server communication is replaced with CORFU.
+Faster, fewer lines of code than ZK, and more features than the
+original ZK code base.
+
+http://www.snookles.com/scottmp/corfu/zookeeper-techreport.pdf
+
+### "Hyder – A Transactional Record Manager for Shared Flash"
+
+Bernstein, Reid, Das
+
+Describes a distributed log-based DB system where the txn log is
+treated quite oddly: a "txn intent" record is written to a
+shared common log.  All participants read the shared log in
+parallel and make commit/abort decisions in parallel, based on
+the conflicts (or lack thereof) that they see in the log.  Scott's
+first reaction was "No way, wacky" ... but he has since changed his mind.
+
+http://www.snookles.com/scottmp/corfu/CIDR2011Proceedings.pdf
+pages 9-20
+
+
 
 ## Fiddling with PULSE
 
diff --git a/prototype/corfurl/docs/corfurl/notes/2014-02-27.chain-repair-need-write-twice.mscgen b/prototype/corfurl/docs/corfurl/notes/2014-02-27.chain-repair-need-write-twice.mscgen
new file mode 100644
index 0000000..3e01ac1
--- /dev/null
+++ b/prototype/corfurl/docs/corfurl/notes/2014-02-27.chain-repair-need-write-twice.mscgen
@@ -0,0 +1,35 @@
+msc {
+    client1, FLU1, FLU2, client2, client3;
+
+    client1 box client3  [label="Epoch #1: chain = FLU1 -> FLU2"];
+    client1 -> FLU1      [label="{write,epoch1,<<Page YYY>>}"];
+    client1 <- FLU1      [label="ok"];
+    client1 box client1  [label="Client crash", textcolour="red"];
+
+    FLU1 box FLU1        [label="FLU crash", textcolour="red"];
+
+    client1 box client3  [label="Epoch #2: chain = FLU2"];
+
+    client2 -> FLU2      [label="{write,epoch2,<<Page ZZZ>>}"];
+    client2 <- FLU2      [label="ok"];
+
+    client3 box client3  [label="Read repair starts", textbgcolour="aqua"];
+
+    client3 -> FLU2      [label="{read,epoch2}"];
+    client3 <- FLU2      [label="{ok,<<Page ZZZ>>}"];
+    client3 -> FLU1      [label="{write,epoch2,<<Page ZZZ>>}"];
+    FLU1 box FLU1        [label="What do we do here?  Our current value is <<Page YYY>>.", textcolour="red"] ;
+    FLU1 box FLU1        [label="If we do not accept the repair value, then we are effectively UNREPAIRABLE.", textcolour="red"] ;
+    FLU1 box FLU1        [label="If we do accept the repair value, then we are mutating an already-written value.", textcolour="red"] ;
+    FLU1 -> client3      [label="I'm sorry, Dave, I cannot do that."];
+
+    FLU1 box FLU1        [label = "In theory, while repair is still happening, nobody will ever ask FLU1 for its value.", textcolour="black"] ;
+
+    client3 -> FLU1      [label="{write,epoch2,<<Page ZZZ>>,repair,witnesses=[FLU2]}",  textbgcolour="silver"];
+    FLU1 box FLU1        [label="Start an async process to ask the witness list to corroborate this repair."];
+    FLU1 -> FLU2         [label="{read,epoch2}", textbgcolour="aqua"];
+    FLU1 <- FLU2         [label="{ok,<<Page ZZZ>>}", textbgcolour="aqua"];
+    FLU1 box FLU1        [label="Overwrite local storage with repair page.",  textbgcolour="silver"];
+    client3 <- FLU1      [label="Async proc replies: ok",  textbgcolour="silver"];
+
+}
diff --git a/prototype/corfurl/docs/corfurl/notes/README.md b/prototype/corfurl/docs/corfurl/notes/README.md
index 337a34b..b5757aa 100644
--- a/prototype/corfurl/docs/corfurl/notes/README.md
+++ b/prototype/corfurl/docs/corfurl/notes/README.md
@@ -20,4 +20,73 @@ substantially to make it clearer what is happening.
 Also for commit 087c2605ab.
 
 I believe that I have a fix for the silver-colored
-`error-overwritten`, but the correctness of it remains to be seen.
+`error-overwritten` ... and it was indeed added to the code soon
+afterward, but it turns out that it doesn't solve the entire problem
+of "two clients try to write the exact same data at the same time to
+the same LPN".
+
+
+## "Two Clients Try to Write the Exact Same Data at the Same Time to the Same LPN"
+
+This situation is something that CORFU cannot protect against, IMO.
+
+I have been struggling for a while to find a way for CORFU
+clients to know *always* when there is a conflict with another
+writer.  It usually works: the basic nature of write-once registers is
+very powerful.  However, in the case where two clients are trying to
+write the same page data to the same LPN, it looks impossible to
+resolve.
+
+How do you tell the difference between:
+
+1. A race between a client A writing page P at address LPN and
+   read-repair fixing P.  P *is* A's data and no other's, so this race
+   doesn't confuse anyone.
+
+1. A race between a client A writing page P at address LPN and client
+   B writing the exact same page data P at the same LPN.
+   A's page P = B's page P, but clients A & B don't know that.
+
+   If CORFU tells both A & B that they were successful, A & B assume
+   that the CORFU log has two new pages appended to it, but in truth
+   only one new page was appended.
+
+If we try to solve this by always avoiding the same LPN address
+conflict, we are deluding ourselves.  If we assume that the sequencer
+is 100% correct in that it never assigns the same LPN twice, and if we
+assume that a client must never write a block without an assignment
+from the sequencer, then the problem is solved.  But the solution has a
+_heavy_ price: the log is only available when the sequencer is
+available, and only when there is never more than one sequencer
+running at a time.
+
+The CORFU base system promises correct operation, even if:
+
+* Zero sequencers are running, and clients might choose the same LPN
+  to write to.
+* Two or more sequencers are running, and different sequencers
+  assign the same LPN to two different clients.
+
+But CORFU's "correct" behavior does not include detecting two writes
+of the same page to the same LPN.  The papers don't specifically say so, alas.
+But IMO it's impossible to guarantee, so all docs ought to explicitly
+say that it's impossible and that clients must not assume it.
+
+See also
+* two-clients-race.1.png
+
+## A scenario of chain repair & write-once registers
+
+See:
+* 2014-02-27.chain-repair-write-twice.png
+
+... for a scenario where write-once registers that are truly only
+write-once-ever-for-the-rest-of-the-future are "inconvenient" when it
+comes to chain repair.  Client 3 is attempting to do chain repair ops,
+bringing FLU1 back into sync with FLU2.
+
+The diagram proposes one possible idea for making overwriting a
+write-once register a bit safer: ask another node in the chain to
+verify that the page you've been asked to repair is exactly the same
+as that other FLU's page.
+
diff --git a/prototype/corfurl/docs/corfurl/notes/two-clients-race.1.mscgen b/prototype/corfurl/docs/corfurl/notes/two-clients-race.1.mscgen
new file mode 100644
index 0000000..ce8e614
--- /dev/null
+++ b/prototype/corfurl/docs/corfurl/notes/two-clients-race.1.mscgen
@@ -0,0 +1,33 @@
+msc {
+    client1, FLU1, FLU2, client2, client3;
+
+    client1 -> FLU1      [label="{write,epoch1,<<Not unique page>>}"];
+    client1 <- FLU1      [label="ok"];
+
+    client3 -> FLU2      [label="{seal,epoch1}"];
+    client3 <- FLU2      [label="{ok,...}"];
+    client3 -> FLU1      [label="{seal,epoch1}"];
+    client3 <- FLU1      [label="{ok,...}"];
+
+    client2 -> FLU1      [label="{write,epoch1,<<Not unique page>>}"];
+    client2 <- FLU1      [label="error_epoch"];
+    client2 abox client2 [label="Ok, get the new epoch info....", textbgcolour="silver"];
+    client2 -> FLU1      [label="{write,epoch2,<<Not unique page>>}"];
+    client2 <- FLU1      [label="error_overwritten"];
+
+    client1 -> FLU2      [label="{write,epoch1,<<Not unique page>>}"];
+    client1 <- FLU2      [label="error_epoch"];
+    client1 abox client1 [label="Ok, hrm.", textbgcolour="silver"];
+
+    client3 abox client3 [ label = "Start read repair", textbgcolour="aqua"] ;
+    client3 -> FLU1      [label="{read,epoch2}"];
+    client3 <- FLU1      [label="{ok,<<Not unique page>>}"];
+    client3 -> FLU2      [label="{write,epoch2,<<Not unique page>>}"];
+    client3 <- FLU2      [label="ok"];
+    client3 abox client3 [ label = "End read repair", textbgcolour="aqua"] ;
+    client3 abox client3 [ label = "We saw <<Not unique page>>", textbgcolour="silver"] ;
+
+    client1 -> FLU2      [label="{write,epoch2,<<Not unique page>>}"];
+    client1 <- FLU2      [label="error_overwritten"];
+
+}