Commit graph

228 commits

Author SHA1 Message Date
Scott Lystig Fritchie
c69a206039 WIP: stuck, need to add even MORE repairing list, before continuing 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
a36f23ee7a WIP: stuck, need to add repairing list before continuing with projection sanity check 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
8d9cabd214 Bring flowchart & code back into sync, yay! 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
30f5a84cea Clean up cruft, add more comments 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
32cfcccf34 First part of larger sanity test is now prototyped.
This is some brute-force-and-not-subtle hackery, but it looks like I've
got the basis for a test that a model checker (QuickCheck or Concuerror
or something else) can use for a good/bad check.

The following properties are examined (but not enforced):

* At each epoch, are all of the chains disjoint?  I.e. no single FLU
  is a member of different chains at the same epoch.

  This is a safety/sanity check.

* For each unique chain UPI list at each epoch, are all of the FLUs in that
  chain unanimous in their agreement:
    agreed_membership: all UPI FLUs agree about the UPI list
    not_agreed: the membership algorithm has not yet agreed on
                the UPI list

  This is not a safety/sanity check per se, but it can be useful input
  into a good safety check.
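
The two properties above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Erlang test code: the input shape (a map from FLU name to the projection it stored for one epoch, or None if it has none), the function names, and the use of a string checksum are all assumptions made for the example. The result tags reuse the names from the description above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: (UPI list as a tuple, projection checksum).
Projection = Tuple[Tuple[str, ...], str]


def chains_disjoint(upi_lists: List[Tuple[str, ...]]) -> bool:
    """True if no FLU is a member of two different chains."""
    seen = set()
    for upi in set(upi_lists):  # identical UPI lists count as one chain
        if seen & set(upi):
            return False
        seen |= set(upi)
    return True


def check_epoch(private: Dict[str, Optional[Projection]]):
    """Summarize one epoch: disjointness, then per-chain agreement."""
    upis = sorted({p[0] for p in private.values() if p is not None})
    if not chains_disjoint(upis):
        return ("not_disjoint", [list(u) for u in upis])
    report = []
    for upi in upis:
        # What did each member of this UPI list store for the epoch?
        views = [private.get(flu) for flu in upi]
        if all(v is not None for v in views) and len(set(views)) == 1:
            report.append(("agreed_membership", list(upi)))
        else:
            report.append(("not_agreed", list(upi)))
    return ("ok_disjoint", report)
```

Note that under this sketch two FLUs holding [a,b] and [b,a] count as two different chains sharing members, so the epoch is reported as not disjoint.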

Some examples:

* At epoch 0, there is no agreement on UPI membership of the one [a,b,c]
  chain.
* At epoch 1, there is full agreement.
* At epoch 4, we're back to no agreement.
* At epoch 17, there's agreement on a small chain with UPI list=[a].
  (This agreement continues until epoch 216, but that history is omitted
  here.)

   [{0,
     {ok_disjoint,[{[a,b,c],
                    not_unique,0,
                    [<<159,215,105,140,29,151,142,2,162,90,225,209,10,102,119,
                       193,110,72,75,245>>,
                     <<213,46,129,248,23,50,210,247,145,68,65,112,232,101,28,56,
                       239,12,78,227>>,
                     <<230,146,66,183,10,218,57,29,233,166,108,176,118,109,
                       226,186,190,56,174,108>>]}]}},
    {1,{ok_disjoint,[{agreed_membership,[a,b,c]}]}},
    {4,
     {ok_disjoint,[{not_unique,[a,b,c],
                               [not_in_this_epoch,
                                <<208,227,221,233,254,160,36,134,252,106,
                                  124,192,101,171,168,68,169,55,2,54>>]}]}},
    {6,
     {ok_disjoint,[{not_unique,[a,b,c],
                               [not_in_this_epoch,
                                <<191,47,203,143,195,230,71,162,39,132,188,
                                  128,64,39,18,9,73,148,207,220>>]}]}},
    {17,{ok_disjoint,[{agreed_membership,[a]}]}},
    {24,{ok_disjoint,[{agreed_membership,[a]}]}},
    [...]

Starting at epoch 419, the network stabilized, but not fully,
into two "islands" of servers: a alone, and b & c together.
At epoch 486, the network fully stabilized with the same network
partition.  We see rapid convergence to two chains, [a] and [b,c].

    {419,{ok_disjoint,[{agreed_membership,[a]}]}},
    {425,{ok_disjoint,[{agreed_membership,[b]}]}},
    {436,{ok_disjoint,[{agreed_membership,[b]}]}},
    {442,{ok_disjoint,[{agreed_membership,[b]}]}},
    {444,{ok_disjoint,[{agreed_membership,[b]}]}},
    {454,{ok_disjoint,[{agreed_membership,[b]}]}},
    {456,{ok_disjoint,[{agreed_membership,[b]}]}},
    {458,{ok_disjoint,[{agreed_membership,[b]}]}},
    {463,{ok_disjoint,[{agreed_membership,[b]}]}},
    {468,{ok_disjoint,[{agreed_membership,[b]}]}},
    {479,{ok_disjoint,[{agreed_membership,[b]}]}},
    {482,{ok_disjoint,[{agreed_membership,[b]}]}},
    {486,{ok_disjoint,[{agreed_membership,[a]}]}},
    {488,{ok_disjoint,[{agreed_membership,[b]}]}},
    {490,{ok_disjoint,[{agreed_membership,[b,c]}]}},
    {492,{ok_disjoint,[{agreed_membership,[b,c]}]}}]

foo
2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
4def1ad026 Move test code from machi_chain_manager1.erl -> machi_chain_manager1_test.erl 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
a2f087181e Change C210's sleep time to be proportional/ranked to member's rank 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
c0ef199c6f Hey, I think this is finally working, hooray! 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
258bce84d1 WIP: remove lots of debugging cruft 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
4fac90f5a9 Fix end-of-repair logic by querying repair target for epoch sync 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
0a6b8268fb Fix major error in rank_projection(), silly me 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
fdca511385 Fix broken machi_partition_simulator.erl, derp 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
0b88a12c16 WIP: Debugging cruft, egadz, but improving (see below)
So, this still pops up occasionally:

    % rebar skip_deps=true -v eunit suites=machi_flu0_test,machi_chain_manager1
    [...]
    a private: [{epoch,223},{author,a},{upi,[a,b]},{repair,[]},{down,[c]},{d,[{author_proc,react},{nodes_up,[a,b]}]},{d2,[{up_nodz,[a,b]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
    b private: [{epoch,224},{author,b},{upi,[b,a]},{repair,[]},{down,[c]},{d,[{author_proc,react},{nodes_up,[a,b]}]},{d2,[{up_nodz,[a,b]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
    c private: [{epoch,191},{author,c},{upi,[c]},{repair,[]},{down,[a,b]},{d,[{author_proc,react},{nodes_up,[c]}]},{d2,[{up_nodz,[c]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]

The mis-ordering between [a,b] and [b,a] happens after the partition settled
on the islands of [a,b] and [c].

    { c100 , ? LINE , _AnyOtherReturnValue } {c100,734,
                                          {err,error,
                                           {badmatch,[a,b]},
                                           from,
                                           [{epoch,70},
                                            {author,a},
                                            {upi,[a]},
                                            {repair,[b]},
                                            {down,[c]},
                                            {d,
                                             [{author_proc,react},
                                              {nodes_up,[a,b]}]},
                                            {d2,[]}],
                                           to,
                                           [{epoch,194},
                                            {author,b},
                                            {upi,[b,a]},
                                            {repair,[]},
                                            {down,[c]},
                                            {d,
                                             [{author_proc,react},
                                              {nodes_up,[a,b]}]},
                                            {d2,[]}],
                                           relative_to,a,stack,[...]
2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
5d0eed865a Duh, fix really stupid think-o bug in perhaps_call(), oi oi oi 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
a94374cc8c Add machi_partition_simulator.erl + refactor to use it 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
4d3a9ed757 WIP: Per notes: change unanimous test @ A20. Improvement! 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
201108ec5b Temp WIP: Change network partitions to be bi-directional only 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
6ef46a3464 Temp WIP: I am going to sever the connection between the flowchart and the code. TODO
That diagram is really valuable, but it also takes a long time
to make any kind of edit; the process is too slow.  This is a todo
item: a reminder that the flowchart is important documentation and
must be brought back into sync with the code soon.
2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
9404e954e7 WIP: chain mgr clutter, trying to debug infinite loop 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
fd1b4363b9 WIP: chain manager getting better, but occasionally gets infinite loop (II) 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
67f94d1cff WIP: chain manager getting better, but occasionally gets infinite loop 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
2f54525422 Fix chain mgmt flowchart A40 conditions (II) 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
c1fd3df35d Fix chain mgmt flowchart A40 conditions 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
12d8a94497 Add reminder about chain manager init bootstrapping TODO 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
064b637d81 Remove docs/machi/flowchart-machi-chain-mgmt1.jpg 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
83e4937658 Chain manager projection store flowchart implemented & passes smoke test! 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
8faa1404c6 Remove unused prev_epoch_num and prev_epoch_csum 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
9af576d753 WIP: broken, don't use 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
cbc5260e93 WIP: Chain manager projection store flowchart goop draft 2 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
dbcc87b4a4 WIP: Chain manager projection store flowchart goop 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
ca5ddb2cf1 WIP: chain mgmt prototype scaffolding 9: before start of next simulator stage 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
0a77c09779 Fix non-TEST compilation problem 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
67b661494e WIP: chain mgmt prototype scaffolding 8: basic read repair done 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
932b6afb76 WIP: chain mgmt prototype scaffolding 7: inching better 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
616a11e230 WIP: chain mgmt prototype scaffolding 6: refactoring 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
e5b9230af0 WIP: chain mgmt prototype scaffolding 5: before refactor & continuing 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
b757878c81 WIP: chain mgmt prototype scaffolding 4: uncompileable at the moment 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
dfbbaf6bfe WIP: chain mgmt prototype scaffolding 3 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
63d23330b2 WIP: chain mgmt prototype scaffolding 2 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
add6f421aa WIP: chain mgmt prototype scaffolding 2 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
9c04537497 WIP: chain mgmt prototype scaffolding 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
3e499e241a WIP: Fix flu0 name registration 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
500a13a01d WIP: Machi chain management PULSE prototype work 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
b41dbffe95 Cruft cleanup 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
5e49bd6c29 WIP: Machi chain management PULSE prototype work 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
e9ea20e941 Move to private proj store for eunit tests 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
ddce145bfb Add public/private split in projection store of machi_flu0.erl 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
fd7dad0714 Coverage is about as good as it's going to get 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
da2bad564f Getting closer to understanding why test coverage appears so poor, part 2 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
342a972543 Getting closer to understanding why test coverage appears so poor 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
b4f2d314c7 More single chain manager simulation tests 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
e717d797b3 Move almost all test code to test/* modules 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
01e3325b81 Tiny refactoring of random number gen 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
d2f93e919e Single chain manager simulation test: no bad projection transitions! 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
057f958bb1 WIP: chain manager simulation test 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
410c8ff7ce WIP: chain manager simulation test 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
b8c87b23ad WIP: chain manager simulation test 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
4ebc80dc39 Add src/machi_util.erl 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
a81552ed82 Makefile un-derp'ing 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
a5dc72834f Fix proj0_test for concuerror, yay! 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
4969e019b2 Fix proj0_test for concuerror, yay! 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
e50e669b79 TODO left off here 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
97c5789b44 WIP: eunit tests pass, but Concuerror loops forever then errs on max retries on proj0_test 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
f7447e8953 WIP: done (I hope) adding Lamport clocks 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
ee7bc2645b WIP: in the middle of adding Lamport clocks 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
b443a15542 register op name sanity: write and _read_ 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
921d90a69b WIP: enforce wedging and new projection writes 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
bebce51ab9 WIP: minimal write-once projection store in FLU 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
34c8c6490a WIP: add Name arg to start_link() 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
2d3a29471d Minimal FLU0 single register, plus Concuerror tests 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
f378204a91 Add fledgling log implementation based on CORFU papers (corfurl stuff) 2015-03-02 20:20:07 +09:00
Scott Lystig Fritchie
370f70303d Merge branch 'merge/tango-prototype' 2015-03-02 20:07:25 +09:00
Scott Lystig Fritchie
94ebd4bb6f Rename prototype/tango-prototype -> prototype/tango 2015-03-02 20:06:45 +09:00
Scott Lystig Fritchie
3f3f3e4f5d Update README.tango.md with latest checkpoint implementation fix notes 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
c5ed355dac Rename tango readme 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
8da46f78fe BAH! Checkpoint is quite broken, see new README.tango.md 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
7bf98fa648 All tests pass, but checkpointing does not truncate history 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
fed2f43783 WIP: all but queue checkpointing now passes 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
0b3bb3ee7c WIP: tango_oid_test now passes 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
a0bb7ee23d WIP: tango_oid refactoring, all broken: infinite loop 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
9a3ac02413 WIP: first round of tango_oid refactoring, all broken horribly 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
03f071316c Gadz, more sequencer cleanup. corfurl_test now passes 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
30fc62ab22 Gadz, more sequencer cleanup. corfurl_sequencer_test now passes 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
b8c051c89f Fix broken sequencer semantics.
It occurred to me today that I implemented the sequencer incorrectly and
hadn't yet noticed because I don't have any tests that are
complex/interleaved/perhaps-non-deterministic to find the problem.
The problem is that the sequencer's current implementation only keeps
track of the last LPN for any Tango stream.

The fix is to do what the paper actually says: the sequencer keeps a
*list* of the last $K$ LPNs for each stream.  Derp.  Yes, that's really
necessary to avoid a pretty simple race condition with 2 actors
simultaneously updating a single Tango stream.

1st commit: fix the implementation and the smoke test.  The
broken-everything-else will be repaired in later commits.
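
The fix described above, keeping a list of the last K LPNs per stream rather than only the most recent one, might look roughly like this. It is a Python sketch for illustration only and is not the corfurl sequencer code: the class name, the `get` method, the default K of 4, and LPNs starting at 1 are all assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class Sequencer:
    """Hands out log page numbers (LPNs) and remembers the last K per stream."""

    def __init__(self, k: int = 4) -> None:
        self.next_lpn = 1  # assumed starting LPN
        # Each stream keeps at most k LPNs, newest at the left.
        self.last: Dict[int, Deque[int]] = defaultdict(lambda: deque(maxlen=k))

    def get(self, stream: int) -> Tuple[int, List[int]]:
        """Assign the next LPN; also return the stream's previous LPNs, newest first."""
        backpointers = list(self.last[stream])
        lpn = self.next_lpn
        self.next_lpn += 1
        # With maxlen set, appendleft drops the oldest LPN once k are stored.
        self.last[stream].appendleft(lpn)
        return lpn, backpointers
```

For example, with `k=2`, three successive calls to `get(1)` return `(1, [])`, `(2, [1])`, and `(3, [2, 1])`.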
2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
940012cef1 Add checkpoint support for tango_dt_map 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
4cf8ac7ed8 Add checkpoint support for tango_dt_queue 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
970eb263db Fix bug in backpointer handling, derp! 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
004a18d948 Add checkpoint support for tango_dt_register 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
7b9c94553c Add skeleton support for single-page checkpointing 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
1c1e1368dd Added src/tango_dt_queue.erl plus test 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
6caeaeb6b5 Ha! Damn quick and easy to add tango_dt_map.erl 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
df53ec0a4e Refactor register DT into tango_dt.erl and tango_dt_register.erl 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
c068057c96 Add missing func corfurl_client:append_page/3, then fix tango_dt_register_test 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
cdeddbb582 Heh, demonstrate a concurrency bug that I knew was there, yay, fixit time! 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
18b38c249e First draft of tango_dt_register 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
6067e26201 Change semantics of OID map, silly me, to match what's needed 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
436c6ac14b Minor type fixup 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
4fe4758d7a Generic parameterization of the map, done badly, part 1 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
9c73872d20 Fix TEST vs PULSE tests 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
e9f16d7b1b Dialyzer clean 2015-03-02 20:03:44 +09:00