Commit graph

72 commits

Author SHA1 Message Date
Scott Lystig Fritchie
a36f23ee7a WIP: stuck, need to add repairing list before continuing with projection sanity check 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
8d9cabd214 Bring flowchart & code back into sync, yay! 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
30f5a84cea Clean up cruft, add more comments 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
32cfcccf34 First part of larger sanity test is now prototyped.
This is some brute-force-and-not-subtle hackery, but it looks like I've
got the basis for a test that a model checker (QuickCheck or Concuerror
or something else) can use for a good/bad check.

The following properties are examined (but not enforced):

* At each epoch, are all of the chains disjoint?  I.e., no single FLU
  is a member of different chains at the same epoch.

  This is a safety/sanity check.

* For each unique chain UPI list at each epoch, are all of the FLUs in that
  chain unanimous in their agreement?  There are two possible results:
    agreed_membership: all UPI FLUs agree about the UPI list
    not_agreed: the membership algorithm has not yet agreed on
                the UPI list

  This is not a safety/sanity check per se, but it can be useful input
  into a good safety check.
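
The two properties above can be sketched roughly as follows. This is only an illustration, not the code from this commit: the module name `chain_check_sketch`, both function names, and the input shapes (a list of chains for one epoch, and a list of `{FLU, ReportedUPI}` pairs giving the UPI list each FLU currently reports) are assumptions made for the example.

```erlang
-module(chain_check_sketch).
-export([disjoint/1, agreement/2]).

%% Hypothetical helper: true if no FLU name appears in more than one
%% of the given chains.  Each chain is a list of FLU names; duplicates
%% inside a single chain are collapsed first so that only membership
%% in *different* chains counts.
disjoint(Chains) ->
    All = lists:append([lists:usort(C) || C <- Chains]),
    length(All) =:= length(lists:usort(All)).

%% Hypothetical helper: Reports is a list of {FLU, ReportedUPI} pairs.
%% The result is agreed_membership only if every FLU in UPI reports
%% exactly the same UPI list (same members, same order).  A FLU with
%% no report counts as disagreement.
agreement(UPI, Reports) ->
    Unanimous = lists:all(
                  fun(FLU) ->
                          proplists:get_value(FLU, Reports) =:= UPI
                  end, UPI),
    case Unanimous of
        true  -> {agreed_membership, UPI};
        false -> {not_agreed, UPI}
    end.
```

For example, `chain_check_sketch:disjoint([[a],[b,c]])` is true, while `chain_check_sketch:disjoint([[a,b],[b,c]])` is false because b would be in two chains at once.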

Some examples:

* At epoch 0, there is no agreement on UPI membership of the one [a,b,c]
  chain.
* At epoch 1, there is full agreement.
* At epoch 4, we're back to no agreement.
* At epoch 17, there's agreement on a small chain with UPI list=[a].
  (This agreement continues until epoch 216, but that history is omitted
  here.)

   [{0,
     {ok_disjoint,[{[a,b,c],
                    not_unique,0,
                    [<<159,215,105,140,29,151,142,2,162,90,225,209,10,102,119,
                       193,110,72,75,245>>,
                     <<213,46,129,248,23,50,210,247,145,68,65,112,232,101,28,56,
                       239,12,78,227>>,
                     <<230,146,66,183,10,218,57,29,233,166,108,176,118,109,
                       226,186,190,56,174,108>>]}]}},
    {1,{ok_disjoint,[{agreed_membership,[a,b,c]}]}},
    {4,
     {ok_disjoint,[{not_unique,[a,b,c],
                               [not_in_this_epoch,
                                <<208,227,221,233,254,160,36,134,252,106,
                                  124,192,101,171,168,68,169,55,2,54>>]}]}},
    {6,
     {ok_disjoint,[{not_unique,[a,b,c],
                               [not_in_this_epoch,
                                <<191,47,203,143,195,230,71,162,39,132,188,
                                  128,64,39,18,9,73,148,207,220>>]}]}},
    {17,{ok_disjoint,[{agreed_membership,[a]}]}},
    {24,{ok_disjoint,[{agreed_membership,[a]}]}},
    [...]

Starting at epoch 419, the network partially stabilizes into two
"islands" of servers: a alone, and b and c together.
At epoch 486, the network fully stabilizes with the same network
partition, and we see rapid convergence to two chains, [a] and [b,c].

    {419,{ok_disjoint,[{agreed_membership,[a]}]}},
    {425,{ok_disjoint,[{agreed_membership,[b]}]}},
    {436,{ok_disjoint,[{agreed_membership,[b]}]}},
    {442,{ok_disjoint,[{agreed_membership,[b]}]}},
    {444,{ok_disjoint,[{agreed_membership,[b]}]}},
    {454,{ok_disjoint,[{agreed_membership,[b]}]}},
    {456,{ok_disjoint,[{agreed_membership,[b]}]}},
    {458,{ok_disjoint,[{agreed_membership,[b]}]}},
    {463,{ok_disjoint,[{agreed_membership,[b]}]}},
    {468,{ok_disjoint,[{agreed_membership,[b]}]}},
    {479,{ok_disjoint,[{agreed_membership,[b]}]}},
    {482,{ok_disjoint,[{agreed_membership,[b]}]}},
    {486,{ok_disjoint,[{agreed_membership,[a]}]}},
    {488,{ok_disjoint,[{agreed_membership,[b]}]}},
    {490,{ok_disjoint,[{agreed_membership,[b,c]}]}},
    {492,{ok_disjoint,[{agreed_membership,[b,c]}]}}]

2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
4def1ad026 Move test code from machi_chain_manager1.erl -> machi_chain_manager1_test.erl 2015-03-02 20:20:19 +09:00
Scott Lystig Fritchie
a2f087181e Change C210's sleep time to be proportional/ranked to member's rank 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
c0ef199c6f Hey, I think this is finally working, hooray! 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
258bce84d1 WIP: remove lots of debugging cruft 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
4fac90f5a9 Fix end-of-repair logic by querying repair target for epoch sync 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
0a6b8268fb Fix major error in rank_projection(), silly me 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
fdca511385 Fix broken machi_partition_simulator.erl, derp 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
0b88a12c16 WIP: Debugging cruft, egadz, but improving (see below)
So, this still pops up occasionally:

    % rebar skip_deps=true -v eunit suites=machi_flu0_test,machi_chain_manager1
    [...]
    a private: [{epoch,223},{author,a},{upi,[a,b]},{repair,[]},{down,[c]},{d,[{author_proc,react},{nodes_up,[a,b]}]},{d2,[{up_nodz,[a,b]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
    b private: [{epoch,224},{author,b},{upi,[b,a]},{repair,[]},{down,[c]},{d,[{author_proc,react},{nodes_up,[a,b]}]},{d2,[{up_nodz,[a,b]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
    c private: [{epoch,191},{author,c},{upi,[c]},{repair,[]},{down,[a,b]},{d,[{author_proc,react},{nodes_up,[c]}]},{d2,[{up_nodz,[c]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]

The mis-ordering between [a,b] and [b,a] happens after the partition settled
on the islands of [a,b] and [c].

    { c100 , ? LINE , _AnyOtherReturnValue } {c100,734,
                                          {err,error,
                                           {badmatch,[a,b]},
                                           from,
                                           [{epoch,70},
                                            {author,a},
                                            {upi,[a]},
                                            {repair,[b]},
                                            {down,[c]},
                                            {d,
                                             [{author_proc,react},
                                              {nodes_up,[a,b]}]},
                                            {d2,[]}],
                                           to,
                                           [{epoch,194},
                                            {author,b},
                                            {upi,[b,a]},
                                            {repair,[]},
                                            {down,[c]},
                                            {d,
                                             [{author_proc,react},
                                              {nodes_up,[a,b]}]},
                                            {d2,[]}],
                                           relative_to,a,stack,[...]
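
One thing that may be relevant to the `{badmatch,[a,b]}` above: in Erlang, `[a,b]` and `[b,a]` are different terms, so matching one against the other fails even though both name the same members. The sketch below only illustrates that distinction. The module and function names are hypothetical, and it is not the code path from the trace.

```erlang
-module(upi_compare_sketch).
-export([same_chain/2, same_members/2]).

%% Order-sensitive comparison: [a,b] and [b,a] are different chains.
same_chain(UPI1, UPI2) ->
    UPI1 =:= UPI2.

%% Order-insensitive comparison: looks only at the set of members.
same_members(UPI1, UPI2) ->
    lists:usort(UPI1) =:= lists:usort(UPI2).
```

Chain order is significant (it determines head and tail), so which comparison is appropriate depends on what a given check is meant to enforce.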
2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
5d0eed865a Duh, fix really stupid think-o bug in perhaps_call(), oi oi oi 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
a94374cc8c Add machi_partition_simulator.erl + refactor to use it 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
4d3a9ed757 WIP: Per notes: change unanimous test @ A20. Improvement! 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
201108ec5b Temp WIP: Change network partitions to be bi-directional only 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
6ef46a3464 Temp WIP: I am going to sever the connection between the flowchart and the code. TODO
That diagram is really valuable, but it also takes a long time
to make any kind of edit; the process is too slow.  This is a TODO
item: a reminder that the flowchart is important documentation and
must be brought back into sync with the code soon.
2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
9404e954e7 WIP: chain mgr clutter, trying to debug infinite loop 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
fd1b4363b9 WIP: chain manager getting better, but occasionally gets infinite loop (II) 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
67f94d1cff WIP: chain manager getting better, but occasionally gets infinite loop 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
2f54525422 Fix chain mgmt flowchart A40 conditions (II) 2015-03-02 20:20:18 +09:00
Scott Lystig Fritchie
c1fd3df35d Fix chain mgmt flowchart A40 conditions 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
12d8a94497 Add reminder about chain manager init bootstrapping TODO 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
064b637d81 Remove docs/machi/flowchart-machi-chain-mgmt1.jpg 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
83e4937658 Chain manager projection store flowchart implemented & passes smoke test! 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
8faa1404c6 Remove unused prev_epoch_num and prev_epoch_csum 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
9af576d753 WIP: broken, don't use 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
cbc5260e93 WIP: Chain manager projection store flowchart goop draft 2 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
dbcc87b4a4 WIP: Chain manager projection store flowchart goop 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
ca5ddb2cf1 WIP: chain mgmt prototype scaffolding 9: before start of next simulator stage 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
0a77c09779 Fix non-TEST compilation problem 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
67b661494e WIP: chain mgmt prototype scaffolding 8: basic read repair done 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
932b6afb76 WIP: chain mgmt prototype scaffolding 7: inching better 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
616a11e230 WIP: chain mgmt prototype scaffolding 6: refactoring 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
e5b9230af0 WIP: chain mgmt prototype scaffolding 5: before refactor & continuing 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
b757878c81 WIP: chain mgmt prototype scaffolding 4: uncompileable at the moment 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
dfbbaf6bfe WIP: chain mgmt prototype scaffolding 3 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
63d23330b2 WIP: chain mgmt prototype scaffolding 2 2015-03-02 20:20:17 +09:00
Scott Lystig Fritchie
add6f421aa WIP: chain mgmt prototype scaffolding 2 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
9c04537497 WIP: chain mgmt prototype scaffolding 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
3e499e241a WIP: Fix flu0 name registration 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
500a13a01d WIP: Machi chain management PULSE prototype work 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
b41dbffe95 Cruft cleanup 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
5e49bd6c29 WIP: Machi chain management PULSE prototype work 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
e9ea20e941 Move to private proj store for eunit tests 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
ddce145bfb Add public/private split in projection store of machi_flu0.erl 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
fd7dad0714 Coverage is about as good as it's going to get 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
da2bad564f Getting closer to understanding why test coverage appears so poor, part 2 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
342a972543 Getting closer to understanding why test coverage appears so poor 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
b4f2d314c7 More single chain manager simulation tests 2015-03-02 20:20:16 +09:00