There are a couple of weird things in the snippet below (AP mode):
22:32:58.209 b uses inner: [{epoch,136},{author,c},{mode,ap_mode},{witnesses,[]},{upi,[b,c]},{repair,[]},{down,[a]},{flap,undefined},{d,[d_foo1,{ps,[{a,b}]},{nodes_up,[b,c]}]},{d2,[]}] (outer flap epoch 136: {flap_i,{{{epk,115},{1439,904777,11627}},28},[a,{a,problem_with,b},{b,problem_with,a}],[{a,{{{epk,126},{1439,904777,149865}},16}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},15}}]}) (my flap {{epk,115},{1439,904777,11627}} 29 [{a,{{{epk,126},{1439,904777,149865}},28}},{b,{{{epk,115},{1439,904777,11627}},29}},{c,{{{epk,121},{1439,904777,134392}},26}}])
22:32:58.224 c uses inner: [{epoch,136},{author,c},{mode,ap_mode},{witnesses,[]},{upi,[b,c]},{repair,[]},{down,[a]},{flap,undefined},{d,[d_foo1,{ps,[{a,b}]},{nodes_up,[b,c]}]},{d2,[]}] (outer flap epoch 136: {flap_i,{{{epk,115},{1439,904777,11627}},28},[a,{a,problem_with,b},{b,problem_with,a}],[{a,{{{epk,126},{1439,904777,149865}},16}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},15}}]}) (my flap {{epk,121},{1439,904777,134392}} 28 [{a,{{{epk,126},{1439,904777,149865}},28}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},28}}])
CONFIRM by epoch inner 136 <<103,64,252,...>> at [b,c] []
Priv1 [{a,{{132,<<"Cï|ÿzKX:Á"...>>},[a],[c],[b],[],false}},
{b,{{127,<<185,139,3,2,96,189,...>>},[b,c],[],[a],[],false}},
{c,{{133,<<145,71,223,6,177,...>>},[b,c],[a],[],[],false}}] agree false
Pubs: [{a,136},{b,136},{c,136}]
DoIt,
1. Both the "uses inner" messages and the "CONFIRM by epoch inner 136"
line show that b & c are using the same inner projection.
However, the 'Priv1' output shows b & c on different epochs, 127 & 133.
Weird.
2. I've added an infinite loop, probably in this commit. :-(
If we use verbose output from:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
And use:
tail -f typescript_file | egrep --line-buffered 'SET|attempted|CONFIRM'
... then we can clearly see a chain safety violation when moving from
epoch 81 -> 83: the UPI goes from [a,b] to [a,c], so the only member
common to both is the witness a. The safety checking needs more smarts,
both in the individual transition sanity check and in the converge_demo
overall rolling sanity check.
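A minimal sketch of the kind of per-transition check this could become,
under an assumed rule: in CP mode, at least one non-witness member of the
old UPI must carry over into the new UPI. The module name, function name,
and the rule itself are hypothetical, not taken from the Machi source.

```erlang
-module(upi_transition_check).
-export([safe_transition/3]).

%% safe_transition(OldUPI, NewUPI, Witnesses) -> boolean()
%%
%% Assumed rule (hypothetical, not the actual Machi check): a transition
%% is safe only if some full (non-witness) server from the old UPI is
%% still in the new UPI, since a witness holds no file data.
%% Assumption: an empty new UPI serves nothing, so it is treated as safe.
safe_transition(_OldUPI, [], _Witnesses) ->
    true;
safe_transition(OldUPI, NewUPI, Witnesses) ->
    %% Full servers in the old UPI: old members minus the witnesses.
    OldFull = OldUPI -- Witnesses,
    %% Assumption: if the old UPI had no full servers, there is
    %% nothing to preserve, so the transition is not flagged here.
    OldFull =:= [] orelse
        lists:any(fun(M) -> lists:member(M, NewUPI) end, OldFull).
```

With the UPIs from the CONFIRM lines and witnesses [a],
safe_transition([a,b], [a,c], [a]) returns false (epoch 81 -> 83), while
safe_transition([b,c], [a,b], [a]) returns true (epoch 47 -> 81).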
Key to output: CONFIRM by epoch {num} {csum} at {UPI} {Repairing}
SET # of FLUs = 3 members [a,b,c]).
CONFIRM by epoch 1 <<96,161,96,...>> at [a,b] [c]
CONFIRM by epoch 5 <<134,243,175,...>> at [b,c] []
CONFIRM by epoch 7 <<207,93,225,...>> at [b,c] []
CONFIRM by epoch 47 <<60,142,248,...>> at [b,c] []
SET partitions = [{c,b},{c,a}] (1 of 2) at {22,3,34}
CONFIRM by epoch 81 <<223,58,184,...>> at [a,b] []
SET partitions = [{b,c},{b,a}] (2 of 2) at {22,3,38}
CONFIRM by epoch 83 <<33,208,224,...>> at [a,c] []
SET partitions = []
CONFIRM by epoch 85 <<173,179,149,...>> at [a,c] [b]
So, the problem is that the chain manager never finishes repair
because UPI=[a], and a is a witness: a can't do the file listing and
the other repair work that a repairer FLU needs to do.
The best (?) way forward is to add some smarts to the chain manager
so that it never proposes a UPI that is 100% witnesses?
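
A rough sketch of such a guard, assuming the chain manager has the
proposed UPI and the witness list at hand as plain lists of FLU names.
The module and function names are hypothetical, and where the check
would be called from is left open.

```erlang
-module(upi_witness_check).
-export([all_witnesses/2]).

%% all_witnesses(UPI, Witnesses) -> boolean()
%%
%% true when the proposed UPI is non-empty and every member of it is
%% a witness, i.e. no full server is available to drive repair.
%% An empty UPI is reported as false: it contains no witnesses at all.
all_witnesses([], _Witnesses) ->
    false;
all_witnesses(UPI, Witnesses) ->
    lists:all(fun(M) -> lists:member(M, Witnesses) end, UPI).
```

For example, all_witnesses([a], [a]) returns true, so a proposal with
UPI=[a] would be rejected, while all_witnesses([a,c], [a]) returns false.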