I am treating the original write-once branch as a prototype
which I am now throwing away. I had too much work interleved
in there, so I felt like the best thing to do would be to cut
a new clean branch and pull the files over and start over
against a recent-ish master.
We will have to refactor the other things in FLU in a more
piecemeal fashion.
Last night we hit a rare case of failed convergence.
f was out of sync with the rest of the world.
f: upi=[b,g,f] repairing=[a,c]
The "rest of the world" used a larger chain at:
*: upi=[c,b,g,a], repairing=[f]
And f refused to join the larger chain because of the way that
IsRelevantToMe_p was being calculated before this commit.
Hrrrm, though, I'm not convinced that this particular problem
is fixed 100% by this patch. What if the chain lengths were
the same but also UPI incompatible? e.g. if I remove 'a' from
the "real world (in the partition simulator)" example above:
f: upi=[b,g,f] repairing=[c]
*: upi=[c,b,g], repairing=[f]
Hrmmmmm, I may need to reintroduce the my-recent-adopted-projection-
flapping-like-counter thingie to try to break this kind of
incompatible deadlock.
See comments added in this commit at A40.
So far, I've been doing CP mode testing with a handful of (very useful)
network partition combinations using:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
Next steps:
* Expand number & types of partitions
* Expand to chain lengths of 5 and beyond
So, I'm 50% sure this is a good idea for CP mode: if there's
a later public projection than P_current, then who knows what
we might have missed. So, call make_zerf() to find out the
absolute latest. Problem: flapping state appears to be lost,
booo.
If we use verbose output from:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
And use:
tail -f typescript_file | egrep --line-buffered 'SET|attempted|CONFIRM'
... then we can clearly see a chain safety violation when moving from
epoch 81 -> 83. I need to add more smarts to the safety checking,
both at the individual transition sanity check and at the converge_demo
overall rolling sanity check.
Key to output: CONFIRM by epoch {num} {csum} at {UPI} {Repairing}
SET # of FLUs = 3 members [a,b,c]).
CONFIRM by epoch 1 <<96,161,96,...>> at [a,b] [c]
CONFIRM by epoch 5 <<134,243,175,...>> at [b,c] []
CONFIRM by epoch 7 <<207,93,225,...>> at [b,c] []
CONFIRM by epoch 47 <<60,142,248,...>> at [b,c] []
SET partitions = [{c,b},{c,a}] (1 of 2) at {22,3,34}
CONFIRM by epoch 81 <<223,58,184,...>> at [a,b] []
SET partitions = [{b,c},{b,a}] (2 of 2) at {22,3,38}
CONFIRM by epoch 83 <<33,208,224,...>> at [a,c] []
SET partitions = []
CONFIRM by epoch 85 <<173,179,149,...>> at [a,c] [b]
So, the problem is that the chain manager isn't finishing repair
because UPI=[a], and a is a witness, and a can't do the list files etc etc
repair stuff that repairer FLUs need to do.
The best (?) way forward is to add some advance smarts to the
chain manager so that it doesn't propose a UPI of 100% witnesses?
PULSE managed to create a situation where machi_proxy_flu_client1
would appear to fail a remote attempt to write_projection. The
client would retry, but the 1st attempt really did get through to
the server. So, if we hit this case, we try to read the projection,
and if it's exactly equal to what we tried to write, we consider the
op a success.
Ditto for write_chunk.
Fix up eunit test to accomodate the change of semantics.
Due to changes by slf/chain-manager/cp-mode branch, there are
no longer extraneous epoch changes by "larger" authors that
re-suggest the same UPI+Repairing just because their author rank
is very slightly higher than the current epoch. Thus the
partial_stop_restart2() test only needs to deal with one epoch
change instead of the original two.
Run via:
env PULSE_NOSHRINK=yes PULSE_SKIP_NEW=yes PULSE_TIME=900 make pulse
So, this one hangs here:
tick-<0.991.0>,dump_state(){prop,machi_chain_manager1_pulse,358,<0.891.0>}
At machi_chain_manager1_pulse.erl line 358, that's after the return
of run_commands(). The next verbose message should come from line
362, after the return of pulse:run(), but that message never appears.
My laptop CPU is really busy (fans running, case is hot), but both
console & disterl aren't available now, so no idea why, alas.
Ah, when I run with a console available and then run Redbug, there is
zero activity calling both machi_chain_manager1_pulse:'_' and
machi_chain_manager1:'_'
This may be related to a bad/ugly shutdown? In both hang cases,
I see at least one SASL error message such as the one below ...
BUT! There should be erlang:display() messages from the shutdown_hard()
function, which does some exit(Pid, kill) calls, but there is no output
from them! So, the killing is coming from some kind of PULSE-initiated
process shutdown/cleanup/??
=SUPERVISOR REPORT==== 16-Jul-2015::20:24:31 ===
Supervisor: {local,machi_sup}
Context: shutdown_error
Reason: killed
Offender: [{pid,<0.200.0>},
{name,machi_flu_sup},
{mfargs,{machi_flu_sup,start_link,[]}},
{restart_type,permanent},
{shutdown,5000},
{child_type,supervisor}]
Also, add more misc details to the 'react' breadcrumb trail. Also,
save get(react) results into dbg2 whenever we write a private projection,
very valuable for debugging.
Also: cleanup PULSE code, add regression commands as option and
controls with some new environment variables. These regression
sequences were responsbile for several fruitful debugging sessions,
so we keep them for posterity and for their ability (with new seeds
and PULSE) to find new interleavings.
So, the PULSE test is failing, which is good. However, I believe
that the failures are all due to the model now being *too strict*.
The model is now catching failures which are now benign, I think.
{bummer_NOT_DISJOINT,{[a,b,b,c,d],
[{a,not_in_this_epoch},
{b,not_in_this_epoch},
{c,"[{epoch,1546},{author,c},{upi,[c]},{repair,[b]},{down,[a,d]},{d,[{ps,[{a,c},{c,a},{a,d},{b,d},{c,d}]},{nodes_up,[b,c]}]},{d2,[]}]"},
{d,"[{epoch,1546},{author,d},{upi,[d]},{repair,[a,b]},{down,[c]},{d,[{ps,[{c,b},{d,c}]},{nodes_up,[a,b,d]}]},{d2,[]}]"}]}}},
In this and all other examples, the UPIs are disjoint but the
repairs are not disjoint. I believe the model ought to be
ignoring the repair list.
{bummer_NOT_DISJOINT,{[a,a,b],
[{a,"[{epoch,1174},{author,a},{upi,[a]},{repair,[]},{down,[b]},{d,[{ps,[{a,b},{b,a}]},{nodes_up,[a]}]},{d2,[]}]"},
{b,"[{epoch,1174},{author,b},{upi,[b]},{repair,[a]},{down,[]},{d,[{ps,[]},{nodes_up,[a,b]}]},{d2,[]}]"}]}}},
or
{bummer_NOT_DISJOINT,{[c,c,e],
[{a,not_in_this_epoch},
{b,not_in_this_epoch},
{c,"[{epoch,1388},{author,c},{upi,[c]},{repair,[]},{down,[a,b,d,e]},{d,[{ps,[{a,b},{a,c},{c,a},{a,d},{d,a},{e,a},{c,b},{b,e},{e,b},{c,d},{e,c},{e,d}]},{nodes_up,[c]}]},{d2,[]}]"},
{d,not_in_this_epoch},
{e,"[{epoch,1388},{author,e},{upi,[e]},{repair,[c]},{down,[a,b,d]},{d,[{ps,[{a,b},{b,a},{a,c},{c,a},{a,d},{d,a},{a,e},{e,a},{b,c},{c,b},{b,d},{b,e},{e,b},{c,d},{d,c},{d,e},{e,d}]},{nodes_up,[c,e]}]},{d2,[]}]"}]}}},
Ha, famous last words, amirite?
%% The chain sequence/order checks at the bottom of this function aren't
%% as easy-to-read as they ought to be. However, I'm moderately confident
%% that it isn't buggy. TODO: refactor them for clarity.
So, now machi_chain_manager1:projection_transition_is_sane() is using
newer, far less buggy code to make sanity decisions.
TODO: Add support for Retrospective mode. TODO is it really needed?
Examples of how the old code sucks and the new code sucks less.
138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests.
[a,b,c]
{c,[a,b,c],[c,b],b,[b,a],[b,a,c]}
Old_res ([335,192,166,160,153,139]): true
New_res: false (why line [1936])
Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times)
[a,b,c]
%% {Author1,UPI1, Repair1,Author2,UPI2, Repair2} %%
{c, [a,b,c],[], a, [b,a],[]}
Old_res ([338,185,160,153,147]): true
New_res: false (why line [1936])
false
Old code is wrong: we've swapped order of a & b, which is bad.
139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests.
[b,c,a]
{c,[c,a],[c],a,[a,b],[b,a]}
Old_res ([335,192,185,160,153,123]): true
New_res: false (why line [1936])
Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times)
[b,a,c]
%% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
{a, [c], [], c, [c,b],[]}
Old_res ([338,185,160,153,147]): true
New_res: false (why line [1936])
false
Old code is wrong: b wasn't repairing in the previous state.
150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests.
[c,a,b]
{b,[c],[b,a,c],c,[c,a,b],[b]}
Old_res ([335,214,185,160,153,147]): true
New_res: false (why line [1936])
Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times)
[c,b,a]
%% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
{c, [c], [a,b], c, [c,b,a],[]}
Old_res ([335,328,185,160,153,111]): true
New_res: false (why line [1981,1679])
false
Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.
So, the PB style encoding of the Mpb_LL_WriteProjectionReq message
is about 35-36 times slower than using Erlang's term_to_binary()
and binary_to_term(). {sigh}
So, there's some cheating going on, because some of the parts of
the #projection_v1{} and #p_srvr{} records aren't fully specified.
Those parts are being specified as "opaque" in the field names, e.g.
optional bytes opaque_flap = 10;
optional bytes opaque_inner = 11;
required bytes opaque_dbg = 12;
required bytes opaque_dbg2 = 13;
The serialization that's being used is erlang term sexprs. That isn't
portable. So if/when we really need to deal with a non-Erlang
language, we'll have to straighten this out further.