So, the PULSE test is failing, which is good. However, I believe
that the failures are all due to the model now being *too strict*.
The model is now flagging cases that are actually benign, I think.
{bummer_NOT_DISJOINT,{[a,b,b,c,d],
[{a,not_in_this_epoch},
{b,not_in_this_epoch},
{c,"[{epoch,1546},{author,c},{upi,[c]},{repair,[b]},{down,[a,d]},{d,[{ps,[{a,c},{c,a},{a,d},{b,d},{c,d}]},{nodes_up,[b,c]}]},{d2,[]}]"},
{d,"[{epoch,1546},{author,d},{upi,[d]},{repair,[a,b]},{down,[c]},{d,[{ps,[{c,b},{d,c}]},{nodes_up,[a,b,d]}]},{d2,[]}]"}]}}},
In this and all other examples, the UPIs are disjoint; the overlap
always involves a member of some repair list. I believe the model
ought to be ignoring the repair list.
{bummer_NOT_DISJOINT,{[a,a,b],
[{a,"[{epoch,1174},{author,a},{upi,[a]},{repair,[]},{down,[b]},{d,[{ps,[{a,b},{b,a}]},{nodes_up,[a]}]},{d2,[]}]"},
{b,"[{epoch,1174},{author,b},{upi,[b]},{repair,[a]},{down,[]},{d,[{ps,[]},{nodes_up,[a,b]}]},{d2,[]}]"}]}}},
or
{bummer_NOT_DISJOINT,{[c,c,e],
[{a,not_in_this_epoch},
{b,not_in_this_epoch},
{c,"[{epoch,1388},{author,c},{upi,[c]},{repair,[]},{down,[a,b,d,e]},{d,[{ps,[{a,b},{a,c},{c,a},{a,d},{d,a},{e,a},{c,b},{b,e},{e,b},{c,d},{e,c},{e,d}]},{nodes_up,[c]}]},{d2,[]}]"},
{d,not_in_this_epoch},
{e,"[{epoch,1388},{author,e},{upi,[e]},{repair,[c]},{down,[a,b,d]},{d,[{ps,[{a,b},{b,a},{a,c},{c,a},{a,d},{d,a},{a,e},{e,a},{b,c},{c,b},{b,d},{b,e},{e,b},{c,d},{d,c},{d,e},{e,d}]},{nodes_up,[c,e]}]},{d2,[]}]"}]}}},
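The proposal above (ignore the repair list when checking disjointness) can be sketched as follows. This is only an illustration in Python, not the model's actual code: the function name `duplicated_members` and the dict layout are hypothetical, and the data is copied from the epoch 1546 and epoch 1174 reports above.

```python
from collections import Counter

def duplicated_members(projections, include_repair):
    """Return the sorted members that appear in more than one projection.

    `projections` maps an author name to a dict with "upi" and "repair"
    lists (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for proj in projections.values():
        members = set(proj["upi"])
        if include_repair:
            members |= set(proj["repair"])
        counts.update(members)
    return sorted(m for m, n in counts.items() if n > 1)

# Epoch 1546 from the first report: authors c and d.
epoch_1546 = {
    "c": {"upi": ["c"], "repair": ["b"]},
    "d": {"upi": ["d"], "repair": ["a", "b"]},
}

# Epoch 1174 from the second report: authors a and b.
epoch_1174 = {
    "a": {"upi": ["a"], "repair": []},
    "b": {"upi": ["b"], "repair": ["a"]},
}

# Counting repair members reproduces the NOT_DISJOINT complaint;
# counting UPI members only does not.
for name, epoch in (("1546", epoch_1546), ("1174", epoch_1174)):
    print(name,
          "with repair:", duplicated_members(epoch, include_repair=True),
          "UPI only:", duplicated_members(epoch, include_repair=False))
```

With the repair lists included, the duplicated member is b in epoch 1546 and a in epoch 1174; with UPIs only, nothing is duplicated in either epoch.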
Ha, famous last words, amirite?
%% The chain sequence/order checks at the bottom of this function aren't
%% as easy-to-read as they ought to be. However, I'm moderately confident
%% that it isn't buggy. TODO: refactor them for clarity.
So, now machi_chain_manager1:projection_transition_is_sane() is using
newer, far less buggy code to make sanity decisions.
TODO: Add support for Retrospective mode. TODO: Is it really needed?
Examples of how the old code sucks and the new code sucks less.
138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests.
[a,b,c]
{c,[a,b,c],[c,b],b,[b,a],[b,a,c]}
Old_res ([335,192,166,160,153,139]): true
New_res: false (why line [1936])
Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times)
[a,b,c]
%% {Author1,UPI1, Repair1,Author2,UPI2, Repair2} %%
{c, [a,b,c],[], a, [b,a],[]}
Old_res ([338,185,160,153,147]): true
New_res: false (why line [1936])
false
Old code is wrong: we've swapped order of a & b, which is bad.
139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests.
[b,c,a]
{c,[c,a],[c],a,[a,b],[b,a]}
Old_res ([335,192,185,160,153,123]): true
New_res: false (why line [1936])
Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times)
[b,a,c]
%% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
{a, [c], [], c, [c,b],[]}
Old_res ([338,185,160,153,147]): true
New_res: false (why line [1936])
false
Old code is wrong: b wasn't repairing in the previous state.
150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests.
[c,a,b]
{b,[c],[b,a,c],c,[c,a,b],[b]}
Old_res ([335,214,185,160,153,147]): true
New_res: false (why line [1936])
Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times)
[c,b,a]
%% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
{c, [c], [a,b], c, [c,b,a],[]}
Old_res ([335,328,185,160,153,111]): true
New_res: false (why line [1981,1679])
false
Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.
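The three shrunken counterexamples above each break a different ordering rule. A minimal sketch of those rules, in Python rather than Erlang, is below. It is not the code in projection_transition_is_sane(); the function names are hypothetical, and the rule that newly added members are appended after the retained ones is an assumption of this sketch.

```python
def is_subsequence(needle, haystack):
    """True if the items of `needle` appear in `haystack` in the same order."""
    it = iter(haystack)
    return all(x in it for x in needle)

def upi_transition_ok(upi1, repair1, upi2):
    """Check the ordering rules that the three counterexamples violate."""
    kept = [m for m in upi2 if m in upi1]
    added = [m for m in upi2 if m not in upi1]
    # Rule 1: members kept from UPI1 must not be reordered.
    if not is_subsequence(kept, upi1):
        return False
    # Rule 2: every new UPI2 member must have been repairing,
    # and new members must keep Repair1's order.
    if not is_subsequence(added, repair1):
        return False
    # Rule 3 (assumption): new members are appended after the kept ones.
    return upi2 == kept + added

# The three shrunken cases: (UPI1, Repair1, UPI2).
cases = [
    (["a", "b", "c"], [], ["b", "a"]),          # a & b swapped
    (["c"], [], ["c", "b"]),                    # b was not repairing
    (["c"], ["a", "b"], ["c", "b", "a"]),       # a & b added in wrong order
]
for upi1, repair1, upi2 in cases:
    print(upi1, repair1, upi2, upi_transition_ok(upi1, repair1, upi2))
```

All three cases are rejected, matching New_res above, while a transition such as UPI1=[c], Repair1=[a,b], UPI2=[c,a,b] is accepted.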
So, the PB style encoding of the Mpb_LL_WriteProjectionReq message
is about 35-36 times slower than using Erlang's term_to_binary()
and binary_to_term(). {sigh}
So, there's some cheating going on, because some parts of the
#projection_v1{} and #p_srvr{} records aren't fully specified.
Those parts are marked as "opaque" in their field names, e.g.
    optional bytes opaque_flap = 10;
    optional bytes opaque_inner = 11;
    required bytes opaque_dbg = 12;
    required bytes opaque_dbg2 = 13;
The serialization that's being used for those fields is Erlang term
sexprs. That isn't portable. So if/when we really need to deal with a
non-Erlang language, we'll have to straighten this out further.
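To illustrate the portability problem with opaque byte fields, here is a small Python analogy. It is an assumption-laden sketch, not Machi code: `pickle` stands in for a language-specific encoding of the opaque blob, JSON stands in for a portable one, and the function names and the sample `dbg` value are hypothetical.

```python
import json
import pickle

def encode_opaque_native(value):
    # Language-specific encoding: compact and convenient, but only
    # another Python process can decode it (the analogue of an
    # Erlang-only blob inside a "bytes" field).
    return pickle.dumps(value)

def encode_opaque_portable(value):
    # Portable encoding: any language with a JSON parser can read it.
    return json.dumps(value, sort_keys=True).encode("utf-8")

# Hypothetical debugging payload, shaped like the "d" field above.
dbg = {"ps": [["a", "b"], ["b", "a"]], "nodes_up": ["a"]}

native = encode_opaque_native(dbg)
portable = encode_opaque_portable(dbg)

# Both round-trip inside Python; only `portable` is readable elsewhere.
print(pickle.loads(native) == dbg)
print(json.loads(portable.decode("utf-8")) == dbg)
```

Either blob fits in a protobuf `bytes` field unchanged, which is why the cheat works today; the cost only shows up when a non-Erlang client has to look inside the field.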
Introduce machi_flu_psup:start_flu_package/4 as a way to start all
related FLU processes:
* The projection store
* The chain manager
* The FLU itself
... as well as linked processes.
http://www.snookles.com/scotttmp/flu-tree-20150430.png shows one FLU
running, "a". The process registered as "a" is the append server,
"some-prefix" is the sequencer & writer for the current <<"some-prefix">>
file, and there is one process for each of the 3 active TCP connections
to that FLU.