Commit graph

911 commits

Author SHA1 Message Date
Scott Lystig Fritchie
8ee3377fa7 Fix a state transition bug (chain manager infinite loop, oops)
%% We have a small problem for state transition sanity checking in the
    %% case where we are flapping *and* a repair has finished.  One of the
    %% sanity checks in simple_chain_state_transition_is_sane(() is that
    %% the author of P2 in this case must be the tail of P1's UPI: i.e.,
    %% it's the tail's responsibility to perform repair, therefore the tail
    %% must damn well be the author of any transition that says a repair
    %% finished successfully.
    %%
    %% The problem is that author_server of the inner projection does not
    %% reflect the actual author!  See the comment with the text
    %% "The inner projection will have a fake author" in
    %react_to_env_A30().
    %%
    %% So, there's a special return value that tells us to try to check for
    %% the correct authorship here.
2015-07-05 14:52:50 +09:00
Scott Lystig Fritchie
920c0fc610 WIP: much better structure for inner projection sanity checking 2015-07-04 16:46:02 +09:00
Scott Lystig Fritchie
8241d1f600 WIP: cruft, needs refactoring 2015-07-04 14:57:38 +09:00
Scott Lystig Fritchie
65ee0c23ec Adjust author of inner projections to yield same checksum 2015-07-04 01:58:00 +09:00
Scott Lystig Fritchie
cd026303a0 Unused var cleanup 2015-07-04 00:35:05 +09:00
Scott Lystig Fritchie
9b0a5a1dc3 WIP: 1st part of moving old chain state transtion code to new
Ha, famous last words, amirite?

    %% The chain sequence/order checks at the bottom of this function aren't
    %% as easy-to-read as they ought to be.  However, I'm moderately confident
    %% that it isn't buggy.  TODO: refactor them for clarity.

So, now machi_chain_manager1:projection_transition_is_sane() is using
newer, far less buggy code to make sanity decisions.

TODO: Add support for Retrospective mode. TODO is it really needed?

Examples of how the old code sucks and the new code sucks less.

    138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
    xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests.
    [a,b,c]
    {c,[a,b,c],[c,b],b,[b,a],[b,a,c]}
    Old_res ([335,192,166,160,153,139]): true
    New_res: false (why line [1936])
    Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times)
    [a,b,c]
 %% {Author1,UPI1,   Repair1,Author2,UPI2, Repair2} %%
    {c,      [a,b,c],[],     a,      [b,a],[]}
    Old_res ([338,185,160,153,147]): true
    New_res: false (why line [1936])
    false

Old code is wrong: we've swapped order of a & b, which is bad.

    139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
    xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests.
    [b,c,a]
    {c,[c,a],[c],a,[a,b],[b,a]}
    Old_res ([335,192,185,160,153,123]): true
    New_res: false (why line [1936])
    Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times)
    [b,a,c]
 %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
    {a,      [c], [],     c,      [c,b],[]}
    Old_res ([338,185,160,153,147]): true
    New_res: false (why line [1936])
    false

Old code is wrong: b wasn't repairing in the previous state.

    150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
    xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests.
    [c,a,b]
    {b,[c],[b,a,c],c,[c,a,b],[b]}
    Old_res ([335,214,185,160,153,147]): true
    New_res: false (why line [1936])
    Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times)
    [c,b,a]
 %% {Author1,UPI1,Repair1,Author2,UPI2,   Repair2} %%
    {c,      [c], [a,b],  c,      [c,b,a],[]}
    Old_res ([335,328,185,160,153,111]): true
    New_res: false (why line [1981,1679])
    false

Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.
2015-07-04 00:32:28 +09:00
Scott Lystig Fritchie
42fb6dd002 WIP: it's clear that the legacy state transition check is broken, II 2015-07-03 23:37:36 +09:00
Scott Lystig Fritchie
caeb322725 WIP: it's clear that the legacy state transition check is broken 2015-07-03 23:17:34 +09:00
Scott Lystig Fritchie
83015c319d WIP: yeah, now we're going places 2015-07-03 22:05:35 +09:00
Scott Lystig Fritchie
6a706cbfeb WIP: Refactoring and prototyping goop, broken test 2015-07-03 19:21:41 +09:00
Scott Lystig Fritchie
4a09bfa2d1 Merge branch 'slf/flu-cleanup1' 2015-07-03 16:19:10 +09:00
Scott Lystig Fritchie
9b3cd9056a Un-TEST'ify testr_react_to_env() everywhere 2015-07-03 16:18:40 +09:00
Scott Lystig Fritchie
78c81f93b7 Make machi_chain_manager1_pulse max commands length longer 2015-07-03 16:06:33 +09:00
Scott Lystig Fritchie
2b64028bbd Add kick_projection_reaction, implement yo:tell_author_yo() 2015-07-03 04:30:05 +09:00
Scott Lystig Fritchie
c6870a1c86 If FLU is wedged by a newer client epoch ID, kick the chain manager to react 2015-07-03 02:17:01 +09:00
Scott Lystig Fritchie
ff66638eb3 Sequencer changes file sequence number when epoch_id change is detected 2015-07-03 02:04:04 +09:00
Scott Lystig Fritchie
9cf77f4406 WIP: Refactoring and prototyping goop, broken test 2015-07-03 00:59:04 +09:00
Scott Lystig Fritchie
8820a71152 Clean up comment cruft & line wrap yak shaving 2015-07-02 14:44:47 +09:00
Scott Lystig Fritchie
039fd5fb78 Merge branch 'slf/pb-api-experiment3' 2015-07-01 18:33:33 +09:00
Scott Lystig Fritchie
da3a56dd74 Fix epoch checking in eunit tests and enforcement by FLU (always permit list_files()) 2015-07-01 18:12:22 +09:00
Scott Lystig Fritchie
38c1a2ab5d Fix Epoch handling in machi_flu_psup_test.erl 2015-07-01 17:46:35 +09:00
Scott Lystig Fritchie
576d3d76a2 Extend machi_chain_manager1_pulse fudge time factor 2015-07-01 17:46:10 +09:00
Scott Lystig Fritchie
2c869ed598 TODO fix: wedge self 2015-07-01 17:19:11 +09:00
Scott Lystig Fritchie
1e14fe878f Ha, oops! Add bad_epoch code, derp 1 2015-07-01 15:51:25 +09:00
Scott Lystig Fritchie
a658a64482 Cosmetic formatting change 2015-07-01 15:37:53 +09:00
Scott Lystig Fritchie
a0061d6ffa make decode_csum_file_entry() very slightly less brittle 2015-07-01 15:18:57 +09:00
Scott Lystig Fritchie
d710d90ea7 Fix usage of checksum_list by machi_chain_repair.erl 2015-07-01 15:04:22 +09:00
Scott Lystig Fritchie
0321e05b46 Fix usage of checksum_list by machi_basho_bench_driver.erl 2015-07-01 15:03:56 +09:00
Scott Lystig Fritchie
f5ae417b9e Clarify verify_file_checksums_test_ 2015-07-01 14:16:31 +09:00
Scott Lystig Fritchie
670bd2cafc Add some flexibility to machi_chain_manager1_converge_demo:t/1 and t/2 2015-07-01 14:08:17 +09:00
Scott Lystig Fritchie
e3b80c6ac2 Docuemntation updates 2015-06-30 19:04:23 +09:00
Scott Lystig Fritchie
00c8cf0ef7 Rename temporary HTTP server hack functions 2015-06-30 16:19:44 +09:00
Scott Lystig Fritchie
7542fe8225 WIP: all eunit tests are passing again, yay 2015-06-30 16:12:23 +09:00
Scott Lystig Fritchie
e9d50a2128 WIP: Reinstate one eunit test, fix type bugs 2015-06-30 15:51:03 +09:00
Scott Lystig Fritchie
3d2b49b7e5 WIP: refactoring & edoc'ing 2015-06-30 15:20:35 +09:00
Scott Lystig Fritchie
310fdb1f6a Add crude file size check to do_server_checksum_listing() 2015-06-30 14:13:26 +09:00
Scott Lystig Fritchie
2d070bf1e3 Minor refactoring + add demo/exploratory time measurement code
%% Demo/exploratory hackery to check relative speeds of dealing with
%% checksum data in different ways.
%%
%% Summary:
%%
%% * Use compact binary encoding, with 1 byte header for entry length.
%%     * Because the hex-style code is *far* slower just for enc & dec ops.
%%     * For 1M entries of enc+dec: 0.215 sec vs. 15.5 sec.
%% * File sorter when sorting binaries as-is is only 30-40% slower
%%   than an in-memory split (of huge binary emulated by file:read_file()
%%   "big slurp") and sort of the same as-is sortable binaries.
%% * File sorter slows by a factor of about 2.5 if {order, fun compare/2}
%%   function must be used, i.e. because the checksum entry lengths differ.
%% * File sorter + {order, fun compare/2} is still *far* faster than external
%%   sort by OS X's sort(1) of sortable ASCII hex-style:
%%   4.5 sec vs. 21 sec.
%% * File sorter {order, fun compare/2} is faster than in-memory sort
%%   of order-friendly 3-tuple-style: 4.5 sec vs. 15 sec.
2015-06-30 14:08:46 +09:00
Scott Lystig Fritchie
2a4ae1ba52 Merge branch 'slf/pb-api-experiment2' 2015-06-29 17:31:52 +09:00
Scott Lystig Fritchie
34b046acbd Remove machi_pb_wrap.erl 2015-06-29 17:31:07 +09:00
Scott Lystig Fritchie
55db22efff Merge branch 'slf/pb-api-experiment2' 2015-06-29 17:20:35 +09:00
Scott Lystig Fritchie
dba7041929 Change names to indicate we're no longer in PB land 2015-06-29 17:20:17 +09:00
Scott Lystig Fritchie
151e696324 WIP: yank out more unused cruft 2015-06-29 17:14:33 +09:00
Scott Lystig Fritchie
87ec988353 WIP: yank out more unused cruft 2015-06-29 17:06:28 +09:00
Scott Lystig Fritchie
6cd3b8d0ec WIP: yank out lots of unused cruft 2015-06-29 17:02:58 +09:00
Scott Lystig Fritchie
d54c74f58a WIP: yank out io:format 2015-06-29 16:53:41 +09:00
Scott Lystig Fritchie
3089288338 WIP: giant hairball 13: all unit tests are passing again, yay! 2015-06-29 16:48:06 +09:00
Scott Lystig Fritchie
7aff9fca70 WIP: giant hairball 12 2015-06-29 16:42:05 +09:00
Scott Lystig Fritchie
b25ab3b7ac WIP: giant hairball 11 2015-06-29 16:24:57 +09:00
Scott Lystig Fritchie
64817dd7e8 WIP: giant hairball 01 2015-06-29 16:10:43 +09:00
Scott Lystig Fritchie
f45dc7829e WIP: hairball, but: Failed: 6. Skipped: 0. Passed: 13 2015-06-27 00:43:27 +09:00