machi

greg/machi

Author	SHA1	Message	Date
Scott Lystig Fritchie	2060b80830	Keep good refactorings from commit a8390ee2. Also, add more misc details to the 'react' breadcrumb trail. Also, save get(react) results into dbg2 whenever we write a private projection, very valuable for debugging. Also: cleanup PULSE code, add regression commands as an option and controls with some new environment variables. These regression sequences were responsible for several fruitful debugging sessions, so we keep them for posterity and for their ability (with new seeds and PULSE) to find new interleavings.	2015-07-10 15:04:50 +09:00
Scott Lystig Fritchie	297d29c79b	Finish fixups to the chmgr state transition checking	2015-07-07 23:03:14 +09:00
Scott Lystig Fritchie	3aa3e00806	WIP: major fixups to the chmgr state transition checking (more below) So, the PULSE test is failing, which is good. However, I believe that the failures are all due to the model now being too strict. The model is now catching failures which are now benign, I think. {bummer_NOT_DISJOINT,{[a,b,b,c,d], [{a,not_in_this_epoch}, {b,not_in_this_epoch}, {c,"[{epoch,1546},{author,c},{upi,[c]},{repair,[b]},{down,[a,d]},{d,[{ps,[{a,c},{c,a},{a,d},{b,d},{c,d}]},{nodes_up,[b,c]}]},{d2,[]}]"}, {d,"[{epoch,1546},{author,d},{upi,[d]},{repair,[a,b]},{down,[c]},{d,[{ps,[{c,b},{d,c}]},{nodes_up,[a,b,d]}]},{d2,[]}]"}]}}}, In this and all other examples, the UPIs are disjoint but the repairs are not disjoint. I believe the model ought to be ignoring the repair list. {bummer_NOT_DISJOINT,{[a,a,b], [{a,"[{epoch,1174},{author,a},{upi,[a]},{repair,[]},{down,[b]},{d,[{ps,[{a,b},{b,a}]},{nodes_up,[a]}]},{d2,[]}]"}, {b,"[{epoch,1174},{author,b},{upi,[b]},{repair,[a]},{down,[]},{d,[{ps,[]},{nodes_up,[a,b]}]},{d2,[]}]"}]}}}, or {bummer_NOT_DISJOINT,{[c,c,e], [{a,not_in_this_epoch}, {b,not_in_this_epoch}, {c,"[{epoch,1388},{author,c},{upi,[c]},{repair,[]},{down,[a,b,d,e]},{d,[{ps,[{a,b},{a,c},{c,a},{a,d},{d,a},{e,a},{c,b},{b,e},{e,b},{c,d},{e,c},{e,d}]},{nodes_up,[c]}]},{d2,[]}]"}, {d,not_in_this_epoch}, {e,"[{epoch,1388},{author,e},{upi,[e]},{repair,[c]},{down,[a,b,d]},{d,[{ps,[{a,b},{b,a},{a,c},{c,a},{a,d},{d,a},{a,e},{e,a},{b,c},{c,b},{b,d},{b,e},{e,b},{c,d},{d,c},{d,e},{e,d}]},{nodes_up,[c,e]}]},{d2,[]}]"}]}}},	2015-07-07 22:11:19 +09:00
Scott Lystig Fritchie	c8ce99023e	WIP: model checking refactoring TODO	2015-07-07 18:32:04 +09:00
Scott Lystig Fritchie	d5f521f2bd	Various test updates	2015-07-07 15:02:29 +09:00
Scott Lystig Fritchie	009b3f44af	Fix eunit test broken by `3f8982cb`	2015-07-07 15:01:50 +09:00
Scott Lystig Fritchie	badcfa3064	Remove comment cruft	2015-07-07 14:32:02 +09:00
Scott Lystig Fritchie	0f3d11e1bf	Bugfix (part II) for rare race between just-finished repair and flapping ending. The prior commit wasn't sufficient: the range of transitions is wider than assumed by that commit. So, we take one of two options, with a TODO task of researching the other option.	2015-07-07 14:30:21 +09:00
Scott Lystig Fritchie	96ca7b7082	Bugfix for rare race between just-finished repair and flapping ending. Fix for today: We are going to game the system. We know that C100 is going to be checking authorship relative to P_current's UPI's tail. Therefore, we're just going to set it here. Why??? Because we have been using this projection safely for the entire flapping period! ... The only other way I see is to allow C100 to carve out an exception if the repair finished PLUS author_server check fails PLUS if we came from here, but that feels a bit fragile to me: if some code factoring happens in projection_transition_is_sane() or elsewhere that causes the author_server check to be something-other-than-the-final-thing-checked, then such a refactoring would likely cause an even harder bug to find & fix. Conditions tested: 5 FLUs plus alternating partitions of: [ [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{b,a},{d,e}], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [] ].	2015-07-07 01:29:37 +09:00
Scott Lystig Fritchie	54b5014446	WIP: bugfix in transition, just-in-case commit	2015-07-06 23:56:29 +09:00
Scott Lystig Fritchie	9d4b4b1df6	Bugfix: update inner projection based on previous inner projection	2015-07-06 17:38:15 +09:00
Scott Lystig Fritchie	3f8982cbe1	MAJOR WIP: set author's rank to constant 0? Worthwhile??	2015-07-06 16:12:15 +09:00
Scott Lystig Fritchie	471cde1f2c	WIP: debugging fmt shuffle	2015-07-06 16:11:14 +09:00
Scott Lystig Fritchie	8ee3377fa7	Fix a state transition bug (chain manager infinite loop, oops) %% We have a small problem for state transition sanity checking in the %% case where we are flapping and a repair has finished. One of the %% sanity checks in simple_chain_state_transition_is_sane() is that %% the author of P2 in this case must be the tail of P1's UPI: i.e., %% it's the tail's responsibility to perform repair, therefore the tail %% must damn well be the author of any transition that says a repair %% finished successfully. %% %% The problem is that author_server of the inner projection does not %% reflect the actual author! See the comment with the text %% "The inner projection will have a fake author" in react_to_env_A30(). %% %% So, there's a special return value that tells us to try to check for %% the correct authorship here.	2015-07-05 14:52:50 +09:00
Scott Lystig Fritchie	920c0fc610	WIP: much better structure for inner projection sanity checking	2015-07-04 16:46:02 +09:00
Scott Lystig Fritchie	8241d1f600	WIP: cruft, needs refactoring	2015-07-04 14:57:38 +09:00
Scott Lystig Fritchie	65ee0c23ec	Adjust author of inner projections to yield same checksum	2015-07-04 01:58:00 +09:00
Scott Lystig Fritchie	cd026303a0	Unused var cleanup	2015-07-04 00:35:05 +09:00
Scott Lystig Fritchie	9b0a5a1dc3	WIP: 1st part of moving old chain state transition code to new. Ha, famous last words, amirite? %% The chain sequence/order checks at the bottom of this function aren't %% as easy-to-read as they ought to be. However, I'm moderately confident %% that it isn't buggy. TODO: refactor them for clarity. So, now machi_chain_manager1:projection_transition_is_sane() is using newer, far less buggy code to make sanity decisions. TODO: Add support for Retrospective mode. TODO is it really needed? Examples of how the old code sucks and the new code sucks less. 138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests. [a,b,c] {c,[a,b,c],[c,b],b,[b,a],[b,a,c]} Old_res ([335,192,166,160,153,139]): true New_res: false (why line [1936]) Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times) [a,b,c] %% {Author1,UPI1, Repair1,Author2,UPI2, Repair2} %% {c, [a,b,c],[], a, [b,a],[]} Old_res ([338,185,160,153,147]): true New_res: false (why line [1936]) false Old code is wrong: we've swapped order of a & b, which is bad. 139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests. [b,c,a] {c,[c,a],[c],a,[a,b],[b,a]} Old_res ([335,192,185,160,153,123]): true New_res: false (why line [1936]) Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times) [b,a,c] %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %% {a, [c], [], c, [c,b],[]} Old_res ([338,185,160,153,147]): true New_res: false (why line [1936]) false Old code is wrong: b wasn't repairing in the previous state. 150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests. [c,a,b] {b,[c],[b,a,c],c,[c,a,b],[b]} Old_res ([335,214,185,160,153,147]): true New_res: false (why line [1936]) Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times) [c,b,a] %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %% {c, [c], [a,b], c, [c,b,a],[]} Old_res ([335,328,185,160,153,111]): true New_res: false (why line [1981,1679]) false Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.	2015-07-04 00:32:28 +09:00
Scott Lystig Fritchie	42fb6dd002	WIP: it's clear that the legacy state transition check is broken, II	2015-07-03 23:37:36 +09:00
Scott Lystig Fritchie	caeb322725	WIP: it's clear that the legacy state transition check is broken	2015-07-03 23:17:34 +09:00
Scott Lystig Fritchie	83015c319d	WIP: yeah, now we're going places	2015-07-03 22:05:35 +09:00
Scott Lystig Fritchie	6a706cbfeb	WIP: Refactoring and prototyping goop, broken test	2015-07-03 19:21:41 +09:00
Scott Lystig Fritchie	9cf77f4406	WIP: Refactoring and prototyping goop, broken test	2015-07-03 00:59:04 +09:00
Scott Lystig Fritchie	8820a71152	Clean up comment cruft & line wrap yak shaving	2015-07-02 14:44:47 +09:00
Scott Lystig Fritchie	039fd5fb78	Merge branch 'slf/pb-api-experiment3'	2015-07-01 18:33:33 +09:00
Scott Lystig Fritchie	da3a56dd74	Fix epoch checking in eunit tests and enforcement by FLU (always permit list_files())	2015-07-01 18:12:22 +09:00
Scott Lystig Fritchie	38c1a2ab5d	Fix Epoch handling in machi_flu_psup_test.erl	2015-07-01 17:46:35 +09:00
Scott Lystig Fritchie	576d3d76a2	Extend machi_chain_manager1_pulse fudge time factor	2015-07-01 17:46:10 +09:00
Scott Lystig Fritchie	2c869ed598	TODO fix: wedge self	2015-07-01 17:19:11 +09:00
Scott Lystig Fritchie	1e14fe878f	Ha, oops! Add bad_epoch code, derp 1	2015-07-01 15:51:25 +09:00
Scott Lystig Fritchie	a658a64482	Cosmetic formatting change	2015-07-01 15:37:53 +09:00
Scott Lystig Fritchie	a0061d6ffa	make decode_csum_file_entry() very slightly less brittle	2015-07-01 15:18:57 +09:00
Scott Lystig Fritchie	d710d90ea7	Fix usage of checksum_list by machi_chain_repair.erl	2015-07-01 15:04:22 +09:00
Scott Lystig Fritchie	0321e05b46	Fix usage of checksum_list by machi_basho_bench_driver.erl	2015-07-01 15:03:56 +09:00
Scott Lystig Fritchie	f5ae417b9e	Clarify verify_file_checksums_test_	2015-07-01 14:16:31 +09:00
Scott Lystig Fritchie	670bd2cafc	Add some flexibility to machi_chain_manager1_converge_demo:t/1 and t/2	2015-07-01 14:08:17 +09:00
Scott Lystig Fritchie	e3b80c6ac2	Documentation updates	2015-06-30 19:04:23 +09:00
Scott Lystig Fritchie	00c8cf0ef7	Rename temporary HTTP server hack functions	2015-06-30 16:19:44 +09:00
Scott Lystig Fritchie	7542fe8225	WIP: all eunit tests are passing again, yay	2015-06-30 16:12:23 +09:00
Scott Lystig Fritchie	e9d50a2128	WIP: Reinstate one eunit test, fix type bugs	2015-06-30 15:51:03 +09:00
Scott Lystig Fritchie	3d2b49b7e5	WIP: refactoring & edoc'ing	2015-06-30 15:20:35 +09:00
Scott Lystig Fritchie	310fdb1f6a	Add crude file size check to do_server_checksum_listing()	2015-06-30 14:13:26 +09:00
Scott Lystig Fritchie	2d070bf1e3	Minor refactoring + add demo/exploratory time measurement code %% Demo/exploratory hackery to check relative speeds of dealing with %% checksum data in different ways. %% %% Summary: %% %% * Use compact binary encoding, with 1 byte header for entry length. %% * Because the hex-style code is far slower just for enc & dec ops. %% * For 1M entries of enc+dec: 0.215 sec vs. 15.5 sec. %% * File sorter when sorting binaries as-is is only 30-40% slower %% than an in-memory split (of huge binary emulated by file:read_file() %% "big slurp") and sort of the same as-is sortable binaries. %% * File sorter slows by a factor of about 2.5 if {order, fun compare/2} %% function must be used, i.e. because the checksum entry lengths differ. %% * File sorter + {order, fun compare/2} is still far faster than external %% sort by OS X's sort(1) of sortable ASCII hex-style: %% 4.5 sec vs. 21 sec. %% * File sorter {order, fun compare/2} is faster than in-memory sort %% of order-friendly 3-tuple-style: 4.5 sec vs. 15 sec.	2015-06-30 14:08:46 +09:00
Scott Lystig Fritchie	2a4ae1ba52	Merge branch 'slf/pb-api-experiment2'	2015-06-29 17:31:52 +09:00
Scott Lystig Fritchie	34b046acbd	Remove machi_pb_wrap.erl	2015-06-29 17:31:07 +09:00
Scott Lystig Fritchie	55db22efff	Merge branch 'slf/pb-api-experiment2'	2015-06-29 17:20:35 +09:00
Scott Lystig Fritchie	dba7041929	Change names to indicate we're no longer in PB land	2015-06-29 17:20:17 +09:00
Scott Lystig Fritchie	151e696324	WIP: yank out more unused cruft	2015-06-29 17:14:33 +09:00
Scott Lystig Fritchie	87ec988353	WIP: yank out more unused cruft	2015-06-29 17:06:28 +09:00
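
Commit 3aa3e00806 argues that the model's bummer_NOT_DISJOINT check should compare only UPI lists and ignore repair lists. Below is a minimal Python sketch of that idea, not Machi's Erlang model code. The Proj tuple, its fields, and disjoint_in_epoch are hypothetical simplifications. The sketch also assumes that projections sharing the same (UPI, repair) pair count as one chain.

```python
from itertools import combinations
from typing import NamedTuple


class Proj(NamedTuple):
    """Hypothetical, simplified stand-in for one FLU's projection."""
    epoch: int
    author: str
    upi: tuple
    repair: tuple


def disjoint_in_epoch(projs: dict, include_repair: bool = False) -> bool:
    """Check that distinct projections in one epoch have disjoint members.

    projs maps a FLU name to its Proj, or to None for 'not_in_this_epoch'.
    Projections with the same (upi, repair) pair are treated as one chain
    (an assumption of this sketch). With include_repair=False, only UPI
    members are compared, as commit 3aa3e00806 suggests the model should do.
    """
    chains = {(p.upi, p.repair) for p in projs.values() if p is not None}
    member_sets = [
        set(upi) | (set(repair) if include_repair else set())
        for upi, repair in chains
    ]
    return all(a.isdisjoint(b) for a, b in combinations(member_sets, 2))


# Epoch 1174 example from commit 3aa3e00806: UPIs [a] and [b] are disjoint,
# but b's repair list mentions a.
epoch_1174 = {
    "a": Proj(1174, "a", ("a",), ()),
    "b": Proj(1174, "b", ("b",), ("a",)),
}
print(disjoint_in_epoch(epoch_1174))
print(disjoint_in_epoch(epoch_1174, include_repair=True))
```

The UPI-only check accepts the epoch-1174 case. The variant that also folds in repair members rejects it, which mirrors the false alarms described in that commit.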

568 commits
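
Commit 2d070bf1e3 summarizes a compact binary encoding for checksum entries with a 1-byte header for the entry length. The sketch below assumes a hypothetical entry layout: a big-endian 64-bit offset, a 32-bit size, and a variable-length checksum. This is not Machi's actual format. The sketch also shows one plausible reason why variable-length entries need a custom sort order: comparing the raw encoded bytes sorts by the length header first, not by offset.

```python
import struct

# Hypothetical entry layout for this sketch only (not Machi's on-disk format):
#   offset: unsigned 64-bit, big-endian
#   size:   unsigned 32-bit, big-endian
#   csum:   variable-length checksum bytes
_FIXED = struct.Struct(">QI")  # 12 bytes; ">" means no alignment padding


def encode_entry(offset: int, size: int, csum: bytes) -> bytes:
    """Prefix the entry body with a single length byte."""
    body = _FIXED.pack(offset, size) + csum
    if len(body) > 255:
        raise ValueError("entry too long for a 1-byte length header")
    return bytes([len(body)]) + body


def decode_entries(blob: bytes):
    """Yield (offset, size, csum) tuples from concatenated encoded entries."""
    pos = 0
    while pos < len(blob):
        n = blob[pos]
        body = blob[pos + 1 : pos + 1 + n]
        if len(body) != n or n < _FIXED.size:
            raise ValueError("truncated or malformed entry at byte %d" % pos)
        offset, size = _FIXED.unpack_from(body)
        yield offset, size, body[_FIXED.size:]
        pos += 1 + n


def body_key(entry: bytes) -> bytes:
    """Sort key that skips the length byte, so entries order by offset."""
    return entry[1:]


short = encode_entry(1024, 10, b"\x01" * 20)  # 32-byte body
long_ = encode_entry(0, 5, b"\x02" * 32)      # 44-byte body
print(list(decode_entries(short + long_)))
print(sorted([long_, short]) == [short, long_])                # raw bytes: by length
print(sorted([short, long_], key=body_key) == [long_, short])  # by offset
```

Decoding needs only one byte of lookahead per entry. When all entries have the same length, the raw bytes sort correctly as-is. Once lengths differ, a comparison that skips the header is needed, which is comparable in spirit to the commit's `{order, fun compare/2}` note.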