machi

greg/machi

Author	SHA1	Message	Date
Scott Lystig Fritchie	badcfa3064	Remove comment cruft	2015-07-07 14:32:02 +09:00
Scott Lystig Fritchie	0f3d11e1bf	Bugfix (part II) rare race between just-finished repair and flapping ending The prior commit wasn't sufficient: the range of transitions is wider than assumed by that commit. So, we take one of two options, with a TODO task of researching the other option.	2015-07-07 14:30:21 +09:00
Scott Lystig Fritchie	96ca7b7082	Bugfix for rare race between just-finished repair and flapping ending Fix for today: We are going to game the system. We know that C100 is going to be checking authorship relative to P_current's UPI's tail. Therefore, we're just going to set it here. Why??? Because we have been using this projection safely for the entire flapping period! ... The only other way I see is to allow C100 to carve out an exception if the repair finished PLUS author_server check fails PLUS if we came from here, but that feels a bit fragile to me: if some code factoring happens in projection_transition_is_sane() or elsewhere that causes the author_server check to be something-other-than-the-final-thing-checked, then such a refactoring would likely cause an even harder bug to find & fix. Conditions tested: 5 FLUs plus alternating partitions of: [ [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{b,a},{d,e}], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [] ].	2015-07-07 01:29:37 +09:00
Scott Lystig Fritchie	54b5014446	WIP: bugfix in transition, just-in-case commit	2015-07-06 23:56:29 +09:00
Scott Lystig Fritchie	9d4b4b1df6	Bugfix: update inner projection based on previous inner projection	2015-07-06 17:38:15 +09:00
Scott Lystig Fritchie	3f8982cbe1	MAJOR WIP: set author's rank to constant 0? Worthwhile??	2015-07-06 16:12:15 +09:00
Scott Lystig Fritchie	471cde1f2c	WIP: debugging fmt shuffle	2015-07-06 16:11:14 +09:00
Scott Lystig Fritchie	8ee3377fa7	Fix a state transition bug (chain manager infinite loop, oops) %% We have a small problem for state transition sanity checking in the %% case where we are flapping and a repair has finished. One of the %% sanity checks in simple_chain_state_transition_is_sane() is that %% the author of P2 in this case must be the tail of P1's UPI: i.e., %% it's the tail's responsibility to perform repair, therefore the tail %% must damn well be the author of any transition that says a repair %% finished successfully. %% %% The problem is that author_server of the inner projection does not %% reflect the actual author! See the comment with the text %% "The inner projection will have a fake author" in %% react_to_env_A30(). %% %% So, there's a special return value that tells us to try to check for %% the correct authorship here.	2015-07-05 14:52:50 +09:00
Scott Lystig Fritchie	920c0fc610	WIP: much better structure for inner projection sanity checking	2015-07-04 16:46:02 +09:00
Scott Lystig Fritchie	8241d1f600	WIP: cruft, needs refactoring	2015-07-04 14:57:38 +09:00
Scott Lystig Fritchie	65ee0c23ec	Adjust author of inner projections to yield same checksum	2015-07-04 01:58:00 +09:00
Scott Lystig Fritchie	cd026303a0	Unused var cleanup	2015-07-04 00:35:05 +09:00
Scott Lystig Fritchie	9b0a5a1dc3	WIP: 1st part of moving old chain state transition code to new Ha, famous last words, amirite? %% The chain sequence/order checks at the bottom of this function aren't %% as easy-to-read as they ought to be. However, I'm moderately confident %% that it isn't buggy. TODO: refactor them for clarity. So, now machi_chain_manager1:projection_transition_is_sane() is using newer, far less buggy code to make sanity decisions. TODO: Add support for Retrospective mode. TODO is it really needed? Examples of how the old code sucks and the new code sucks less. 138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests. [a,b,c] {c,[a,b,c],[c,b],b,[b,a],[b,a,c]} Old_res ([335,192,166,160,153,139]): true New_res: false (why line [1936]) Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times) [a,b,c] %% {Author1,UPI1, Repair1,Author2,UPI2, Repair2} %% {c, [a,b,c],[], a, [b,a],[]} Old_res ([338,185,160,153,147]): true New_res: false (why line [1936]) false Old code is wrong: we've swapped order of a & b, which is bad. 139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests. [b,c,a] {c,[c,a],[c],a,[a,b],[b,a]} Old_res ([335,192,185,160,153,123]): true New_res: false (why line [1936]) Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times) [b,a,c] %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %% {a, [c], [], c, [c,b],[]} Old_res ([338,185,160,153,147]): true New_res: false (why line [1936]) false Old code is wrong: b wasn't repairing in the previous state. 
150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests. [c,a,b] {b,[c],[b,a,c],c,[c,a,b],[b]} Old_res ([335,214,185,160,153,147]): true New_res: false (why line [1936]) Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times) [c,b,a] %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %% {c, [c], [a,b], c, [c,b,a],[]} Old_res ([335,328,185,160,153,111]): true New_res: false (why line [1981,1679]) false Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.	2015-07-04 00:32:28 +09:00
Scott Lystig Fritchie	42fb6dd002	WIP: it's clear that the legacy state transition check is broken, II	2015-07-03 23:37:36 +09:00
Scott Lystig Fritchie	caeb322725	WIP: it's clear that the legacy state transition check is broken	2015-07-03 23:17:34 +09:00
Scott Lystig Fritchie	83015c319d	WIP: yeah, now we're going places	2015-07-03 22:05:35 +09:00
Scott Lystig Fritchie	6a706cbfeb	WIP: Refactoring and prototyping goop, broken test	2015-07-03 19:21:41 +09:00
Scott Lystig Fritchie	a658a64482	Cosmetic formatting change	2015-07-01 15:37:53 +09:00
Scott Lystig Fritchie	22337e1819	Remove short circuit (bad idea!) from react_to_env_C100()	2015-06-15 17:22:02 +09:00
Scott Lystig Fritchie	b244a3b8e4	Reduce verbosity, try fix up convergence demo for chain len=4	2015-06-15 12:41:16 +09:00
Scott Lystig Fritchie	9bf76e0bfb	Fix for correctness bug, thanks PULSE	2015-06-05 01:06:39 +09:00
Scott Lystig Fritchie	be62300b3b	Bug fixes: model and real bugs, thanks PULSE and converge_demo both!	2015-06-04 17:39:29 +09:00
Scott Lystig Fritchie	0cf9627f26	Bugfix, found by inspection, yay!	2015-06-04 15:05:37 +09:00
Scott Lystig Fritchie	89b8b6a012	Bugfix, found by PULSE, yay!	2015-06-04 14:31:58 +09:00
Scott Lystig Fritchie	d3df2bd31d	WIP: remove repair_always_done option, it was flawed	2015-06-03 15:26:22 +09:00
Scott Lystig Fritchie	87417d2872	WIP: get the old jalopy into runnable shape	2015-06-03 11:48:55 +09:00
Scott Lystig Fritchie	2207151eba	Fix projection_transition_is_sane() bug	2015-06-02 21:20:50 +09:00
Scott Lystig Fritchie	deabe14d29	Un-proplist-ify the inner projection	2015-06-02 20:55:18 +09:00
Scott Lystig Fritchie	207be8729b	Un-proplist-ify the flapping_i info	2015-06-02 20:32:52 +09:00
Scott Lystig Fritchie	000d687588	Fix creation_time bug in inner projection	2015-06-02 16:26:49 +09:00
Scott Lystig Fritchie	69244691f4	Such wonder when one reads the docs...	2015-05-20 14:12:48 +09:00
Scott Lystig Fritchie	a4266e8aa4	Fix known chain repair bugs, add basic smoke test	2015-05-19 19:32:48 +09:00
Scott Lystig Fritchie	a347722a15	Fix {error,not_written} type bugs in chmgr	2015-05-18 17:32:22 +09:00
Scott Lystig Fritchie	d293170e92	WIP: starting machi_cr_client.erl	2015-05-17 23:48:05 +09:00
Scott Lystig Fritchie	10364834de	Add a dummy client-side implementation module:machi_yessir_client.erl	2015-05-17 19:00:51 +09:00
Scott Lystig Fritchie	5c2635346f	Basic multi-party chain repair for ap_mode finished	2015-05-16 17:39:58 +09:00
Scott Lystig Fritchie	a9c753ad64	WIP: more generic all-way file chunk merge func	2015-05-15 17:15:02 +09:00
Scott Lystig Fritchie	eec029b08f	WIP: aside, fix FLU wedge status @ init()	2015-05-13 17:59:32 +09:00
Scott Lystig Fritchie	4ae0f94649	WIP: move to stats via ETS, success/failure propagates, yay!	2015-05-12 23:45:35 +09:00
Scott Lystig Fritchie	cad84442bb	WIP: stats record, hrm	2015-05-12 22:42:03 +09:00
Scott Lystig Fritchie	8807f954ff	WIP: Whole file repair is 95% complete, yay!	2015-05-12 21:45:40 +09:00
Scott Lystig Fritchie	f48720e4dc	WIP: set up proxies for repair	2015-05-12 12:56:41 +09:00
Scott Lystig Fritchie	1c70a46b09	Add basic process & bookkeeping structure for repair proc =INFO REPORT==== 11-May-2015::19:50:09 === Chain tail a of [a] starting repair of [c] =INFO REPORT==== 11-May-2015::19:50:12 === Chain tail a of [a]: repair finished in 2.438 seconds: todo_yo	2015-05-11 19:50:21 +09:00
Scott Lystig Fritchie	c82000dc30	Reduce spamminess slightly	2015-05-11 19:00:21 +09:00
Scott Lystig Fritchie	33bfbe109e	Chain manager bug fixes & enhancement (more...) * Set max length of a chain at -define(MAX_CHAIN_LENGTH, 64). * Perturb tick sleep time of each manager * If a chain manager L has zero members in its chain, and then its local public projection store (authored by some remote author R) has a projection that contains L, then adopt R's projection and start humming consensus. * Handle "cross-talk" across projection stores, when chain membership is changed administratively, e.g. chain was [a,b,c] then changed to merely [a], but that change only happens on a. Servers b & c continue to use stale projections and scribble their projection suggestions to a, causing it to flap. What's really cool about the flapping handling is that it works. I wasn't thinking about this scenario when designing the flapping logic, but it's really nifty that this extra scenario causes a to flap and then a's inner projection remains stable, yay! * Add complaints when "cross-talk" is observed. * Fix flapping sleep time throttle. * Fix bug in the machi_projection_store.erl's bookkeeping of the max epoch number when flapping.	2015-05-11 18:41:45 +09:00
Scott Lystig Fritchie	7906e6c235	WIP: basic wedge notifications now working	2015-05-08 18:17:41 +09:00
Scott Lystig Fritchie	762aef557f	WIP: Set the stage for FLU wedging API	2015-05-08 15:36:53 +09:00
Scott Lystig Fritchie	ae1d038abe	Change default value of chmgr's use_partition_simulator to false	2015-05-08 13:40:44 +09:00
Scott Lystig Fritchie	238c8472cd	WIP: timeout comments	2015-05-07 18:52:01 +09:00
Scott Lystig Fritchie	14fc37bd0d	Add ability to start FLUs at application startup	2015-05-07 18:39:39 +09:00
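
The transition rules mentioned in commits 9b0a5a1dc3 and 8ee3377fa7 can be illustrated with a small sketch. This is not the Erlang code in machi_chain_manager1; it is a hypothetical Python model that encodes only the rules those commit messages state: servers kept in the UPI must not be reordered, servers added to the UPI must have been in the previous repair list and must keep that list's order, and when a repair finishes the author of the new projection must be the tail of the previous UPI. The function name, argument order, and reason strings are assumptions made for illustration, as is the choice to reject a finished repair when the previous UPI is empty.

```python
def transition_is_sane(author2, upi1, repair1, upi2):
    """Hypothetical model of the chain transition checks; returns (ok, reason)."""
    # Rule 1: servers present in both UPIs keep their relative order.
    kept = [s for s in upi2 if s in upi1]
    if kept != [s for s in upi1 if s in upi2]:
        return False, "kept UPI members were reordered"

    # Rule 2: a server may join the UPI only if it was being repaired.
    added = [s for s in upi2 if s not in upi1]
    if any(s not in repair1 for s in added):
        return False, "new UPI member was not in the previous repair list"

    # Rule 3: repaired servers join in the order of the repair list.
    if added != [s for s in repair1 if s in added]:
        return False, "repaired members joined in the wrong order"

    # Rule 4: a finished repair must be authored by the old UPI's tail.
    # Assumption: with an empty old UPI there is no tail, so reject.
    if added and (not upi1 or author2 != upi1[-1]):
        return False, "repair finished but author is not the old UPI tail"

    return True, "ok"


# The three shrunk counterexamples quoted in commit 9b0a5a1dc3:
print(transition_is_sane("a", ["a", "b", "c"], [], ["b", "a"]))
print(transition_is_sane("c", ["c"], [], ["c", "b"]))
print(transition_is_sane("c", ["c"], ["a", "b"], ["c", "b", "a"]))
```

All three calls report a failed check, matching the New_res: false results in that commit message, while a transition such as transition_is_sane("c", ["c"], ["a", "b"], ["c", "a", "b"]) passes.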

74 commits