greg/machi

Author	SHA1	Message	Date
Scott Lystig Fritchie	0d7f6c8d7e	WIP: chain transitions are now fully (?) aware of witness servers	2015-08-06 17:48:31 +09:00
Scott Lystig Fritchie	e9c4e2f98d	WIP: rearrange CP mode projection calc	2015-08-06 15:22:04 +09:00
Scott Lystig Fritchie	82b6726261	Revert UPI [] -> [FirstRepairing] to commit `91496c6`	2015-08-06 15:21:44 +09:00
Scott Lystig Fritchie	b21803a6c6	Fix witness calculation projections, part II	2015-08-05 16:05:03 +09:00
Scott Lystig Fritchie	f43a5ca96d	Fix witness calculation projections, part I	2015-08-05 15:50:32 +09:00
Scott Lystig Fritchie	91496c656b	Oops, fix PB stuff to add witnesses	2015-08-05 12:53:20 +09:00
Scott Lystig Fritchie	3f51357577	WIP: pre-travel code, not sure if good, check in for history	2015-07-30 13:12:08 -07:00
Scott Lystig Fritchie	6e521700bd	WIP: Adding witness_smoke_test_ but it's broken (more) So, the problem is that the chain manager isn't finishing repair because UPI=[a], and a is a witness, and a can't do the list files etc etc repair stuff that repairer FLUs need to do. The best (?) way forward is to add some advance smarts to the chain manager so that it doesn't propose a UPI of 100% witnesses?	2015-07-21 19:05:04 +09:00
Scott Lystig Fritchie	88d3228a4c	Fix various problems with repair not being aware of inner projections	2015-07-20 16:25:42 +09:00
Scott Lystig Fritchie	9ae4afa58e	Reduce chmgr verbosity a bit	2015-07-20 14:58:21 +09:00
Scott Lystig Fritchie	e14493373b	Bugfix: add missing reset of not_sanes dictionary, fix comments	2015-07-20 14:04:25 +09:00
Scott Lystig Fritchie	f7ef8c54f5	Reduce # of assumptions made by ch_mgr + simulator for 'repair_airquote_done'	2015-07-19 13:32:55 +09:00
Scott Lystig Fritchie	b8c642aaa7	WIP: bugfix for rare flapping infinite loop (done^2 fix I hope) How can even computer? So, there's a flavor of the flapping infinite loop problem that can happen without flapping being detected (by the existing flapping detector, that is). That detector relies on a series of accepted projections to converge to a single projection repeated X times. However, it's possible to have a race with a simulated repair "finishing" that causes a problem so that no more projections are ever accepted. Oops. See also: new comments in do_react_to_env().	2015-07-19 00:43:10 +09:00
Scott Lystig Fritchie	87867f8f2e	WIP: bugfix for rare flapping infinite loop (done fix I hope) {sigh} This is a correction to a think-o error in the "WIP: bugfix for rare flapping infinite loop (better fix I hope)" bugfix that I thought I had finished in the slf/chain-manager/cp-mode branch. Silly me, the test for myself as the author of the not_sane transition was wrong: we don't do that kind of insanity, other nodes might, though. ^_^	2015-07-18 17:53:17 +09:00
Scott Lystig Fritchie	19ce841471	Merge slf/chain-manager/cp-mode (fix conflicts)	2015-07-17 16:39:37 +09:00
Scott Lystig Fritchie	b295c7f374	Log more info on private projection write failure	2015-07-17 16:20:54 +09:00
Scott Lystig Fritchie	f4d16881c0	WIP: bugfix for rare flapping infinite loop (better fix I hope) %% So, I'd tried this kind of "if everyone is doing it, then we %% 'agree' and we can do something different" strategy before, %% and it didn't work then. Silly me. Distributed systems %% lesson #823: do not forget the past. In a situation created %% by PULSE, of all=[a,b,c,d,e], b & d & e were scheduled %% completely unfairly. So a & c were the only authors ever to %% successfully write a suggested projection to a public store. %% Oops. %% %% So, we're going to keep track in #ch_mgr state for the number %% of times that this insane judgement has happened.	2015-07-17 14:51:39 +09:00
Scott Lystig Fritchie	0a8821a1c6	WIP: bugfix for rare flapping infinite loop (fixed I hope) I'll run a set of PULSE tests (Cmd_e of the 'regression' style) to try to confirm a fix for this pernicious little thing. Final (?) part of the fix: add myself to SeenFlappers in react_to_env_A30().	2015-07-16 23:23:30 +09:00
Scott Lystig Fritchie	d331e09923	Hrm, fewer deadlocks, but sometimes unreliable shutdown	2015-07-16 17:59:02 +09:00
Scott Lystig Fritchie	f2fc5b91c2	Add more PULSE instrumentation -> more deadlocks	2015-07-16 16:25:38 +09:00
Scott Lystig Fritchie	73ac220d75	Add machi_verbose.hrl	2015-07-16 16:01:53 +09:00
Scott Lystig Fritchie	0ead97093b	WIP: bugfix for rare flapping infinite loop (unfinished) part ...	2015-07-16 00:18:42 +09:00
Scott Lystig Fritchie	18c92c98f8	WIP: bugfix for rare flapping infinite loop (unfinished) part IV	2015-07-15 18:42:59 +09:00
Scott Lystig Fritchie	402720d301	WIP: bugfix for rare flapping infinite loop (unfinished) part II	2015-07-15 17:23:17 +09:00
Scott Lystig Fritchie	6f9a603e99	WIP: bugfix for rare flapping infinite loop (unfinished)	2015-07-15 12:44:56 +09:00
Scott Lystig Fritchie	0f667c4356	WIP: add more debugging/react info	2015-07-15 11:25:06 +09:00
Scott Lystig Fritchie	7c970d90a6	Bugfix: use correct updated #state in react_to_env_A30() {sigh}	2015-07-15 00:44:07 +09:00
Scott Lystig Fritchie	5eb6ebc874	Bugfix: add missing remember_partition_hack() calls in perhaps_call path	2015-07-14 17:17:14 +09:00
Scott Lystig Fritchie	fd66fe46b5	Move react logging in react_to_env_A30()	2015-07-14 17:16:23 +09:00
Scott Lystig Fritchie	0089af0a86	Bugfix: moving inner -> outer projection, use calc_projection() for sanity	2015-07-10 21:11:34 +09:00
Scott Lystig Fritchie	f746b75254	Bugfix: A30: if Kicker_p only true if we actually have an inner proj!	2015-07-10 20:25:44 +09:00
Scott Lystig Fritchie	e9e4c54b25	Bugfix: undo the jump directly from A30 -> C100.	2015-07-10 20:24:44 +09:00
Scott Lystig Fritchie	ed7dcd14db	Avoid putting inner_summary in dbg proplist	2015-07-10 17:47:33 +09:00
Scott Lystig Fritchie	4d41c59e19	Bugfix: machi_projection:new/6 derp: argument order mistake	2015-07-10 16:41:28 +09:00
Scott Lystig Fritchie	cf9ae5b555	WIP: correct calc of All_UPI_Repairing_were_unanimous, but now infinite loop in long chains??	2015-07-10 15:30:31 +09:00
Scott Lystig Fritchie	2060b80830	Keep good refactorings from commit a8390ee2 Also, add more misc details to the 'react' breadcrumb trail. Also, save get(react) results into dbg2 whenever we write a private projection, very valuable for debugging. Also: cleanup PULSE code, add regression commands as option and controls with some new environment variables. These regression sequences were responsible for several fruitful debugging sessions, so we keep them for posterity and for their ability (with new seeds and PULSE) to find new interleavings.	2015-07-10 15:04:50 +09:00
Scott Lystig Fritchie	badcfa3064	Remove comment cruft	2015-07-07 14:32:02 +09:00
Scott Lystig Fritchie	0f3d11e1bf	Bugfix (part II) rare race between just-finished repair and flapping ending The prior commit wasn't sufficient: the range of transitions is wider than assumed by that commit. So, we take one of two options, with a TODO task of researching the other option.	2015-07-07 14:30:21 +09:00
Scott Lystig Fritchie	96ca7b7082	Bugfix for rare race between just-finished repair and flapping ending Fix for today: We are going to game the system. We know that C100 is going to be checking authorship relative to P_current's UPI's tail. Therefore, we're just going to set it here. Why??? Because we have been using this projection safely for the entire flapping period! ... The only other way I see is to allow C100 to carve out an exception if the repair finished PLUS author_server check fails PLUS if we came from here, but that feels a bit fragile to me: if some code factoring happens in projection_transition_is_sane() or elsewhere that causes the author_server check to be something-other-than-the-final-thing-checked, then such a refactoring would likely cause an even harder bug to find & fix. Conditions tested: 5 FLUs plus alternating partitions of: [ [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{b,a},{d,e}], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [] ].	2015-07-07 01:29:37 +09:00
Scott Lystig Fritchie	54b5014446	WIP: bugfix in transition, just-in-case commit	2015-07-06 23:56:29 +09:00
Scott Lystig Fritchie	9d4b4b1df6	Bugfix: update inner projection based on previous inner projection	2015-07-06 17:38:15 +09:00
Scott Lystig Fritchie	3f8982cbe1	MAJOR WIP: set author's rank to constant 0? Worthwhile??	2015-07-06 16:12:15 +09:00
Scott Lystig Fritchie	471cde1f2c	WIP: debugging fmt shuffle	2015-07-06 16:11:14 +09:00
Scott Lystig Fritchie	8ee3377fa7	Fix a state transition bug (chain manager infinite loop, oops) %% We have a small problem for state transition sanity checking in the %% case where we are flapping and a repair has finished. One of the %% sanity checks in simple_chain_state_transition_is_sane() is that %% the author of P2 in this case must be the tail of P1's UPI: i.e., %% it's the tail's responsibility to perform repair, therefore the tail %% must damn well be the author of any transition that says a repair %% finished successfully. %% %% The problem is that author_server of the inner projection does not %% reflect the actual author! See the comment with the text %% "The inner projection will have a fake author" in %react_to_env_A30(). %% %% So, there's a special return value that tells us to try to check for %% the correct authorship here.	2015-07-05 14:52:50 +09:00
Scott Lystig Fritchie	920c0fc610	WIP: much better structure for inner projection sanity checking	2015-07-04 16:46:02 +09:00
Scott Lystig Fritchie	8241d1f600	WIP: cruft, needs refactoring	2015-07-04 14:57:38 +09:00
Scott Lystig Fritchie	65ee0c23ec	Adjust author of inner projections to yield same checksum	2015-07-04 01:58:00 +09:00
Scott Lystig Fritchie	cd026303a0	Unused var cleanup	2015-07-04 00:35:05 +09:00
Scott Lystig Fritchie	9b0a5a1dc3	WIP: 1st part of moving old chain state transition code to new Ha, famous last words, amirite? %% The chain sequence/order checks at the bottom of this function aren't %% as easy-to-read as they ought to be. However, I'm moderately confident %% that it isn't buggy. TODO: refactor them for clarity. So, now machi_chain_manager1:projection_transition_is_sane() is using newer, far less buggy code to make sanity decisions. TODO: Add support for Retrospective mode. TODO is it really needed? Examples of how the old code sucks and the new code sucks less. 138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests. [a,b,c] {c,[a,b,c],[c,b],b,[b,a],[b,a,c]} Old_res ([335,192,166,160,153,139]): true New_res: false (why line [1936]) Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times) [a,b,c] %% {Author1,UPI1, Repair1,Author2,UPI2, Repair2} %% {c, [a,b,c],[], a, [b,a],[]} Old_res ([338,185,160,153,147]): true New_res: false (why line [1936]) false Old code is wrong: we've swapped order of a & b, which is bad. 139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests. [b,c,a] {c,[c,a],[c],a,[a,b],[b,a]} Old_res ([335,192,185,160,153,123]): true New_res: false (why line [1936]) Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times) [b,a,c] %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %% {a, [c], [], c, [c,b],[]} Old_res ([338,185,160,153,147]): true New_res: false (why line [1936]) false Old code is wrong: b wasn't repairing in the previous state. 150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))). xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests. [c,a,b] {b,[c],[b,a,c],c,[c,a,b],[b]} Old_res ([335,214,185,160,153,147]): true New_res: false (why line [1936]) Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times) [c,b,a] %% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %% {c, [c], [a,b], c, [c,b,a],[]} Old_res ([335,328,185,160,153,111]): true New_res: false (why line [1981,1679]) false Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.	2015-07-04 00:32:28 +09:00
Scott Lystig Fritchie	42fb6dd002	WIP: it's clear that the legacy state transition check is broken, II	2015-07-03 23:37:36 +09:00
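
The shrunken counterexamples in commit 9b0a5a1dc3, together with the authorship rule described in commit 8ee3377fa7, suggest four conditions on a chain transition from (UPI1, Repair1) to UPI2. The Python sketch below is only an illustration of those four conditions as the commit messages word them. It is not the Erlang code in machi_chain_manager1, and the function names and problem strings are hypothetical.

```python
def relative_order_kept(reference, candidate):
    """True if the items of `candidate` appear in the same relative
    order as they do in `reference` (all items must be in `reference`)."""
    positions = [reference.index(item) for item in candidate]
    return positions == sorted(positions)


def transition_problems(upi1, repairing1, author2, upi2):
    """Return a list of reasons why the transition looks insane.

    Hypothetical helper: the checks are inferred from the commit
    messages, not copied from the project's source.
    """
    problems = []
    kept = [s for s in upi2 if s in upi1]       # servers that stay in the UPI
    added = [s for s in upi2 if s not in upi1]  # servers that join the UPI

    # Servers that remain in the UPI must not swap places.
    if not relative_order_kept(upi1, kept):
        problems.append("UPI order changed")

    # A server may only join the UPI if it was repairing before.
    if any(s not in repairing1 for s in added):
        problems.append("joined UPI without repairing")

    # Repaired servers must join in the order of the repairing list.
    repaired = [s for s in added if s in repairing1]
    if not relative_order_kept(repairing1, repaired):
        problems.append("repaired servers out of order")

    # A finished repair must be authored by the tail of the old UPI.
    if added and upi1 and author2 != upi1[-1]:
        problems.append("repair not authored by UPI tail")

    return problems


if __name__ == "__main__":
    # The three shrunken cases from the commit message, as
    # (UPI1, Repair1, Author2, UPI2).
    cases = [
        (["a", "b", "c"], [], "a", ["b", "a"]),
        (["c"], [], "c", ["c", "b"]),
        (["c"], ["a", "b"], "c", ["c", "b", "a"]),
    ]
    for case in cases:
        print(case, transition_problems(*case))
```

Returning a list of reasons instead of a bare boolean mirrors the "why line [...]" output in the QuickCheck runs, which reports where the newer check rejected a transition.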

112 commits