See comments added in this commit at A40.
So far, I've been doing CP mode testing with a handful of (very useful)
network partition combinations using:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
Next steps:
* Expand number & types of partitions
* Expand to chain lengths of 5 and beyond
I'm 50% sure this is a good idea for CP mode: if there's a later
public projection than P_current, then who knows what we might have
missed. So, call make_zerf() to find out the absolute latest.
Problem: the flapping state appears to be lost, booo.
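A minimal sketch of that check, assuming a hypothetical epoch_of/1
helper; make_zerf() is the only name taken from the note above, and
this is not the real chain manager code:

    maybe_make_zerf(P_current, LatestPublic) ->
        case epoch_of(LatestPublic) > epoch_of(P_current) of
            true  -> make_zerf();   %% a later public projection exists,
                                    %% so we may have missed something
            false -> P_current      %% nothing newer; keep what we have
        end.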
If we use verbose output from:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
And use:
tail -f typescript_file | egrep --line-buffered 'SET|attempted|CONFIRM'
... then we can clearly see a chain safety violation when moving from
epoch 81 -> 83. I need to add more smarts to the safety checking,
both at the individual transition sanity check and at the converge_demo
overall rolling sanity check.
Key to output: CONFIRM by epoch {num} {csum} at {UPI} {Repairing}
SET # of FLUs = 3 members [a,b,c]).
CONFIRM by epoch 1 <<96,161,96,...>> at [a,b] [c]
CONFIRM by epoch 5 <<134,243,175,...>> at [b,c] []
CONFIRM by epoch 7 <<207,93,225,...>> at [b,c] []
CONFIRM by epoch 47 <<60,142,248,...>> at [b,c] []
SET partitions = [{c,b},{c,a}] (1 of 2) at {22,3,34}
CONFIRM by epoch 81 <<223,58,184,...>> at [a,b] []
SET partitions = [{b,c},{b,a}] (2 of 2) at {22,3,38}
CONFIRM by epoch 83 <<33,208,224,...>> at [a,c] []
SET partitions = []
CONFIRM by epoch 85 <<173,179,149,...>> at [a,c] [b]
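As a sketch of the extra smarts the rolling sanity check needs
(hypothetical code, and a simplification that ignores whether repair
actually finished): every server in the new UPI must have appeared in
the previous UPI or Repairing list. The epoch 81 -> 83 transition
above fails this test, because c ([a,b] -> [a,c]) was in neither.

    upi_transition_ok(OldUPI, OldRepairing, NewUPI) ->
        lists:all(fun(FLU) ->
                          lists:member(FLU, OldUPI)
                          orelse lists:member(FLU, OldRepairing)
                  end, NewUPI).

    %% upi_transition_ok([a,b], [], [a,c]) -> false   (epoch 81 -> 83)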
So, the problem is that the chain manager isn't finishing repair
because UPI=[a], and a is a witness, and a witness can't do the file
listing and other repair work that repairer FLUs need to do.
The best (?) way forward is to add some advance smarts to the
chain manager so that it doesn't propose a UPI made up of 100% witnesses.
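As a sketch of that guard (hypothetical, not the actual chain manager
code), the proposal step could refuse any non-empty UPI whose members
are all witnesses:

    upi_all_witnesses(UPI, Witnesses) ->
        UPI /= [] andalso
            lists:all(fun(FLU) -> lists:member(FLU, Witnesses) end, UPI).

    %% upi_all_witnesses([a], [a]) -> true, so a UPI of [a] would be rejected.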
PULSE managed to create a situation where machi_proxy_flu_client1
would appear to fail a remote attempt to write_projection. The
client would retry, but the first attempt really did get through to
the server. So, if we hit this case, we try to read the projection,
and if it's exactly equal to what we tried to write, we consider the
op a success.
Ditto for write_chunk.
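A minimal sketch of that retry check, assuming hypothetical do_write/1
and do_read/0 helpers rather than the real machi_proxy_flu_client1 API:

    write_then_verify(Value) ->
        case do_write(Value) of
            ok ->
                ok;
            {error, _} = Err ->
                %% The remote write may have landed even though the
                %% reply was lost; read back and compare before giving up.
                case do_read() of
                    {ok, Value} -> ok;  %% exact match: 1st attempt succeeded
                    _           -> Err
                end
        end.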
Fix up the eunit test to accommodate the change of semantics.
Due to changes on the slf/chain-manager/cp-mode branch, there are
no longer extraneous epoch changes by "larger" authors that
re-suggest the same UPI+Repairing just because their author rank
is very slightly higher than that of the current epoch's author.
Thus the partial_stop_restart2() test only needs to deal with one
epoch change instead of the original two.
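To make that property concrete, here is a hypothetical predicate (not
the real chain manager code): a new epoch is only warranted when the
UPI or Repairing lists actually change, never merely because the
author's rank is higher.

    should_suggest_new_epoch(#{upi := U, repairing := R},
                             #{upi := U, repairing := R}) ->
        false;   %% identical UPI+Repairing: rank alone is not a reason
    should_suggest_new_epoch(_OldProj, _NewProj) ->
        true.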
Run via:
env PULSE_NOSHRINK=yes PULSE_SKIP_NEW=yes PULSE_TIME=900 make pulse
So, this one hangs here:
tick-<0.991.0>,dump_state(){prop,machi_chain_manager1_pulse,358,<0.891.0>}
At machi_chain_manager1_pulse.erl line 358, that's after the return
of run_commands(). The next verbose message should come from line
362, after the return of pulse:run(), but that message never appears.
My laptop CPU is really busy (fans running, case is hot), but neither
the console nor disterl is available right now, so I have no idea why, alas.
Ah, when I run with a console available and then run Redbug, there is
zero call activity on either machi_chain_manager1_pulse:'_' or
machi_chain_manager1:'_'.
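For reference, the Redbug check was along these lines (redbug is a
real tracing tool, but the exact options shown here are my guess):

    redbug:start(["machi_chain_manager1_pulse:'_'",
                  "machi_chain_manager1:'_'"],
                 [{time, 60000}, {msgs, 100}]).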
This may be related to a bad/ugly shutdown? In both hang cases,
I see at least one SASL error message such as the one below ...
BUT! There should be erlang:display() messages from the shutdown_hard()
function, which does some exit(Pid, kill) calls, but there is no output
from them! So, the killing is presumably coming from some kind of
PULSE-initiated process shutdown/cleanup?
=SUPERVISOR REPORT==== 16-Jul-2015::20:24:31 ===
     Supervisor: {local,machi_sup}
     Context:    shutdown_error
     Reason:     killed
     Offender:   [{pid,<0.200.0>},
                  {name,machi_flu_sup},
                  {mfargs,{machi_flu_sup,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,5000},
                  {child_type,supervisor}]