greg/machi

Author	SHA1	Message	Date
Mark Allen	85e1e5a26d	Handle {error, bad_arg} on read	2015-10-13 21:08:24 -05:00
UENISHI Kota	e113f6ffdd	Reach the trim stub to CR client	2015-10-13 17:25:59 +09:00
UENISHI Kota	dfe953b7d8	Add surface of trim to scrub	2015-10-13 17:14:44 +09:00
Scott Lystig Fritchie	777909b0f5	TODO MARK todo comment and bugfix for machi_cr_client_test	2015-10-12 15:30:37 +09:00
Mark Allen	289b2bcc7c	Debug WIP	2015-10-11 23:04:29 -05:00
Mark Allen	c1b9038447	The return value of ets is generally 'true'	2015-10-08 15:47:11 -05:00
Mark Allen	aca3759e45	Bug fixes found during testing runs	2015-10-08 15:46:40 -05:00
Mark Allen	1ecbb5cffe	Fixed order of start_link parameters	2015-10-08 15:45:04 -05:00
Mark Allen	303aad97e9	Use {error, bad_checksum} directly We previously copied {error, bad_csum} as it was used in the main FLU code. The protobufs stuff expects the full atom bad_checksum though.	2015-10-08 15:43:54 -05:00
Scott Lystig Fritchie	952d2fa508	Change flag_checksum -> flag_no_checksum for consistency	2015-10-08 20:41:59 +09:00
Mark Allen	679046600f	Merge remote-tracking branch 'origin/bug/from-bp-request-error' into mra/write-once-clean	2015-10-07 23:02:03 -05:00
Scott Lystig Fritchie	796937fe75	Add LL generic error PB response decoding	2015-10-08 12:33:55 +09:00
Scott Lystig Fritchie	0054445f13	Delete spammy message from fitness servers every 5 seconds	2015-10-07 18:52:24 +09:00
Mark Allen	d627f238bf	Cache generated names until disk files are written	2015-10-06 22:44:31 -05:00
Mark Allen	f83b0973f2	Have to call filename mgr with FluName	2015-10-06 22:43:19 -05:00
Mark Allen	7a6999465a	Make sure we use '^' as filename separators	2015-10-06 22:02:31 -05:00
Mark Allen	2d0c03ef35	Integration with current FLU implementation	2015-10-05 22:18:29 -05:00
Mark Allen	36c11e7d08	Add a metadata manager supervisor	2015-10-05 16:37:53 -05:00
Mark Allen	d3fe7ee181	Pull write-once files over to clean branch I am treating the original write-once branch as a prototype which I am now throwing away. I had too much work interleaved in there, so I felt like the best thing to do would be to cut a new clean branch and pull the files over and start over against a recent-ish master. We will have to refactor the other things in FLU in a more piecemeal fashion.	2015-10-02 16:29:09 -05:00
Scott Lystig Fritchie	6d5b61f747	Tweaks to sleep_ranked_order() call in C200	2015-09-21 21:47:25 +09:00
Scott Lystig Fritchie	5eecb2b935	Change to P_current_calc epoch @ C100	2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie	340af05f0f	WIP: server-side of CP mode repairing-as-witness	2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie	d9b9397e75	Avoid some projection churn in C100's sanity check	2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie	5010d03677	Call manage_last_down_list() at C220 and C310	2015-09-21 15:36:54 +09:00
Scott Lystig Fritchie	69a304102e	Write public proj in all_members order only	2015-09-21 15:09:16 +09:00
Scott Lystig Fritchie	6b4ed1c061	Verbose debugging cruft	2015-09-19 14:25:07 +09:00
Scott Lystig Fritchie	72bfa163ba	Small test bugfixes & verbose/debugging cruft	2015-09-19 14:16:54 +09:00
Scott Lystig Fritchie	d695f30e4f	Avoid using host/port combo for machi_fitness (ab)use of machi_projection	2015-09-17 16:43:08 +09:00
Scott Lystig Fritchie	09ae2db0ba	Bugfix: double-check local private projection write with a read	2015-09-16 16:31:10 +09:00
Scott Lystig Fritchie	79b1d156c4	Add backlog option to gen_tcp:listen	2015-09-16 13:52:36 +09:00
Scott Lystig Fritchie	778bd015ee	Bugfix: pattern matching error in C110	2015-09-16 12:41:53 +09:00
Scott Lystig Fritchie	d3b116bd9e	Bugfix: CP mode: ignore P_latest if it has UPI or down server in my down list	2015-09-15 17:55:18 +09:00
Scott Lystig Fritchie	75c94420e0	Add test_ets_table to give programmatic slowdown	2015-09-14 22:52:41 +09:00
Scott Lystig Fritchie	7bf1132142	Bugfix: IsRelevantToMe_p adjustment for P_latest.upi == []	2015-09-14 17:28:50 +09:00
Scott Lystig Fritchie	b4f8bc8058	Add pretty_time(). Add CONFIRM verbose logging for none proj	2015-09-14 17:00:09 +09:00
Scott Lystig Fritchie	4e11cdd50f	Bugfix: derp, pattern match for UniqueHistoryTrigger_p	2015-09-14 16:59:58 +09:00
Scott Lystig Fritchie	a036f119a6	Add send_spam_to_everyone(), add 1% chance of using it	2015-09-14 16:01:26 +09:00
Scott Lystig Fritchie	6c543dfc18	Re-use the flapping criteria for a different use (more) Hooray, very early I ended up with a simulator example which kicked in and tested this change. (A deterministic fault injection method for testing would also be valuable, probably.) machi_chain_manager1_converge_demo:t(7, [{private_write_verbose,true}]). We switched partitions in the simulator like this: SET partitions = [{b,f},{c,f},{d,e},{f,e}] (2 of 90252) at {14,37,5} ... Stable projection at epoch 1429 upi=[b,c,g,a,d],repairing=[] ... SET partitions = [{b,d},{c,b},{d,c},{f,a}] (3 of 90252) at {14,37,44} Part of the chain reassembled quickly from the following UPIs: [g], then [g,e], then [g,e,f] via a series of successful simulated repairs. For the first two repairs, all parties (e & f & g) are unanimous about the projections. For the final repair, very strange, not all three adopt [g,e,f] chain: e says nothing, f & g use it. Also weird, then g immediately moves f! upi=[g,e],repairing=[f]. Then e also adopts this chain of 2. From that point forward, f keeps trying to use upi=[g,e,f],[] and the others try using only upi=[g,e],[f]. There are lots of messages from g saying that it's insane (correctly!) to try calc=1487:[g,e],[f] -> 1494:[g,e,f],[] without a valid repair author. It's worth checking why g dropped from [g,e,f] -> [g,e]. But even still, this new use for the flapping counter & reset via C103 is working. ... Ah, now I understand. The very occasional undefined socket bug in machi_flu1_client appears to be the cause: g had a one-time problem talking with f and so decided f was down long enough to make the shorter UPI. The other participants didn't have any such problem with f and so kept f in the UPI. This would have been a deadlock/infinite loop case without someone deciding to reset state.	2015-09-14 15:41:48 +09:00
Scott Lystig Fritchie	23554ffccc	Handle timeout/partition failures in C110	2015-09-14 13:54:47 +09:00
Scott Lystig Fritchie	fdf78bdbbc	Tweak IsRelevantToMe_p in B10 (more) Last night we hit a rare case of failed convergence. f was out of sync with the rest of the world. f: upi=[b,g,f] repairing=[a,c] The "rest of the world" used a larger chain at: : upi=[c,b,g,a], repairing=[f] And f refused to join the larger chain because of the way that IsRelevantToMe_p was being calculated before this commit. Hrrrm, though, I'm not convinced that this particular problem is fixed 100% by this patch. What if the chain lengths were the same but also UPI incompatible? e.g. if I remove 'a' from the "real world (in the partition simulator)" example above: f: upi=[b,g,f] repairing=[c] : upi=[c,b,g], repairing=[f] Hrmmmmm, I may need to reintroduce the my-recent-adopted-projection- flapping-like-counter thingie to try to break this kind of incompatible deadlock.	2015-09-14 13:40:34 +09:00
Scott Lystig Fritchie	62186395ed	Hooray! The weekend's CP work hasn't broken AP, I believe.	2015-09-14 00:04:53 +09:00
Scott Lystig Fritchie	f5901c6cd3	Hey, appears to work for CP mode chain len=3, hooray!	2015-09-13 21:51:20 +09:00
Scott Lystig Fritchie	89f57616a8	Avoid some churn when both latest & newprop are none proj	2015-09-13 17:44:23 +09:00
Scott Lystig Fritchie	f3a0ee91cf	WIP: thread P_calc_current all the way to C100 for CP mode assist	2015-09-13 15:58:45 +09:00
Scott Lystig Fritchie	0a20417682	Adjustments for CP mode (still slightly experimental)	2015-09-13 14:56:28 +09:00
Scott Lystig Fritchie	32c4d39156	Bugfix: set consistency_mode at set_chain_members	2015-09-13 14:16:02 +09:00
Scott Lystig Fritchie	b3ce9f9ab8	A bit less verbose output	2015-09-11 23:08:47 +09:00
Scott Lystig Fritchie	5efec1b6cd	Add upi_unanimous annotation to AP mode	2015-09-11 21:47:05 +09:00
Scott Lystig Fritchie	fe8ff6033d	Make better state transition choices in AP mode	2015-09-11 19:14:41 +09:00
Scott Lystig Fritchie	a0c129c16d	Bugfix: wow, a chain state transition sanity check bug	2015-09-11 17:32:52 +09:00
Scott Lystig Fritchie	8df7d58365	Add partition simulator support to fitness service	2015-09-11 16:45:29 +09:00
Scott Lystig Fritchie	efe6ce7894	WIP: small refactoring to prepare for fitness server 'use' of partition simulator	2015-09-11 16:03:49 +09:00
Scott Lystig Fritchie	35e8efeb96	Add timer:sleep() to accommodate machi_chain_manager1_converge_demo	2015-09-11 15:56:02 +09:00
Scott Lystig Fritchie	bbf925d132	Add fault injection method via C100 to test C103 admin down cycle	2015-09-10 18:05:55 +09:00
Scott Lystig Fritchie	41737ae62a	Add delete_admin_down API implementation, oops!	2015-09-10 18:05:18 +09:00
Scott Lystig Fritchie	d45c249e89	Add admin down status API to fitness server	2015-09-10 17:30:11 +09:00
Scott Lystig Fritchie	c14b9ce50f	Minor cleanup, add more partitions to converge demo	2015-09-10 16:39:15 +09:00
Scott Lystig Fritchie	af94d1c1c3	Bugfix: ExpectedUPI error in A40	2015-09-10 02:15:49 +09:00
Scott Lystig Fritchie	daf3a3d65a	Remove some verbose debugging cruft	2015-09-10 01:47:46 +09:00
Scott Lystig Fritchie	329a5e0682	Bugfix: damn, no idea how many problems this 5 month old bug caused	2015-09-10 01:33:55 +09:00
Scott Lystig Fritchie	5943494d54	Add ExpectedUPI to A40's AmHosedP clause	2015-09-10 00:43:37 +09:00
Scott Lystig Fritchie	10c655ebfe	WIP: fix one source of problems, now shift back to 'TODO this clause needs more review'	2015-09-09 23:59:40 +09:00
Scott Lystig Fritchie	b7aa33c617	Yeah, nearly there. AP fails occasionally in multiple-asymmetric-partition sequence	2015-09-09 23:10:39 +09:00
Scott Lystig Fritchie	72141c8ecb	WIP: split A30 into A30/A31 based on AllHosed	2015-09-09 21:06:40 +09:00
Scott Lystig Fritchie	5029911b52	WIP: remove verbose goop	2015-09-09 20:46:52 +09:00
Scott Lystig Fritchie	38ea36fc1c	WIP: Stand back, I'm going to try math! ... It works, {redacted}!	2015-09-09 20:45:57 +09:00
Scott Lystig Fritchie	27891bc5e9	WIP: 'broadcast'/spam works! async reminder ticks remain!	2015-09-09 19:14:52 +09:00
Scott Lystig Fritchie	dd095f117f	Derp, fix smoke_test() for machi_fitness:map_set()	2015-09-09 16:49:27 +09:00
Scott Lystig Fritchie	21015efcbb	WIP: Stand back, I'm going to try CRDTs!	2015-09-08 19:13:03 +09:00
Scott Lystig Fritchie	7af863d840	Add stubs of machi_fitness server	2015-09-08 16:13:07 +09:00
Scott Lystig Fritchie	185c9eb313	WIP: add failing eunit placeholder for spam	2015-09-07 15:38:23 +09:00
Scott Lystig Fritchie	c7684f660c	WIP: Friday evening/Monday morning, laying groundwork for spam "broadcast"	2015-09-07 15:20:10 +09:00
Scott Lystig Fritchie	4376ce9ec1	Remove all flap counting and inner projection stuff	2015-09-04 17:17:49 +09:00
Scott Lystig Fritchie	42aeecd9db	Fix machi_projection_store_test error	2015-09-04 15:24:16 +09:00
Scott Lystig Fritchie	3c1026da28	WIP: too tired to continue tonight	2015-09-01 22:10:45 +09:00
Scott Lystig Fritchie	4378ef7b54	Bugfix: inner->outer proj @ A30	2015-09-01 00:51:46 +09:00
Scott Lystig Fritchie	e79265228e	Bugfix: more correct for inner->outer sanity transition	2015-08-31 22:14:28 +09:00
Scott Lystig Fritchie	1e5d58b22d	Bugfix: more to ignore in make_basic_comparison_stable()	2015-08-31 17:57:37 +09:00
Scott Lystig Fritchie	bce225a200	Bugfix: a30_make_inner_projection() ignore newprop down list if none proj	2015-08-31 17:03:12 +09:00
Scott Lystig Fritchie	a095e0cfc3	Bugfix: ignore creation_time in make_comparison_stable()	2015-08-31 15:40:19 +09:00
Scott Lystig Fritchie	c637939cc2	Bugfix: A29 should trigger if EpochID (not Epoch# alone) differs	2015-08-31 15:21:17 +09:00
Scott Lystig Fritchie	5422dc45c2	Bugfix: derp in A29 revival	2015-08-31 14:44:05 +09:00
Scott Lystig Fritchie	004c686c8c	WIP: remove make_zerf() from calc_projection(); add make_zerf() to resurrected A29. Status: broken, needs work	2015-08-30 20:39:58 +09:00
Scott Lystig Fritchie	a449025e8b	Bugfix: epoch handling around none proj: epoch 0 only at first bootstrap!	2015-08-30 19:53:47 +09:00
Scott Lystig Fritchie	ec2e7b5669	Sunday experiment: all-but-remove A29, feels right but definitely not sure yet	2015-08-30 16:08:14 +09:00
Scott Lystig Fritchie	0dc53274d1	Get more aggressive about AllHosed+down nodes for inner proj	2015-08-30 02:22:59 +09:00
Scott Lystig Fritchie	771164b82f	Bugfix: Flapping manifesto, leaving #2 : only if not me	2015-08-30 00:50:23 +09:00
Scott Lystig Fritchie	4b83893047	Bugfix: minor flap count bookkeeping error	2015-08-30 00:50:03 +09:00
Scott Lystig Fritchie	a7db3a26c6	Bugfix: a30_make_inner_projection() compatible inner if not none proj	2015-08-30 00:04:13 +09:00
Scott Lystig Fritchie	53d865b247	Bugfix: serious derp fix for A30's inner->outer	2015-08-29 23:42:47 +09:00
Scott Lystig Fritchie	5c8b255da9	Bugfix: first new CP experiments with chain len=5	2015-08-29 22:40:18 +09:00
Scott Lystig Fritchie	94394d3429	Bugfix: allow none proj to re-emerge from flapping (more) See comments added in this commit at A40. So far, I've been doing CP mode testing with a handful of (very useful) network partition combinations using: machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]). Next steps: * Expand number & types of partitions * Expand to chain lengths of 5 and beyond	2015-08-29 21:36:53 +09:00
Scott Lystig Fritchie	ee19a0856b	WIP: justincase	2015-08-29 19:59:46 +09:00
Scott Lystig Fritchie	6b84cd6e6a	Reduce poll sleep time when running with partition simulator	2015-08-29 18:30:53 +09:00
Scott Lystig Fritchie	dc5ae4047a	Bugfix: react_to_env_A30 inner->norm fix, make_zerf() none proj derp fix	2015-08-29 18:01:13 +09:00
Scott Lystig Fritchie	c9340a662d	Bugfix: force stable creation_time on inner none proj	2015-08-29 15:06:57 +09:00
Scott Lystig Fritchie	6d9526b379	Add more ?REACT()	2015-08-29 13:13:31 +09:00
Scott Lystig Fritchie	f21fcdd7be	Bugfix: none proj must flap, undo previous commits, which may cause mess later	2015-08-29 13:13:23 +09:00
Scott Lystig Fritchie	af0ade9840	Bugfix: projection checksum fix in A30	2015-08-29 12:33:41 +09:00
Scott Lystig Fritchie	582f9e5eab	Bugfix: fix effectively-none-projection transition to C100. Still buggy	2015-08-28 23:08:38 +09:00
Scott Lystig Fritchie	403cb5b7a6	WIP: improvements, but now flapping inner epoch keeps increasing {sigh}	2015-08-28 21:13:54 +09:00
Scott Lystig Fritchie	9edd91f48e	Bugfixes for a->b column transition & flap dampening	2015-08-28 20:06:09 +09:00
Scott Lystig Fritchie	18aac6e489	WIP: undo AmFlappingNow_p condition added at commit `3dfe5c2`	2015-08-28 18:39:18 +09:00
Scott Lystig Fritchie	3dfe5c2677	WIP: fix annotation history on disk	2015-08-28 18:37:11 +09:00
Scott Lystig Fritchie	8ca1ffdb13	WIP: bugfixes and lots of verbose goop added	2015-08-28 01:55:31 +09:00
Scott Lystig Fritchie	deb2cdee2c	Bugfix: correct epoch number checking when inner proj	2015-08-27 22:22:15 +09:00
Scott Lystig Fritchie	93b9b948fc	WIP: debugging, uff da	2015-08-27 22:02:23 +09:00
Scott Lystig Fritchie	efb89efb0d	Reduce verbosity	2015-08-27 20:27:33 +09:00
Scott Lystig Fritchie	0eaa008810	Change checksum algorithm to exclude 'flap' also	2015-08-27 20:27:24 +09:00
Scott Lystig Fritchie	12b74a52fd	WIP: pre-dinner paranoid checkin	2015-08-27 18:45:27 +09:00
Scott Lystig Fritchie	65cd18939c	WIP: changes to annotation management	2015-08-27 17:58:43 +09:00
Scott Lystig Fritchie	8a61a85ae0	WIP: rewrite make_zerf() to use new annotation scheme	2015-08-27 16:19:22 +09:00
Scott Lystig Fritchie	28335a1310	Add CP mode unwedge. All eunit tests are passing again.	2015-08-26 18:47:39 +09:00
Scott Lystig Fritchie	9222881689	Oops, bugfixes	2015-08-26 17:51:43 +09:00
Scott Lystig Fritchie	568e165f4f	Allow pstore -> FLU unwedge only in ap_mode, machi_cr_client_test broken (uses cp_mode)	2015-08-26 15:51:14 +09:00
Scott Lystig Fritchie	e8f3ab381d	Add set_consistency_mode() to projection store API, use it	2015-08-26 14:57:51 +09:00
Scott Lystig Fritchie	c0ee323637	Our new unit test works, yay	2015-08-25 19:42:33 +09:00
Scott Lystig Fritchie	83f49472db	WIP: intermediate refactoring	2015-08-25 19:31:05 +09:00
Scott Lystig Fritchie	6dbe887298	Remove old cruft, including hugly HTTP server hack	2015-08-25 18:49:48 +09:00
Scott Lystig Fritchie	1c5a17b708	WIP: adjust throttle of flapping 'shut up'	2015-08-25 17:01:14 +09:00
Scott Lystig Fritchie	9a86453753	WIP: half-baked idea, stopping for the night (more) So, I'm 50% sure this is a good idea for CP mode: if there's a later public projection than P_current, then who knows what we might have missed. So, call make_zerf() to find out the absolute latest. Problem: flapping state appears to be lost, booo.	2015-08-24 21:54:30 +09:00
Scott Lystig Fritchie	ea61fe78bf	Add flap disabler for 3 seconds after up/down change	2015-08-24 20:38:54 +09:00
Scott Lystig Fritchie	2f82fe0487	WIP: cp_mode improvements	2015-08-24 19:04:26 +09:00
Scott Lystig Fritchie	66cafe066e	Remove proj_i_history, tweak AllAreFlapping_and_IamBad_and_NotRelevant_p in B10	2015-08-23 20:47:43 +09:00
Scott Lystig Fritchie	70022d11ce	Add damper check for flapping of inner projections, whee!	2015-08-23 20:00:19 +09:00
Scott Lystig Fritchie	561e60a7ac	WIP: start adding support to detect flapping of inner projections (ha!)	2015-08-23 17:50:25 +09:00
Scott Lystig Fritchie	0136fccff7	CP mode fix a30_make_inner_projection	2015-08-23 16:43:15 +09:00
Scott Lystig Fritchie	2d050ff7a6	Fix ?REACT() FSM names: a30->a40	2015-08-23 15:46:57 +09:00
Scott Lystig Fritchie	34d35fab63	Shorten the verbose output of private_write_verbose	2015-08-22 23:30:30 +09:00
Scott Lystig Fritchie	51a06844d5	Fix epoch number reuse bug when transiting C103	2015-08-22 21:40:21 +09:00
Scott Lystig Fritchie	0414da783a	Fix repairs when everyone is in stable flapping state	2015-08-22 21:27:01 +09:00
Scott Lystig Fritchie	a0477d62c0	WIP: bugfix for checking latest proj's flap count	2015-08-22 14:50:10 +09:00
Scott Lystig Fritchie	0278d7254b	Add A29 state for shouting circuit breaker for long long loops	2015-08-20 23:04:27 +09:00
Scott Lystig Fritchie	b46730eb2c	WIP: adjust the flapping manifest: delete clause 3	2015-08-20 21:28:56 +09:00
Scott Lystig Fritchie	71decc5dc0	WIP: AP mode less bad again	2015-08-20 18:47:50 +09:00
Scott Lystig Fritchie	4e7d1f2310	WIP: egadz, a refactoring mess, but finally AP mode not sucky	2015-08-20 17:32:46 +09:00
Scott Lystig Fritchie	a71e9543fe	WIP: refactoring inner handling, but ... (more) There are a couple of weird things in the snippet below (AP mode): 22:32:58.209 b uses inner: [{epoch,136},{author,c},{mode,ap_mode},{witnesses,[]},{upi,[b,c]},{repair,[]},{down,[a]},{flap,undefined},{d,[d_foo1,{ps,[{a,b}]},{nodes_up,[b,c]}]},{d2,[]}] (outer flap epoch 136: {flap_i,{{{epk,115},{1439,904777,11627}},28},[a,{a,problem_with,b},{b,problem_with,a}],[{a,{{{epk,126},{1439,904777,149865}},16}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},15}}]}) (my flap {{epk,115},{1439,904777,11627}} 29 [{a,{{{epk,126},{1439,904777,149865}},28}},{b,{{{epk,115},{1439,904777,11627}},29}},{c,{{{epk,121},{1439,904777,134392}},26}}]) 22:32:58.224 c uses inner: [{epoch,136},{author,c},{mode,ap_mode},{witnesses,[]},{upi,[b,c]},{repair,[]},{down,[a]},{flap,undefined},{d,[d_foo1,{ps,[{a,b}]},{nodes_up,[b,c]}]},{d2,[]}] (outer flap epoch 136: {flap_i,{{{epk,115},{1439,904777,11627}},28},[a,{a,problem_with,b},{b,problem_with,a}],[{a,{{{epk,126},{1439,904777,149865}},16}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},15}}]}) (my flap {{epk,121},{1439,904777,134392}} 28 [{a,{{{epk,126},{1439,904777,149865}},28}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},28}}]) CONFIRM by epoch inner 136 <<103,64,252,...>> at [b,c] [] Priv1 [{a,{{132,<<"Cï|ÿzKX:Á"...>>},[a],[c],[b],[],false}}, {b,{{127,<<185,139,3,2,96,189,...>>},[b,c],[],[a],[],false}}, {c,{{133,<<145,71,223,6,177,...>>},[b,c],[a],[],[],false}}] agree false Pubs: [{a,136},{b,136},{c,136}] DoIt, 1. Both the "uses inner" messages and also the "CONFIRM by epoch inner 136" show that B & C are using the same inner projection. However, the 'Priv1' output shows b & c on different epochs, 127 & 133. Weird. 2. I've added an infinite loop, probably in this commit. :-(	2015-08-18 22:35:57 +09:00
Scott Lystig Fritchie	9bf0eedb64	WIP: add the flapping manifesto, much is muchmuch better now	2015-08-18 20:49:36 +09:00
Scott Lystig Fritchie	e9268080af	Finish/catchup commit from end of last week, silly me	2015-08-17 20:14:29 +09:00
Scott Lystig Fritchie	48e82ac1a4	WIP: use digraph to calculate better AllHosed	2015-08-14 22:29:20 +09:00
Scott Lystig Fritchie	20f2bf4b92	WIP: more ?REACT() tracing	2015-08-14 22:28:50 +09:00
Scott Lystig Fritchie	d2ce8f8447	Fix repair bug that has survived witness additions, oops	2015-08-14 19:30:36 +09:00
Scott Lystig Fritchie	9e02a1ea73	Add more ?REACT() tracing	2015-08-14 19:30:05 +09:00
Scott Lystig Fritchie	5aff775383	WIP: it's ugly, but CP+witnesses is mostly working?	2015-08-14 17:05:16 +09:00
Scott Lystig Fritchie	4e66d7bd91	WIP: keep CMode propagation consistent, but still violating CP transition safety	2015-08-14 00:12:13 +09:00
Scott Lystig Fritchie	14fad2d704	End-to-end chain state checking is still broken (more) If we use verbose output from: machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]). And use: tail -f typescript_file | egrep --line-buffered 'SET|attempted|CONFIRM' ... then we can clearly see a chain safety violation when moving from epoch 81 -> 83. I need to add more smarts to the safety checking, both at the individual transition sanity check and at the converge_demo overall rolling sanity check. Key to output: CONFIRM by epoch {num} {csum} at {UPI} {Repairing} SET # of FLUs = 3 members [a,b,c]). CONFIRM by epoch 1 <<96,161,96,...>> at [a,b] [c] CONFIRM by epoch 5 <<134,243,175,...>> at [b,c] [] CONFIRM by epoch 7 <<207,93,225,...>> at [b,c] [] CONFIRM by epoch 47 <<60,142,248,...>> at [b,c] [] SET partitions = [{c,b},{c,a}] (1 of 2) at {22,3,34} CONFIRM by epoch 81 <<223,58,184,...>> at [a,b] [] SET partitions = [{b,c},{b,a}] (2 of 2) at {22,3,38} CONFIRM by epoch 83 <<33,208,224,...>> at [a,c] [] SET partitions = [] CONFIRM by epoch 85 <<173,179,149,...>> at [a,c] [b]	2015-08-13 22:16:28 +09:00
Scott Lystig Fritchie	f7121f8845	Witness + flapping seems to mostly work, yay!	2015-08-13 21:24:56 +09:00
Scott Lystig Fritchie	425b9c8f60	Merge slf/projection-conditional-write branch	2015-08-13 19:10:48 +09:00
Scott Lystig Fritchie	dcbc3b45ff	C110: handle proj store private write failure when conditional fails	2015-08-13 18:45:15 +09:00
Scott Lystig Fritchie	9768f3c035	Projection store private write returns bad_arg if max_public_epochid is greater	2015-08-13 18:44:25 +09:00
Scott Lystig Fritchie	58d840ef7e	Minor react changes, minor fix for return val of A50	2015-08-13 18:43:41 +09:00
Scott Lystig Fritchie	d4275e5460	WIP: zerf_find_last_common() fix, eunit passes & very basic len=3 converge demo works	2015-08-13 15:41:18 +09:00
Scott Lystig Fritchie	0b8de235a9	WIP: zerf_find_last_common(), but is confused/broken by partial write @ private	2015-08-13 14:21:31 +09:00
Scott Lystig Fritchie	054397d187	WIP: find last common majority epoch	2015-08-12 17:53:39 +09:00
Scott Lystig Fritchie	d340b6a706	WIP: Duh, fix think-o in a40_latest_author_down()	2015-08-12 17:37:45 +09:00
Scott Lystig Fritchie	8e2a688526	WIP: cp_mode code from last Friday	2015-08-11 15:24:26 +09:00
Scott Lystig Fritchie	512251ac55	Adjust flap_limit constant	2015-08-07 12:29:10 +09:00
Scott Lystig Fritchie	3ca0f4491d	WIP: always start chain manager with none projection	2015-08-06 19:24:14 +09:00
Scott Lystig Fritchie	0d7f6c8d7e	WIP: chain transitions are now fully (?) aware of witness servers	2015-08-06 17:48:31 +09:00
Scott Lystig Fritchie	e9c4e2f98d	WIP: rearrange CP mode projection calc	2015-08-06 15:22:04 +09:00
Scott Lystig Fritchie	82b6726261	Revert UPI [] -> [FirstRepairing] to commit `91496c6`	2015-08-06 15:21:44 +09:00
Scott Lystig Fritchie	01da7a7046	TODO WTF was I thinking here??....	2015-08-06 14:13:19 +09:00
Scott Lystig Fritchie	dcf532bafd	WIP: Witness test expansion	2015-08-05 18:23:44 +09:00
Scott Lystig Fritchie	0f18ab8d20	Add better (?) timeout handling to machi_cr_client.erl gen_server calls	2015-08-05 17:48:06 +09:00
Scott Lystig Fritchie	e3d9ba2b83	WIP: Witness test expansion	2015-08-05 17:17:25 +09:00
Scott Lystig Fritchie	b21803a6c6	Fix witness calculation projections, part II	2015-08-05 16:05:03 +09:00
Scott Lystig Fritchie	f43a5ca96d	Fix witness calculation projections, part I	2015-08-05 15:50:32 +09:00
Scott Lystig Fritchie	91496c656b	Oops, fix PB stuff to add witnesses	2015-08-05 12:53:20 +09:00
Scott Lystig Fritchie	3f51357577	WIP: pre-travel code, not sure if good, check in for history	2015-07-30 13:12:08 -07:00
Scott Lystig Fritchie	aa1a31982a	Add 'witnesses' to machi_projection:make_summary()	2015-07-30 13:11:43 -07:00
Scott Lystig Fritchie	6e521700bd	WIP: Adding witness_smoke_test_ but it's broken (more) So, the problem is that the chain manager isn't finishing repair because UPI=[a], and a is a witness, and a can't do the list files etc etc repair stuff that repairer FLUs need to do. The best (?) way forward is to add some advance smarts to the chain manager so that it doesn't propose a UPI of 100% witnesses?	2015-07-21 19:05:04 +09:00
Scott Lystig Fritchie	432190435e	Add witness_mode to FLU	2015-07-21 17:29:33 +09:00
Scott Lystig Fritchie	88d3228a4c	Fix various problems with repair not being aware of inner projections	2015-07-20 16:25:42 +09:00
Scott Lystig Fritchie	9ae4afa58e	Reduce chmgr verbosity a bit	2015-07-20 14:58:21 +09:00
Scott Lystig Fritchie	e14493373b	Bugfix: add missing reset of not_sanes dictionary, fix comments	2015-07-20 14:04:25 +09:00
Scott Lystig Fritchie	f7ef8c54f5	Reduce # of assumptions made by ch_mgr + simulator for 'repair_airquote_done'	2015-07-19 13:32:55 +09:00
Scott Lystig Fritchie	b8c642aaa7	WIP: bugfix for rare flapping infinite loop (done^2 fix I hope) How can even computer? So, there's a flavor of the flapping infinite loop problem that can happen without flapping being detected (by the existing flapping detector, that is). That detector relies on a series of accepted projections to converge to a single projection repeated X times. However, it's possible to have a race with a simulated repair "finishing" that causes a problem so that no more projections are ever accepted. Oops. See also: new comments in do_react_to_env().	2015-07-19 00:43:10 +09:00
Scott Lystig Fritchie	57b7122035	Fix bug found by PULSE that's not directly chain manager-related (more) PULSE managed to create a situation where machi_proxy_flu_client1 would appear to fail a remote attempt to write_projection. The client would retry, but the 1st attempt really did get through to the server. So, if we hit this case, we try to read the projection, and if it's exactly equal to what we tried to write, we consider the op a success. Ditto for write_chunk. Fix up eunit test to accommodate the change of semantics.	2015-07-18 23:22:14 +09:00
Scott Lystig Fritchie	87867f8f2e	WIP: bugfix for rare flapping infinite loop (done fix I hope) {sigh} This is a correction to a think-o error in the "WIP: bugfix for rare flapping infinite loop (better fix I hope)" bugfix that I thought I had finished in the slf/chain-manager/cp-mode branch. Silly me, the test for myself as the author of the not_sane transition was wrong: we don't do that kind of insanity, other nodes might, though. ^_^	2015-07-18 17:53:17 +09:00
Scott Lystig Fritchie	19ce841471	Merge slf/chain-manager/cp-mode (fix conflicts)	2015-07-17 16:39:37 +09:00
Scott Lystig Fritchie	b295c7f374	Log more info on private projection write failure	2015-07-17 16:20:54 +09:00
Scott Lystig Fritchie	f4d16881c0	WIP: bugfix for rare flapping infinite loop (better fix I hope) %% So, I'd tried this kind of "if everyone is doing it, then we %% 'agree' and we can do something different" strategy before, %% and it didn't work then. Silly me. Distributed systems %% lesson #823: do not forget the past. In a situation created %% by PULSE, of all=[a,b,c,d,e], b & d & e were scheduled %% completely unfairly. So a & c were the only authors ever to %% successfully write a suggested projection to a public store. %% Oops. %% %% So, we're going to keep track in #ch_mgr state for the number %% of times that this insane judgement has happened.	2015-07-17 14:51:39 +09:00
Scott Lystig Fritchie	0a8821a1c6	WIP: bugfix for rare flapping infinite loop (fixed I hope) I'll run a set of PULSE tests (Cmd_e of the 'regression' style) to try to confirm a fix for this pernicious little thing. Final (?) part of the fix: add myself to SeenFlappers in react_to_env_A30().	2015-07-16 23:23:30 +09:00
Scott Lystig Fritchie	b4d9ac5fe0	Hooray, PULSE things look stable; remove debugging verbose cruft	2015-07-16 21:57:34 +09:00
Scott Lystig Fritchie	c10200138c	Hooray??! Fix the damn PULSE hangs by using infinity supervisor shutdown times	2015-07-16 21:17:46 +09:00
Scott Lystig Fritchie	3a4624ab06	Hrm, fewer deadlocks, but lots of !@#$! mystery hangs @ startup & teardown	2015-07-16 20:13:48 +09:00
Scott Lystig Fritchie	d331e09923	Hrm, fewer deadlocks, but sometimes unreliable shutdown	2015-07-16 17:59:02 +09:00
Scott Lystig Fritchie	f2fc5b91c2	Add more PULSE instrumentation -> more deadlocks	2015-07-16 16:25:38 +09:00
Scott Lystig Fritchie	73ac220d75	Add machi_verbose.hrl	2015-07-16 16:01:53 +09:00
Scott Lystig Fritchie	0ead97093b	WIP: bugfix for rare flapping infinite loop (unfinished) part ...	2015-07-16 00:18:42 +09:00
Scott Lystig Fritchie	18c92c98f8	WIP: bugfix for rare flapping infinite loop (unfinished) part IV	2015-07-15 18:42:59 +09:00
Scott Lystig Fritchie	402720d301	WIP: bugfix for rare flapping infinite loop (unfinished) part II	2015-07-15 17:23:17 +09:00
Scott Lystig Fritchie	6f9a603e99	WIP: bugfix for rare flapping infinite loop (unfinished)	2015-07-15 12:44:56 +09:00
Scott Lystig Fritchie	0f667c4356	WIP: add more debugging/react info	2015-07-15 11:25:06 +09:00
Scott Lystig Fritchie	7c970d90a6	Bugfix: use correct updated #state in react_to_env_A30() {sigh}	2015-07-15 00:44:07 +09:00
Scott Lystig Fritchie	5eb6ebc874	Bugfix: add missing remember_partition_hack() calls in perhaps_call path	2015-07-14 17:17:14 +09:00
Scott Lystig Fritchie	fd66fe46b5	Move react logging in react_to_env_A30()	2015-07-14 17:16:23 +09:00
Scott Lystig Fritchie	0089af0a86	Bugfix: moving inner -> outer projection, use calc_projection() for sanity	2015-07-10 21:11:34 +09:00
Scott Lystig Fritchie	f746b75254	Bugfix: A30: if Kicker_p only true if we actually have an inner proj!	2015-07-10 20:25:44 +09:00
Scott Lystig Fritchie	e9e4c54b25	Bugfix: undo the jump directly from A30 -> C100.	2015-07-10 20:24:44 +09:00

559 commits