This is some brute-force, not-subtle hackery, but it looks like I've
got the basis for a test that a property-based testing tool or model checker
(QuickCheck or Concuerror or something else) can use for a good/bad check.
The following properties are examined (but not enforced):
* At each epoch, are the chains disjoint from each other? I.e., no single FLU
  is a member of two different chains at the same epoch.
  This is a safety/sanity check.
* For each unique chain UPI list at each epoch, are all of the FLUs in that
  chain unanimous in their agreement about it?
      agreed_membership: all UPI FLUs agree about the UPI list
      not_agreed: the membership algorithm has not yet agreed on
                  the UPI list
  This is not a safety/sanity check per se, but it can be useful input
  into a good safety check.  (A rough sketch of both checks follows this list.)
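
As a rough illustration only (this is not the test code), the two checks
could look something like the sketch below.  The module name
chain_props_sketch, the function names disjoint_p/1 and agreement_p/1, and
the {UPI, MemberViews} input shape are all assumptions made for this sketch,
not names from the real code.

    %% Assumed input: a list of {UPI, MemberViews} pairs for one epoch,
    %% where MemberViews is the UPI list as reported by each FLU in UPI.
    -module(chain_props_sketch).
    -export([disjoint_p/1, agreement_p/1]).

    %% Safety/sanity: no single FLU may appear in two different chains
    %% at the same epoch.
    disjoint_p(ChainsAtEpoch) ->
        AllFLUs = lists:append([UPI || {UPI, _Views} <- ChainsAtEpoch]),
        length(AllFLUs) =:= length(lists:usort(AllFLUs)).

    %% Agreement: every FLU in a chain's UPI reports exactly the same
    %% UPI list.  The two return values correspond to the
    %% agreed_membership / not_agreed tags in the dump below.
    agreement_p({UPI, MemberViews}) ->
        case lists:all(fun(V) -> V =:= UPI end, MemberViews) of
            true  -> {agreed_membership, UPI};
            false -> {not_agreed, UPI}
        end.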
Some examples:
* At epoch 0, there is no agreement on UPI membership of the one [a,b,c]
chain.
* At epoch 1, there is full agreement.
* At epoch 4, we're back to no agreement.
* At epoch 17, there's agreement on a small chain with UPI list=[a].
(This agreement continues until epoch 216, but that history is omitted
here.)
[{0,
{ok_disjoint,[{[a,b,c],
not_unique,0,
[<<159,215,105,140,29,151,142,2,162,90,225,209,10,102,119,
193,110,72,75,245>>,
<<213,46,129,248,23,50,210,247,145,68,65,112,232,101,28,56,
239,12,78,227>>,
<<230,146,66,183,10,218,57,29,233,166,108,176,118,109,
226,186,190,56,174,108>>]}]}},
{1,{ok_disjoint,[{agreed_membership,[a,b,c]}]}},
{4,
{ok_disjoint,[{not_unique,[a,b,c],
[not_in_this_epoch,
<<208,227,221,233,254,160,36,134,252,106,
124,192,101,171,168,68,169,55,2,54>>]}]}},
{6,
{ok_disjoint,[{not_unique,[a,b,c],
[not_in_this_epoch,
<<191,47,203,143,195,230,71,162,39,132,188,
128,64,39,18,9,73,148,207,220>>]}]}},
{17,{ok_disjoint,[{agreed_membership,[a]}]}},
{24,{ok_disjoint,[{agreed_membership,[a]}]}},
[...]
Starting at epoch 419, the network stabilized, though not yet fully, into
two "islands" of servers: a alone, and b & c together.
At epoch 486, the network has fully stabilized with the same network
partition, and we see rapid convergence to two chains, [a] and [b,c].
{419,{ok_disjoint,[{agreed_membership,[a]}]}},
{425,{ok_disjoint,[{agreed_membership,[b]}]}},
{436,{ok_disjoint,[{agreed_membership,[b]}]}},
{442,{ok_disjoint,[{agreed_membership,[b]}]}},
{444,{ok_disjoint,[{agreed_membership,[b]}]}},
{454,{ok_disjoint,[{agreed_membership,[b]}]}},
{456,{ok_disjoint,[{agreed_membership,[b]}]}},
{458,{ok_disjoint,[{agreed_membership,[b]}]}},
{463,{ok_disjoint,[{agreed_membership,[b]}]}},
{468,{ok_disjoint,[{agreed_membership,[b]}]}},
{479,{ok_disjoint,[{agreed_membership,[b]}]}},
{482,{ok_disjoint,[{agreed_membership,[b]}]}},
{486,{ok_disjoint,[{agreed_membership,[a]}]}},
{488,{ok_disjoint,[{agreed_membership,[b]}]}},
{490,{ok_disjoint,[{agreed_membership,[b,c]}]}},
{492,{ok_disjoint,[{agreed_membership,[b,c]}]}}]
So, this still pops up occasionally:
% rebar skip_deps=true -v eunit suites=machi_flu0_test,machi_chain_manager1
[...]
a private: [{epoch,223},{author,a},{upi,[a,b]},{repair,[]},{down,[c]},{d,[{author_proc,react},{nodes_up,[a,b]}]},{d2,[{up_nodz,[a,b]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
b private: [{epoch,224},{author,b},{upi,[b,a]},{repair,[]},{down,[c]},{d,[{author_proc,react},{nodes_up,[a,b]}]},{d2,[{up_nodz,[a,b]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
c private: [{epoch,191},{author,c},{upi,[c]},{repair,[]},{down,[a,b]},{d,[{author_proc,react},{nodes_up,[c]}]},{d2,[{up_nodz,[c]},{hooray,{v2,{2014,11,3},{20,19,57}}}]}]
The mis-ordering between [a,b] and [b,a] happens after the partition settled
on the islands of [a,b] and [c].
The {c100, ?LINE, _AnyOtherReturnValue} case matched this value:

{c100,734,
{err,error,
{badmatch,[a,b]},
from,
[{epoch,70},
{author,a},
{upi,[a]},
{repair,[b]},
{down,[c]},
{d,
[{author_proc,react},
{nodes_up,[a,b]}]},
{d2,[]}],
to,
[{epoch,194},
{author,b},
{upi,[b,a]},
{repair,[]},
{down,[c]},
{d,
[{author_proc,react},
{nodes_up,[a,b]}]},
{d2,[]}],
relative_to,a,stack,[...]
That diagram is really valuable, but it also takes a long time to make any
kind of edit to it; the process is too slow. This is a todo item and a
reminder that the flowchart is important documentation and must be brought
back into sync with the code soon.
It occurred to me today that I implemented the sequencer incorrectly and
hadn't yet noticed, because I don't have any tests that are
complex/interleaved/non-deterministic enough to find the problem.
The problem is that the sequencer's current implementation keeps track of
only the last LPN for each Tango stream.
The fix is to do what the paper actually says: the sequencer keeps a
*list* of the last $K$ LPNs for each stream. Derp. Yes, that's really
necessary to avoid a pretty simple race condition with 2 actors
simultaneously updating a single Tango stream.
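
Here is a minimal sketch of that idea, not the real sequencer code: the
module name seq_sketch, the ?K cap of 10, and the {NextLPN, Map} state
shape are all assumptions made for illustration.  The point is only that
the sequencer hands back the last $K$ LPNs per stream, so a writer that
races another writer on the same stream can still learn about LPNs it
would otherwise miss.

    -module(seq_sketch).
    -export([new/0, assign/2, last_lpns/2]).

    -define(K, 10).    %% how many recent LPNs to remember per stream

    %% State: {NextLPN, map of StreamName => [most recent LPNs, newest first]}
    new() ->
        {1, #{}}.

    %% Assign the next LPN and record it against every stream it belongs to,
    %% keeping only the last ?K LPNs per stream.
    assign({LPN, Map0}, Streams) ->
        Map = lists:foldl(fun(Stream, Acc) ->
                                  Old = maps:get(Stream, Acc, []),
                                  Acc#{Stream => lists:sublist([LPN|Old], ?K)}
                          end, Map0, Streams),
        {LPN, {LPN + 1, Map}}.

    %% Return the last ?K LPNs recorded for a stream (newest first).
    last_lpns({_NextLPN, Map}, Stream) ->
        maps:get(Stream, Map, []).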
1st commit: fix the implementation and the smoke test. The
broken-everything-else will be repaired in later commits.