Commit graph

404 commits

Author SHA1 Message Date
Scott Lystig Fritchie
d45c249e89 Add admin down status API to fitness server 2015-09-10 17:30:11 +09:00
Scott Lystig Fritchie
c14b9ce50f Minor cleanup, add more partitions to converge demo 2015-09-10 16:39:15 +09:00
Scott Lystig Fritchie
af94d1c1c3 Bugfix: ExpectedUPI error in A40 2015-09-10 02:15:49 +09:00
Scott Lystig Fritchie
daf3a3d65a Remove some verbose debugging cruft 2015-09-10 01:47:46 +09:00
Scott Lystig Fritchie
329a5e0682 Bugfix: damn, no idea how many problems this 5 month old bug caused 2015-09-10 01:33:55 +09:00
Scott Lystig Fritchie
5943494d54 Add ExpectedUPI to A40's AmHosedP clause 2015-09-10 00:43:37 +09:00
Scott Lystig Fritchie
10c655ebfe WIP: fix one source of problems, now shift back to 'TODO this clause needs more review' 2015-09-09 23:59:40 +09:00
Scott Lystig Fritchie
b7aa33c617 Yeah, nearly there. AP fails occasionally in multiple-asymmetric-partition sequence 2015-09-09 23:10:39 +09:00
Scott Lystig Fritchie
72141c8ecb WIP: split A30 into A30/A31 based on AllHosed 2015-09-09 21:06:40 +09:00
Scott Lystig Fritchie
5029911b52 WIP: remove verbose goop 2015-09-09 20:46:52 +09:00
Scott Lystig Fritchie
38ea36fc1c WIP: Stand back, I'm going to try math! ... It works, {redacted}! 2015-09-09 20:45:57 +09:00
Scott Lystig Fritchie
27891bc5e9 WIP: 'broadcast'/spam works! async reminder ticks remain! 2015-09-09 19:14:52 +09:00
Scott Lystig Fritchie
dd095f117f Derp, fix smoke_test() for machi_fitness:map_set() 2015-09-09 16:49:27 +09:00
Scott Lystig Fritchie
21015efcbb WIP: Stand back, I'm going to try CRDTs! 2015-09-08 19:13:03 +09:00
Scott Lystig Fritchie
7af863d840 Add stubs of machi_fitness server 2015-09-08 16:13:07 +09:00
Scott Lystig Fritchie
185c9eb313 WIP: add failing eunit placeholder for spam 2015-09-07 15:38:23 +09:00
Scott Lystig Fritchie
c7684f660c WIP: Friday evening/Monday morning, laying groundwork for spam "broadcast" 2015-09-07 15:20:10 +09:00
Scott Lystig Fritchie
4376ce9ec1 Remove all flap counting and inner projection stuff 2015-09-04 17:17:49 +09:00
Scott Lystig Fritchie
42aeecd9db Fix machi_projection_store_test error 2015-09-04 15:24:16 +09:00
Scott Lystig Fritchie
3c1026da28 WIP: too tired to continue tonight 2015-09-01 22:10:45 +09:00
Scott Lystig Fritchie
4378ef7b54 Bugfix: inner->outer proj @ A30 2015-09-01 00:51:46 +09:00
Scott Lystig Fritchie
e79265228e Bugfix: more correct for inner->outer sanity transition 2015-08-31 22:14:28 +09:00
Scott Lystig Fritchie
1e5d58b22d Bugfix: more to ignore in make_basic_comparison_stable() 2015-08-31 17:57:37 +09:00
Scott Lystig Fritchie
bce225a200 Bugfix: a30_make_inner_projection() ignore newprop down list if none proj 2015-08-31 17:03:12 +09:00
Scott Lystig Fritchie
a095e0cfc3 Bugfix: ignore creation_time in make_comparison_stable() 2015-08-31 15:40:19 +09:00
Scott Lystig Fritchie
c637939cc2 Bugfix: A29 should trigger if EpochID (not Epoch# alone) differs 2015-08-31 15:21:17 +09:00
Scott Lystig Fritchie
5422dc45c2 Bugfix: derp in A29 revival 2015-08-31 14:44:05 +09:00
Scott Lystig Fritchie
004c686c8c WIP: remove make_zerf() from calc_projection(); add make_zerf() to resurrected A29. Status: broken, needs work 2015-08-30 20:39:58 +09:00
Scott Lystig Fritchie
a449025e8b Bugfix: epoch handling around none proj: epoch 0 only at first bootstrap! 2015-08-30 19:53:47 +09:00
Scott Lystig Fritchie
ec2e7b5669 Sunday experiment: all-but-remove A29, feels right but definitely not sure yet 2015-08-30 16:08:14 +09:00
Scott Lystig Fritchie
0dc53274d1 Get more aggressive about AllHosed+down nodes for inner proj 2015-08-30 02:22:59 +09:00
Scott Lystig Fritchie
771164b82f Bugfix: Flapping manifesto, leaving #2: only if not me 2015-08-30 00:50:23 +09:00
Scott Lystig Fritchie
4b83893047 Bugfix: minor flap count bookeeping error 2015-08-30 00:50:03 +09:00
Scott Lystig Fritchie
a7db3a26c6 Bugfix: a30_make_inner_projection() compatible inner if not none proj 2015-08-30 00:04:13 +09:00
Scott Lystig Fritchie
53d865b247 Bugfix: serious derp fix for A30's inner->outer 2015-08-29 23:42:47 +09:00
Scott Lystig Fritchie
5c8b255da9 Bugfix: first new CP experiments with chain len=5 2015-08-29 22:40:18 +09:00
Scott Lystig Fritchie
94394d3429 Bugfix: allow none proj to re-emerge from flapping (more)
See comments added in this commit at A40.

So far, I've been doing CP mode testing with a handful of (very useful)
network partition combinations using:

    machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).

Next steps:

* Expand number & types of partitions
* Expand to chain lengths of 5 and beyond
2015-08-29 21:36:53 +09:00
Scott Lystig Fritchie
ee19a0856b WIP: justincase 2015-08-29 19:59:46 +09:00
Scott Lystig Fritchie
6b84cd6e6a Reduce poll sleep time when running with partition simulator 2015-08-29 18:30:53 +09:00
Scott Lystig Fritchie
dc5ae4047a Bugfix: react_to_env_A30 inner->norm fix, make_zerf() none proj derp fix 2015-08-29 18:01:13 +09:00
Scott Lystig Fritchie
c9340a662d Bugfix: force stable creation_time on inner none proj 2015-08-29 15:06:57 +09:00
Scott Lystig Fritchie
6d9526b379 Add more ?REACT() 2015-08-29 13:13:31 +09:00
Scott Lystig Fritchie
f21fcdd7be Bugfix: none proj must flap, undo previous commits, which may cause mess later 2015-08-29 13:13:23 +09:00
Scott Lystig Fritchie
af0ade9840 Bugfix: projection checksum fix in A30 2015-08-29 12:33:41 +09:00
Scott Lystig Fritchie
582f9e5eab Bugfix: fix effectively-none-projection transition to C100. Still buggy 2015-08-28 23:08:38 +09:00
Scott Lystig Fritchie
403cb5b7a6 WIP: improvements, but now flapping inner epoch keeps increasing {sigh} 2015-08-28 21:13:54 +09:00
Scott Lystig Fritchie
9edd91f48e Bugfixes for a->b column transition & flap dampening 2015-08-28 20:06:09 +09:00
Scott Lystig Fritchie
18aac6e489 WIP: undo AmFlappingNow_p condition added at commit 3dfe5c2 2015-08-28 18:39:18 +09:00
Scott Lystig Fritchie
3dfe5c2677 WIP: fix annotation history on disk 2015-08-28 18:37:11 +09:00
Scott Lystig Fritchie
8ca1ffdb13 WIP: bugfixes and lots of verbose goop added 2015-08-28 01:55:31 +09:00
Scott Lystig Fritchie
deb2cdee2c Bugfix: correct epoch number checking when inner proj 2015-08-27 22:22:15 +09:00
Scott Lystig Fritchie
93b9b948fc WIP: debugging, uff da 2015-08-27 22:02:23 +09:00
Scott Lystig Fritchie
efb89efb0d Reduce verbosity 2015-08-27 20:27:33 +09:00
Scott Lystig Fritchie
0eaa008810 Change checksum algorithm to exclude 'flap' also 2015-08-27 20:27:24 +09:00
Scott Lystig Fritchie
12b74a52fd WIP: pre-dinner paranoid checkin 2015-08-27 18:45:27 +09:00
Scott Lystig Fritchie
65cd18939c WIP: changes to annotation management 2015-08-27 17:58:43 +09:00
Scott Lystig Fritchie
8a61a85ae0 WIP: rewrite make_zerf() to use new annotation scheme 2015-08-27 16:19:22 +09:00
Scott Lystig Fritchie
28335a1310 Add CP mode unwedge. All eunit tests are passing again. 2015-08-26 18:47:39 +09:00
Scott Lystig Fritchie
9222881689 Oops, bugfixes 2015-08-26 17:51:43 +09:00
Scott Lystig Fritchie
568e165f4f Allow pstore -> FLU unwedge only in ap_mode, machi_cr_client_test broken (uses cp_mode) 2015-08-26 15:51:14 +09:00
Scott Lystig Fritchie
e8f3ab381d Add set_consistency_mode() to projection store API, use it 2015-08-26 14:57:51 +09:00
Scott Lystig Fritchie
c0ee323637 Our new unit test works, yay 2015-08-25 19:42:33 +09:00
Scott Lystig Fritchie
83f49472db WIP: intermediate refactoring 2015-08-25 19:31:05 +09:00
Scott Lystig Fritchie
6dbe887298 Remove old cruft, including hugly HTTP server hack 2015-08-25 18:49:48 +09:00
Scott Lystig Fritchie
1c5a17b708 WIP: adjust throttle of flapping 'shut up' 2015-08-25 17:01:14 +09:00
Scott Lystig Fritchie
9a86453753 WIP: half-baked idea, stopping for the night (more)
So, I'm 50% sure this is a good idea for CP mode: if there's
a later public projection than P_current, then who knows what
we might have missed.  So, call make_zerf() to find out the
absolute latest.  Problem: flapping state appears to be lost,
booo.
2015-08-24 21:54:30 +09:00
Scott Lystig Fritchie
ea61fe78bf Add flap disabler for 3 seconds after up/down change 2015-08-24 20:38:54 +09:00
Scott Lystig Fritchie
2f82fe0487 WIP: cp_mode improvements 2015-08-24 19:04:26 +09:00
Scott Lystig Fritchie
66cafe066e Remove proj_i_history, tweak AllAreFlapping_and_IamBad_and_NotRelevant_p in B10 2015-08-23 20:47:43 +09:00
Scott Lystig Fritchie
70022d11ce Add damper check for flapping of *inner* projections, whee! 2015-08-23 20:00:19 +09:00
Scott Lystig Fritchie
561e60a7ac WIP: start adding support to detect flapping of inner projections (ha!) 2015-08-23 17:50:25 +09:00
Scott Lystig Fritchie
0136fccff7 CP mode fix a30_make_inner_projection 2015-08-23 16:43:15 +09:00
Scott Lystig Fritchie
2d050ff7a6 Fix ?REACT() FSM names: a30->a40 2015-08-23 15:46:57 +09:00
Scott Lystig Fritchie
34d35fab63 Shorten the verbose output of private_write_verbose 2015-08-22 23:30:30 +09:00
Scott Lystig Fritchie
51a06844d5 Fix epoch number reuse bug when transiting C103 2015-08-22 21:40:21 +09:00
Scott Lystig Fritchie
0414da783a Fix repairs when everyone is in stable flapping state 2015-08-22 21:27:01 +09:00
Scott Lystig Fritchie
a0477d62c0 WIP: bugfix for checking latest proj's flap count 2015-08-22 14:50:10 +09:00
Scott Lystig Fritchie
0278d7254b Add A29 state for shouting circuit breaker for long long loops 2015-08-20 23:04:27 +09:00
Scott Lystig Fritchie
b46730eb2c WIP: adjust the flapping manifest: delete clause 3 2015-08-20 21:28:56 +09:00
Scott Lystig Fritchie
71decc5dc0 WIP: AP mode less bad again 2015-08-20 18:47:50 +09:00
Scott Lystig Fritchie
4e7d1f2310 WIP: egadz, a refactoring mess, but finally AP mode not sucky 2015-08-20 17:32:46 +09:00
Scott Lystig Fritchie
a71e9543fe WIP: refactoring inner handling, but ... (more)
There are a couple of weird things in the snippet below (AP mode):

    22:32:58.209 b uses inner: [{epoch,136},{author,c},{mode,ap_mode},{witnesses,[]},{upi,[b,c]},{repair,[]},{down,[a]},{flap,undefined},{d,[d_foo1,{ps,[{a,b}]},{nodes_up,[b,c]}]},{d2,[]}] (outer flap epoch 136: {flap_i,{{{epk,115},{1439,904777,11627}},28},[a,{a,problem_with,b},{b,problem_with,a}],[{a,{{{epk,126},{1439,904777,149865}},16}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},15}}]}) (my flap {{epk,115},{1439,904777,11627}} 29 [{a,{{{epk,126},{1439,904777,149865}},28}},{b,{{{epk,115},{1439,904777,11627}},29}},{c,{{{epk,121},{1439,904777,134392}},26}}])

    22:32:58.224 c uses inner: [{epoch,136},{author,c},{mode,ap_mode},{witnesses,[]},{upi,[b,c]},{repair,[]},{down,[a]},{flap,undefined},{d,[d_foo1,{ps,[{a,b}]},{nodes_up,[b,c]}]},{d2,[]}] (outer flap epoch 136: {flap_i,{{{epk,115},{1439,904777,11627}},28},[a,{a,problem_with,b},{b,problem_with,a}],[{a,{{{epk,126},{1439,904777,149865}},16}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},15}}]}) (my flap {{epk,121},{1439,904777,134392}} 28 [{a,{{{epk,126},{1439,904777,149865}},28}},{b,{{{epk,115},{1439,904777,11627}},28}},{c,{{{epk,121},{1439,904777,134392}},28}}])

    CONFIRM by epoch inner 136 <<103,64,252,...>> at [b,c] []

    Priv1 [{a,{{132,<<"Cï|ÿzKX:Á"...>>},[a],[c],[b],[],false}},
           {b,{{127,<<185,139,3,2,96,189,...>>},[b,c],[],[a],[],false}},
           {c,{{133,<<145,71,223,6,177,...>>},[b,c],[a],[],[],false}}] agree false
    Pubs: [{a,136},{b,136},{c,136}]
    DoIt,

1. Both the "uses inner" messages and also the "CONFIRM by epoch inner 136"
   show that B & C are using the same inner projection.

   However, the 'Priv1' output shows b & c on different epochs, 127 & 133.
   Weird.

2. I've added an infinite loop, probably in this commit.  :-(
2015-08-18 22:35:57 +09:00
Scott Lystig Fritchie
9bf0eedb64 WIP: add the flapping manifesto, much is muchmuch better now 2015-08-18 20:49:36 +09:00
Scott Lystig Fritchie
e9268080af Finish/catchup commit from end of last week, silly me 2015-08-17 20:14:29 +09:00
Scott Lystig Fritchie
48e82ac1a4 WIP: use digraph to calculate better AllHosed 2015-08-14 22:29:20 +09:00
Scott Lystig Fritchie
20f2bf4b92 WIP: more ?REACT() tracing 2015-08-14 22:28:50 +09:00
Scott Lystig Fritchie
d2ce8f8447 Fix repair bug that has survived witness additions, oops 2015-08-14 19:30:36 +09:00
Scott Lystig Fritchie
9e02a1ea73 Add more ?REACT() tracing 2015-08-14 19:30:05 +09:00
Scott Lystig Fritchie
5aff775383 WIP: it's ugly, but CP+witnesses is mostly working? 2015-08-14 17:05:16 +09:00
Scott Lystig Fritchie
4e66d7bd91 WIP: keep CMode propagation consistent, but still violating CP transition safety 2015-08-14 00:12:13 +09:00
Scott Lystig Fritchie
14fad2d704 End-to-end chain state checking is still broken (more)
If we use verbose output from:

    machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).

And use:

    tail -f typescript_file | egrep --line-buffered 'SET|attempted|CONFIRM'

... then we can clearly see a chain safety violation when moving from
epoch 81 -> 83.  I need to add more smarts to the safety checking,
both at the individual transition sanity check and at the converge_demo
overall rolling sanity check.

Key to output: CONFIRM by epoch {num} {csum} at {UPI} {Repairing}

    SET # of FLUs = 3 members [a,b,c]).
    CONFIRM by epoch 1 <<96,161,96,...>> at [a,b] [c]
    CONFIRM by epoch 5 <<134,243,175,...>> at [b,c] []
    CONFIRM by epoch 7 <<207,93,225,...>> at [b,c] []
    CONFIRM by epoch 47 <<60,142,248,...>> at [b,c] []
    SET partitions = [{c,b},{c,a}] (1 of 2) at {22,3,34}
    CONFIRM by epoch 81 <<223,58,184,...>> at [a,b] []
    SET partitions = [{b,c},{b,a}] (2 of 2) at {22,3,38}
    CONFIRM by epoch 83 <<33,208,224,...>> at [a,c] []
    SET partitions = []
    CONFIRM by epoch 85 <<173,179,149,...>> at [a,c] [b]
2015-08-13 22:16:28 +09:00
Scott Lystig Fritchie
f7121f8845 Witness + flapping seems to mostly work, yay! 2015-08-13 21:24:56 +09:00
Scott Lystig Fritchie
425b9c8f60 Merge slf/projection-conditional-write branch 2015-08-13 19:10:48 +09:00
Scott Lystig Fritchie
dcbc3b45ff C110: handle proj store private write failure when conditional fails 2015-08-13 18:45:15 +09:00
Scott Lystig Fritchie
9768f3c035 Projection store private write returns bad_arg if max_public_epochid is greater 2015-08-13 18:44:25 +09:00
Scott Lystig Fritchie
58d840ef7e Minor react changes, minor fix for return val of A50 2015-08-13 18:43:41 +09:00
Scott Lystig Fritchie
d4275e5460 WIP: zerf_find_last_common() fix, eunit passes & very basic len=3 converge demo works 2015-08-13 15:41:18 +09:00
Scott Lystig Fritchie
0b8de235a9 WIP: zerf_find_last_common(), but is confused/broken by partial write @ private 2015-08-13 14:21:31 +09:00
Scott Lystig Fritchie
054397d187 WIP: find last common majority epoch 2015-08-12 17:53:39 +09:00
Scott Lystig Fritchie
d340b6a706 WIP: Duh, fix think-o in a40_latest_author_down() 2015-08-12 17:37:45 +09:00