Scott Lystig Fritchie
7bf1132142
Bugfix: IsRelevantToMe_p adjustment for P_latest.upi == []
2015-09-14 17:28:50 +09:00
Scott Lystig Fritchie
b4f8bc8058
Add pretty_time(). Add CONFIRM verbose logging for none proj
2015-09-14 17:00:09 +09:00
Scott Lystig Fritchie
4e11cdd50f
Bugfix: derp, pattern match for UniqueHistoryTrigger_p
2015-09-14 16:59:58 +09:00
Scott Lystig Fritchie
a036f119a6
Add send_spam_to_everyone(), add 1% chance of using it
2015-09-14 16:01:26 +09:00
Scott Lystig Fritchie
6c543dfc18
Re-use the flapping criteria for a different use (more)
Hooray, very early I ended up with a simulator example which kicked
in and tested this change. (A deterministic fault injection method
for testing would also be valuable, probably.)
machi_chain_manager1_converge_demo:t(7, [{private_write_verbose,true}]).
We switched partitions in the simulator like this:
SET partitions = [{b,f},{c,f},{d,e},{f,e}] (2 of 90252) at {14,37,5}
...
Stable projection at epoch 1429 upi=[b,c,g,a,d],repairing=[]
...
SET partitions = [{b,d},{c,b},{d,c},{f,a}] (3 of 90252) at {14,37,44}
Part of the chain reassembled quickly from the following UPIs: [g], then
[g,e], then [g,e,f] via a series of successful simulated repairs. For
the first two repairs, all parties (e & f & g) are unanimous about the
projections. For the final repair, strangely, not all three adopt the
[g,e,f] chain: e says nothing, while f & g use it.
Also weird: g then immediately moves f to repairing: upi=[g,e],repairing=[f].
Then e also adopts this chain of 2. From that point forward, f keeps
trying to use upi=[g,e,f],[] and the others try using only upi=[g,e],[f].
There are lots of messages from g saying that it's insane (correctly!)
to try calc=1487:[g,e],[f] -> 1494:[g,e,f],[] without a valid repair
author.
It's worth checking why g dropped from [g,e,f] -> [g,e]. But even
still, this new use for the flapping counter & reset via C103 is
working. ... Ah, now I understand. The very occasional undefined
socket bug in machi_flu1_client appears to be the cause: g had a
one-time problem talking with f and so decided f was down long enough to
make the shorter UPI. The other participants didn't have any such
problem with f and so kept f in the UPI. This would have been a
deadlock/infinite loop case without someone deciding to reset state.
2015-09-14 15:41:48 +09:00
Scott Lystig Fritchie
23554ffccc
Handle timeout/partition failures in C110
2015-09-14 13:54:47 +09:00
Scott Lystig Fritchie
fdf78bdbbc
Tweak IsRelevantToMe_p in B10 (more)
Last night we hit a rare case of failed convergence.
f was out of sync with the rest of the world.
f: upi=[b,g,f] repairing=[a,c]
The "rest of the world" used a larger chain at:
*: upi=[c,b,g,a], repairing=[f]
And f refused to join the larger chain because of the way that
IsRelevantToMe_p was being calculated before this commit.
Hrrrm, though, I'm not convinced that this particular problem
is fixed 100% by this patch. What if the chain lengths were
the same but also UPI incompatible? e.g. if I remove 'a' from
the "real world (in the partition simulator)" example above:
f: upi=[b,g,f] repairing=[c]
*: upi=[c,b,g], repairing=[f]
Hrmmmmm, I may need to reintroduce the my-recent-adopted-projection-
flapping-like-counter thingie to try to break this kind of
incompatible deadlock.
2015-09-14 13:40:34 +09:00
Scott Lystig Fritchie
62186395ed
Hooray! The weekend's CP work hasn't broken AP, I believe.
2015-09-14 00:04:53 +09:00
Scott Lystig Fritchie
f5901c6cd3
Hey, appears to work for CP mode chain len=3, hooray!
2015-09-13 21:51:20 +09:00
Scott Lystig Fritchie
89f57616a8
Avoid some churn when both latest & newprop are none proj
2015-09-13 17:44:23 +09:00
Scott Lystig Fritchie
f3a0ee91cf
WIP: thread P_calc_current all the way to C100 for CP mode assist
2015-09-13 15:58:45 +09:00
Scott Lystig Fritchie
0a20417682
Adjustments for CP mode (still slightly experimental)
2015-09-13 14:56:28 +09:00
Scott Lystig Fritchie
32c4d39156
Bugfix: set consistency_mode at set_chain_members
2015-09-13 14:16:02 +09:00
Scott Lystig Fritchie
b3ce9f9ab8
A bit less verbose output
2015-09-11 23:08:47 +09:00
Scott Lystig Fritchie
5efec1b6cd
Add upi_unanimous annotation to AP mode
2015-09-11 21:47:05 +09:00
Scott Lystig Fritchie
fe8ff6033d
Make better state transition choices in AP mode
2015-09-11 19:14:41 +09:00
Scott Lystig Fritchie
a0c129c16d
Bugfix: wow, a chain state transition sanity check bug
2015-09-11 17:32:52 +09:00
Scott Lystig Fritchie
8df7d58365
Add partition simulator support to fitness service
2015-09-11 16:45:29 +09:00
Scott Lystig Fritchie
efe6ce7894
WIP: small refactoring to prepare for fitness server 'use' of partition simulator
2015-09-11 16:03:49 +09:00
Scott Lystig Fritchie
35e8efeb96
Add timer:sleep() to accommodate machi_chain_manager1_converge_demo
2015-09-11 15:56:02 +09:00
Scott Lystig Fritchie
bbf925d132
Add fault injection method via C100 to test C103 admin down cycle
2015-09-10 18:05:55 +09:00
Scott Lystig Fritchie
41737ae62a
Add delete_admin_down API implementation, oops!
2015-09-10 18:05:18 +09:00
Scott Lystig Fritchie
d45c249e89
Add admin down status API to fitness server
2015-09-10 17:30:11 +09:00
Scott Lystig Fritchie
c14b9ce50f
Minor cleanup, add more partitions to converge demo
2015-09-10 16:39:15 +09:00
Scott Lystig Fritchie
af94d1c1c3
Bugfix: ExpectedUPI error in A40
2015-09-10 02:15:49 +09:00
Scott Lystig Fritchie
daf3a3d65a
Remove some verbose debugging cruft
2015-09-10 01:47:46 +09:00
Scott Lystig Fritchie
329a5e0682
Bugfix: damn, no idea how many problems this 5 month old bug caused
2015-09-10 01:33:55 +09:00
Scott Lystig Fritchie
5943494d54
Add ExpectedUPI to A40's AmHosedP clause
2015-09-10 00:43:37 +09:00
Scott Lystig Fritchie
10c655ebfe
WIP: fix one source of problems, now shift back to 'TODO this clause needs more review'
2015-09-09 23:59:40 +09:00
Scott Lystig Fritchie
b7aa33c617
Yeah, nearly there. AP fails occasionally in multiple-asymmetric-partition sequence
2015-09-09 23:10:39 +09:00
Scott Lystig Fritchie
72141c8ecb
WIP: split A30 into A30/A31 based on AllHosed
2015-09-09 21:06:40 +09:00
Scott Lystig Fritchie
5029911b52
WIP: remove verbose goop
2015-09-09 20:46:52 +09:00
Scott Lystig Fritchie
38ea36fc1c
WIP: Stand back, I'm going to try math! ... It works, {redacted}!
2015-09-09 20:45:57 +09:00
Scott Lystig Fritchie
27891bc5e9
WIP: 'broadcast'/spam works! async reminder ticks remain!
2015-09-09 19:14:52 +09:00
Scott Lystig Fritchie
dd095f117f
Derp, fix smoke_test() for machi_fitness:map_set()
2015-09-09 16:49:27 +09:00
Scott Lystig Fritchie
21015efcbb
WIP: Stand back, I'm going to try CRDTs!
2015-09-08 19:13:03 +09:00
Scott Lystig Fritchie
7af863d840
Add stubs of machi_fitness server
2015-09-08 16:13:07 +09:00
Scott Lystig Fritchie
185c9eb313
WIP: add failing eunit placeholder for spam
2015-09-07 15:38:23 +09:00
Scott Lystig Fritchie
c7684f660c
WIP: Friday evening/Monday morning, laying groundwork for spam "broadcast"
2015-09-07 15:20:10 +09:00
Scott Lystig Fritchie
4376ce9ec1
Remove all flap counting and inner projection stuff
2015-09-04 17:17:49 +09:00
Scott Lystig Fritchie
42aeecd9db
Fix machi_projection_store_test error
2015-09-04 15:24:16 +09:00
Scott Lystig Fritchie
3c1026da28
WIP: too tired to continue tonight
2015-09-01 22:10:45 +09:00
Scott Lystig Fritchie
4378ef7b54
Bugfix: inner->outer proj @ A30
2015-09-01 00:51:46 +09:00
Scott Lystig Fritchie
e79265228e
Bugfix: more correct for inner->outer sanity transition
2015-08-31 22:14:28 +09:00
Scott Lystig Fritchie
1e5d58b22d
Bugfix: more to ignore in make_basic_comparison_stable()
2015-08-31 17:57:37 +09:00
Scott Lystig Fritchie
bce225a200
Bugfix: a30_make_inner_projection() ignore newprop down list if none proj
2015-08-31 17:03:12 +09:00
Scott Lystig Fritchie
a095e0cfc3
Bugfix: ignore creation_time in make_comparison_stable()
2015-08-31 15:40:19 +09:00
Scott Lystig Fritchie
c637939cc2
Bugfix: A29 should trigger if EpochID (not Epoch# alone) differs
2015-08-31 15:21:17 +09:00
Scott Lystig Fritchie
5422dc45c2
Bugfix: derp in A29 revival
2015-08-31 14:44:05 +09:00
Scott Lystig Fritchie
004c686c8c
WIP: remove make_zerf() from calc_projection(); add make_zerf() to resurrected A29. Status: broken, needs work
2015-08-30 20:39:58 +09:00
Scott Lystig Fritchie
a449025e8b
Bugfix: epoch handling around none proj: epoch 0 only at first bootstrap!
2015-08-30 19:53:47 +09:00
Scott Lystig Fritchie
ec2e7b5669
Sunday experiment: all-but-remove A29, feels right but definitely not sure yet
2015-08-30 16:08:14 +09:00
Scott Lystig Fritchie
0dc53274d1
Get more aggressive about AllHosed+down nodes for inner proj
2015-08-30 02:22:59 +09:00
Scott Lystig Fritchie
771164b82f
Bugfix: Flapping manifesto, leaving #2 : only if not me
2015-08-30 00:50:23 +09:00
Scott Lystig Fritchie
4b83893047
Bugfix: minor flap count bookkeeping error
2015-08-30 00:50:03 +09:00
Scott Lystig Fritchie
a7db3a26c6
Bugfix: a30_make_inner_projection() compatible inner if not none proj
2015-08-30 00:04:13 +09:00
Scott Lystig Fritchie
53d865b247
Bugfix: serious derp fix for A30's inner->outer
2015-08-29 23:42:47 +09:00
Scott Lystig Fritchie
5c8b255da9
Bugfix: first new CP experiments with chain len=5
2015-08-29 22:40:18 +09:00
Scott Lystig Fritchie
94394d3429
Bugfix: allow none proj to re-emerge from flapping (more)
See comments added in this commit at A40.
So far, I've been doing CP mode testing with a handful of (very useful)
network partition combinations using:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
Next steps:
* Expand number & types of partitions
* Expand to chain lengths of 5 and beyond
2015-08-29 21:36:53 +09:00
Scott Lystig Fritchie
ee19a0856b
WIP: justincase
2015-08-29 19:59:46 +09:00
Scott Lystig Fritchie
6b84cd6e6a
Reduce poll sleep time when running with partition simulator
2015-08-29 18:30:53 +09:00
Scott Lystig Fritchie
dc5ae4047a
Bugfix: react_to_env_A30 inner->norm fix, make_zerf() none proj derp fix
2015-08-29 18:01:13 +09:00
Scott Lystig Fritchie
c9340a662d
Bugfix: force stable creation_time on inner none proj
2015-08-29 15:06:57 +09:00
Scott Lystig Fritchie
6d9526b379
Add more ?REACT()
2015-08-29 13:13:31 +09:00
Scott Lystig Fritchie
f21fcdd7be
Bugfix: none proj must flap, undo previous commits, which may cause mess later
2015-08-29 13:13:23 +09:00
Scott Lystig Fritchie
af0ade9840
Bugfix: projection checksum fix in A30
2015-08-29 12:33:41 +09:00
Scott Lystig Fritchie
582f9e5eab
Bugfix: fix effectively-none-projection transition to C100. Still buggy
2015-08-28 23:08:38 +09:00
Scott Lystig Fritchie
403cb5b7a6
WIP: improvements, but now flapping inner epoch keeps increasing {sigh}
2015-08-28 21:13:54 +09:00
Scott Lystig Fritchie
9edd91f48e
Bugfixes for a->b column transition & flap dampening
2015-08-28 20:06:09 +09:00
Scott Lystig Fritchie
18aac6e489
WIP: undo AmFlappingNow_p condition added at commit 3dfe5c2
2015-08-28 18:39:18 +09:00
Scott Lystig Fritchie
3dfe5c2677
WIP: fix annotation history on disk
2015-08-28 18:37:11 +09:00
Scott Lystig Fritchie
8ca1ffdb13
WIP: bugfixes and lots of verbose goop added
2015-08-28 01:55:31 +09:00
Scott Lystig Fritchie
deb2cdee2c
Bugfix: correct epoch number checking when inner proj
2015-08-27 22:22:15 +09:00
Scott Lystig Fritchie
93b9b948fc
WIP: debugging, uff da
2015-08-27 22:02:23 +09:00
Scott Lystig Fritchie
efb89efb0d
Reduce verbosity
2015-08-27 20:27:33 +09:00
Scott Lystig Fritchie
0eaa008810
Change checksum algorithm to exclude 'flap' also
2015-08-27 20:27:24 +09:00
Scott Lystig Fritchie
12b74a52fd
WIP: pre-dinner paranoid checkin
2015-08-27 18:45:27 +09:00
Scott Lystig Fritchie
65cd18939c
WIP: changes to annotation management
2015-08-27 17:58:43 +09:00
Scott Lystig Fritchie
8a61a85ae0
WIP: rewrite make_zerf() to use new annotation scheme
2015-08-27 16:19:22 +09:00
Scott Lystig Fritchie
28335a1310
Add CP mode unwedge. All eunit tests are passing again.
2015-08-26 18:47:39 +09:00
Scott Lystig Fritchie
9222881689
Oops, bugfixes
2015-08-26 17:51:43 +09:00
Scott Lystig Fritchie
568e165f4f
Allow pstore -> FLU unwedge only in ap_mode, machi_cr_client_test broken (uses cp_mode)
2015-08-26 15:51:14 +09:00
Scott Lystig Fritchie
e8f3ab381d
Add set_consistency_mode() to projection store API, use it
2015-08-26 14:57:51 +09:00
Scott Lystig Fritchie
c0ee323637
Our new unit test works, yay
2015-08-25 19:42:33 +09:00
Scott Lystig Fritchie
83f49472db
WIP: intermediate refactoring
2015-08-25 19:31:05 +09:00
Scott Lystig Fritchie
6dbe887298
Remove old cruft, including hugly HTTP server hack
2015-08-25 18:49:48 +09:00
Scott Lystig Fritchie
1c5a17b708
WIP: adjust throttle of flapping 'shut up'
2015-08-25 17:01:14 +09:00
Scott Lystig Fritchie
9a86453753
WIP: half-baked idea, stopping for the night (more)
So, I'm 50% sure this is a good idea for CP mode: if there's
a later public projection than P_current, then who knows what
we might have missed. So, call make_zerf() to find out the
absolute latest. Problem: flapping state appears to be lost,
booo.
2015-08-24 21:54:30 +09:00
Scott Lystig Fritchie
ea61fe78bf
Add flap disabler for 3 seconds after up/down change
2015-08-24 20:38:54 +09:00
Scott Lystig Fritchie
2f82fe0487
WIP: cp_mode improvements
2015-08-24 19:04:26 +09:00
Scott Lystig Fritchie
66cafe066e
Remove proj_i_history, tweak AllAreFlapping_and_IamBad_and_NotRelevant_p in B10
2015-08-23 20:47:43 +09:00
Scott Lystig Fritchie
70022d11ce
Add damper check for flapping of *inner* projections, whee!
2015-08-23 20:00:19 +09:00
Scott Lystig Fritchie
561e60a7ac
WIP: start adding support to detect flapping of inner projections (ha!)
2015-08-23 17:50:25 +09:00
Scott Lystig Fritchie
0136fccff7
CP mode fix a30_make_inner_projection
2015-08-23 16:43:15 +09:00
Scott Lystig Fritchie
2d050ff7a6
Fix ?REACT() FSM names: a30->a40
2015-08-23 15:46:57 +09:00
Scott Lystig Fritchie
34d35fab63
Shorten the verbose output of private_write_verbose
2015-08-22 23:30:30 +09:00
Scott Lystig Fritchie
51a06844d5
Fix epoch number reuse bug when transiting C103
2015-08-22 21:40:21 +09:00
Scott Lystig Fritchie
0414da783a
Fix repairs when everyone is in stable flapping state
2015-08-22 21:27:01 +09:00
Scott Lystig Fritchie
a0477d62c0
WIP: bugfix for checking latest proj's flap count
2015-08-22 14:50:10 +09:00
Scott Lystig Fritchie
0278d7254b
Add A29 state for shouting circuit breaker for long long loops
2015-08-20 23:04:27 +09:00