Commit graph

826 commits

Author SHA1 Message Date
Mark Allen
f3e6d46e36 Fix chain manager failures disabling active mode
The FLU psup starts the chain manager in active mode by default
(as it should for normal run-time operation.) By adding the
{active_mode, false} tuple to the options list, we can
tell the chain manager that it should be explicitly manipulated
during tests.
2015-10-11 23:05:44 -05:00
Mark Allen
da0b331936 WIP 2015-10-11 23:05:27 -05:00
Mark Allen
855f94925c Validate semantics on partial reads 2015-10-11 23:05:00 -05:00
Mark Allen
8187e01fe0 Use psup startup 2015-10-11 23:04:43 -05:00
Mark Allen
289b2bcc7c Debug WIP 2015-10-11 23:04:29 -05:00
Mark Allen
5926cef44a Make test start up more reliable 2015-10-08 15:49:22 -05:00
Mark Allen
d9ede473dd Bump lager to 2.2.0
That brings it up to the latest release of 2.x; should consider
using lager 3 though.
2015-10-08 15:48:04 -05:00
Mark Allen
c1b9038447 The return value of ets is generally 'true' 2015-10-08 15:47:11 -05:00
Mark Allen
aca3759e45 Bug fixes found during testing runs 2015-10-08 15:46:40 -05:00
Mark Allen
1ecbb5cffe Fixed order of start_link parameters 2015-10-08 15:45:04 -05:00
Mark Allen
303aad97e9 Use {error, bad_checksum} directly
We previously copied {error, bad_csum} as it was used in the main
FLU code.  The protobufs stuff expects the full atom bad_checksum
though.
2015-10-08 15:43:54 -05:00
Mark Allen
679046600f Merge remote-tracking branch 'origin/bug/from-bp-request-error' into mra/write-once-clean 2015-10-07 23:02:03 -05:00
Scott Lystig Fritchie
796937fe75 Add LL generic error PB response decoding 2015-10-08 12:33:55 +09:00
Scott Lystig Fritchie
0054445f13 Delete spammy message from fitness servers every 5 seconds 2015-10-07 18:52:24 +09:00
Mark Allen
d627f238bf Cache generated names until disk files are written 2015-10-06 22:44:31 -05:00
Mark Allen
f83b0973f2 Have to call filename mgr with FluName 2015-10-06 22:43:19 -05:00
Mark Allen
7a6999465a Make sure we use '^' as filename separators 2015-10-06 22:02:31 -05:00
Mark Allen
2d0c03ef35 Integration with current FLU implementation 2015-10-05 22:18:29 -05:00
Mark Allen
36c11e7d08 Add a metadata manager supervisor 2015-10-05 16:37:53 -05:00
Mark Allen
d3fe7ee181 Pull write-once files over to clean branch
I am treating the original write-once branch as a prototype
which I am now throwing away. I had too much work interleved
in there, so I felt like the best thing to do would be to cut
a new clean branch and pull the files over and start over
against a recent-ish master.

We will have to refactor the other things in FLU in a more
piecemeal fashion.
2015-10-02 16:29:09 -05:00
Scott Lystig Fritchie
d7daf203fb Update TODO-shortterm.org for completion of fitness work 2015-09-22 16:44:49 +09:00
Scott Lystig Fritchie
3fb3890788 Merge branch 'slf/cp-mode-adjustments' to 'master' 2015-09-22 16:19:58 +09:00
Scott Lystig Fritchie
6d5b61f747 Tweaks to sleep_ranked_order() call in C200 2015-09-21 21:47:25 +09:00
Scott Lystig Fritchie
5eecb2b935 Change to P_current_calc epoch @ C100 2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
6425cca13f Fix broken eunit test 2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
340af05f0f WIP: server-side of CP mode repairing-as-witness 2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
d9b9397e75 Avoid some projection churn in C100's sanity check 2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
5010d03677 Call manage_last_down_list() at C220 and C310 2015-09-21 15:36:54 +09:00
Scott Lystig Fritchie
69a304102e Write public proj in all_members order only 2015-09-21 15:09:16 +09:00
Scott Lystig Fritchie
58b19e76be Merge branch temp integration branch 'slf/tmp/merge0920' 2015-09-20 22:45:21 +09:00
Scott Lystig Fritchie
41836b01e6 Merge branch 'slf/chain-manager/remove-inner' into slf/tmp/merge0920 2015-09-20 20:19:00 +09:00
Scott Lystig Fritchie
83e878eb07 More verbosity, whee 2015-09-20 14:06:55 +09:00
Scott Lystig Fritchie
6b4ed1c061 Verbose debugging cruft 2015-09-19 14:25:07 +09:00
Scott Lystig Fritchie
72bfa163ba Small test bugfixes & verbose/debugging cruft 2015-09-19 14:16:54 +09:00
Scott Lystig Fritchie
d695f30e4f Avoid using host/port combo for machi_fitness (ab)use of machi_projection 2015-09-17 16:43:08 +09:00
Scott Lystig Fritchie
09ae2db0ba Bugfix: double-check local private projection write with a read 2015-09-16 16:31:10 +09:00
Scott Lystig Fritchie
79b1d156c4 Add backlog option to gen_tcp:listen 2015-09-16 13:52:36 +09:00
Scott Lystig Fritchie
778bd015ee Bugfix: pattern matching error in C110 2015-09-16 12:41:53 +09:00
Scott Lystig Fritchie
d3b116bd9e Bugfix: CP mode: ignore P_latest if it has UPI or down server in my down list 2015-09-15 17:55:18 +09:00
Scott Lystig Fritchie
5001406499 Add proplist-based configuration for TCP port and tmp dir for converge demo 2015-09-15 17:54:27 +09:00
Scott Lystig Fritchie
75c94420e0 Add test_ets_table to give programmatic slowdown 2015-09-14 22:52:41 +09:00
Scott Lystig Fritchie
7bf1132142 Bugfix: IsRelevantToMe_p adjustment for P_latest.upi == [] 2015-09-14 17:28:50 +09:00
Scott Lystig Fritchie
b4f8bc8058 Add pretty_time(). Add CONFIRM verbose logging for none proj 2015-09-14 17:00:09 +09:00
Scott Lystig Fritchie
4e11cdd50f Bugfix: derp, pattern match for UniqueHistoryTrigger_p 2015-09-14 16:59:58 +09:00
Scott Lystig Fritchie
a036f119a6 Add send_spam_to_everyone(), add 1% chance of using it 2015-09-14 16:01:26 +09:00
Scott Lystig Fritchie
6c543dfc18 Re-use the flapping criteria for a different use (more)
Hooray, very early I ended up with a simulator example which kicked
in and tested this change.  (A deterministice fault injection method
for testing would also be valuable, probably.)

    machi_chain_manager1_converge_demo:t(7, [{private_write_verbose,true}]).

We switched partitions in the simulator like this:

    SET partitions = [{b,f},{c,f},{d,e},{f,e}] (2 of 90252) at {14,37,5}
    ...
    Stable projection at epoch 1429 upi=[b,c,g,a,d],repairing=[]
    ...
    SET partitions = [{b,d},{c,b},{d,c},{f,a}] (3 of 90252) at {14,37,44}

Part of the chain reassembled quickly from the following UPIs: [g], then
[g,e], then [g,e,f] via a series of successful simulated repairs.  For
the first two repairs, all parties (e & f & g) are unanimous about the
projections.  For the final repair, very strange, not all three adopt
[g,e,f] chain: e says nothing, f & g use it.

Also weird, then g immediately moves f!  upi=[g,e],repairing=[f].
Then e also adopts this chain of 2.  From that point forward, f keeps
trying to use upi=[g,e,f],[] and the others try using only upi=[g,e],[f].
There are lots of messages from g saying that it's insane (correctly!)
to try calc=1487:[g,e],[f] -> 1494:[g,e,f],[] without a valid repair
author.

It's worth checking why g dropped from [g,e,f] -> [g,e].  But even
still, this new use for the flapping counter & reset via C103 is
working.  ... Ah, now I understand.  The very occasional undefined
socket bug in machi_flu1_client appears to be the cause: g had a
one-time problem talking with f and so decided f was down long enough to
make the shorter UPI.  The other participants didn't have any such
problem with f and so kept f in the UPI.  This would have been a
deadlock/infinite loop case without someone deciding to reset state.
2015-09-14 15:41:48 +09:00
Scott Lystig Fritchie
23554ffccc Handle timeout/paritition failures in C110 2015-09-14 13:54:47 +09:00
Scott Lystig Fritchie
fdf78bdbbc Tweak IsRelevantToMe_p in B10 (more)
Last night we hit a rare case of failed convergence.

f was out of sync with the rest of the world.
f: upi=[b,g,f] repairing=[a,c]
The "rest of the world" used a larger chain at:
*: upi=[c,b,g,a], repairing=[f]

And f refused to join the larger chain because of the way that
IsRelevantToMe_p was being calculated before this commit.

Hrrrm, though, I'm not convinced that this particular problem
is fixed 100% by this patch.  What if the chain lengths were
the same but also UPI incompatible?  e.g. if I remove 'a' from
the "real world (in the partition simulator)" example above:

f: upi=[b,g,f] repairing=[c]
*: upi=[c,b,g], repairing=[f]

Hrmmmmm, I may need to reintroduce the my-recent-adopted-projection-
flapping-like-counter thingie to try to break this kind of
incompatible deadlock.
2015-09-14 13:40:34 +09:00
Scott Lystig Fritchie
62186395ed Hooray! The weekend's CP work hasn't broken AP, I believe. 2015-09-14 00:04:53 +09:00
Scott Lystig Fritchie
f5901c6cd3 Hey, appears to work for CP mode chain len=3, hooray! 2015-09-13 21:51:20 +09:00