Mark Allen
289b2bcc7c
Debug WIP
2015-10-11 23:04:29 -05:00
Mark Allen
c1b9038447
The return value of ets is generally 'true'
2015-10-08 15:47:11 -05:00
Mark Allen
aca3759e45
Bug fixes found during testing runs
2015-10-08 15:46:40 -05:00
Mark Allen
1ecbb5cffe
Fixed order of start_link parameters
2015-10-08 15:45:04 -05:00
Mark Allen
303aad97e9
Use {error, bad_checksum} directly
...
We previously copied {error, bad_csum} as it was used in the main
FLU code. The protobufs stuff expects the full atom bad_checksum
though.
2015-10-08 15:43:54 -05:00
Scott Lystig Fritchie
952d2fa508
Change flag_checksum -> flag_no_checksum for consistency
2015-10-08 20:41:59 +09:00
Mark Allen
679046600f
Merge remote-tracking branch 'origin/bug/from-bp-request-error' into mra/write-once-clean
2015-10-07 23:02:03 -05:00
Scott Lystig Fritchie
796937fe75
Add LL generic error PB response decoding
2015-10-08 12:33:55 +09:00
Scott Lystig Fritchie
0054445f13
Delete spammy message from fitness servers every 5 seconds
2015-10-07 18:52:24 +09:00
Mark Allen
d627f238bf
Cache generated names until disk files are written
2015-10-06 22:44:31 -05:00
Mark Allen
f83b0973f2
Have to call filename mgr with FluName
2015-10-06 22:43:19 -05:00
Mark Allen
7a6999465a
Make sure we use '^' as filename separators
2015-10-06 22:02:31 -05:00
Mark Allen
2d0c03ef35
Integration with current FLU implementation
2015-10-05 22:18:29 -05:00
Mark Allen
36c11e7d08
Add a metadata manager supervisor
2015-10-05 16:37:53 -05:00
Mark Allen
d3fe7ee181
Pull write-once files over to clean branch
...
I am treating the original write-once branch as a prototype
which I am now throwing away. I had too much work interleaved
in there, so I felt like the best thing to do would be to cut
a new clean branch and pull the files over and start over
against a recent-ish master.
We will have to refactor the other things in FLU in a more
piecemeal fashion.
2015-10-02 16:29:09 -05:00
Scott Lystig Fritchie
6d5b61f747
Tweaks to sleep_ranked_order() call in C200
2015-09-21 21:47:25 +09:00
Scott Lystig Fritchie
5eecb2b935
Change to P_current_calc epoch @ C100
2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
340af05f0f
WIP: server-side of CP mode repairing-as-witness
2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
d9b9397e75
Avoid some projection churn in C100's sanity check
2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
5010d03677
Call manage_last_down_list() at C220 and C310
2015-09-21 15:36:54 +09:00
Scott Lystig Fritchie
69a304102e
Write public proj in all_members order only
2015-09-21 15:09:16 +09:00
Scott Lystig Fritchie
6b4ed1c061
Verbose debugging cruft
2015-09-19 14:25:07 +09:00
Scott Lystig Fritchie
72bfa163ba
Small test bugfixes & verbose/debugging cruft
2015-09-19 14:16:54 +09:00
Scott Lystig Fritchie
d695f30e4f
Avoid using host/port combo for machi_fitness (ab)use of machi_projection
2015-09-17 16:43:08 +09:00
Scott Lystig Fritchie
09ae2db0ba
Bugfix: double-check local private projection write with a read
2015-09-16 16:31:10 +09:00
Scott Lystig Fritchie
79b1d156c4
Add backlog option to gen_tcp:listen
2015-09-16 13:52:36 +09:00
Scott Lystig Fritchie
778bd015ee
Bugfix: pattern matching error in C110
2015-09-16 12:41:53 +09:00
Scott Lystig Fritchie
d3b116bd9e
Bugfix: CP mode: ignore P_latest if it has UPI or down server in my down list
2015-09-15 17:55:18 +09:00
Scott Lystig Fritchie
75c94420e0
Add test_ets_table to give programmatic slowdown
2015-09-14 22:52:41 +09:00
Scott Lystig Fritchie
7bf1132142
Bugfix: IsRelevantToMe_p adjustment for P_latest.upi == []
2015-09-14 17:28:50 +09:00
Scott Lystig Fritchie
b4f8bc8058
Add pretty_time(). Add CONFIRM verbose logging for none proj
2015-09-14 17:00:09 +09:00
Scott Lystig Fritchie
4e11cdd50f
Bugfix: derp, pattern match for UniqueHistoryTrigger_p
2015-09-14 16:59:58 +09:00
Scott Lystig Fritchie
a036f119a6
Add send_spam_to_everyone(), add 1% chance of using it
2015-09-14 16:01:26 +09:00
Scott Lystig Fritchie
6c543dfc18
Re-use the flapping criteria for a different use (more)
...
Hooray, very early I ended up with a simulator example which kicked
in and tested this change. (A deterministic fault injection method
for testing would also be valuable, probably.)
machi_chain_manager1_converge_demo:t(7, [{private_write_verbose,true}]).
We switched partitions in the simulator like this:
SET partitions = [{b,f},{c,f},{d,e},{f,e}] (2 of 90252) at {14,37,5}
...
Stable projection at epoch 1429 upi=[b,c,g,a,d],repairing=[]
...
SET partitions = [{b,d},{c,b},{d,c},{f,a}] (3 of 90252) at {14,37,44}
Part of the chain reassembled quickly from the following UPIs: [g], then
[g,e], then [g,e,f] via a series of successful simulated repairs. For
the first two repairs, all parties (e & f & g) are unanimous about the
projections. For the final repair, very strange, not all three adopt
[g,e,f] chain: e says nothing, f & g use it.
Also weird: g then immediately moves f out of the UPI! upi=[g,e],repairing=[f].
Then e also adopts this chain of 2. From that point forward, f keeps
trying to use upi=[g,e,f],[] and the others try using only upi=[g,e],[f].
There are lots of messages from g saying that it's insane (correctly!)
to try calc=1487:[g,e],[f] -> 1494:[g,e,f],[] without a valid repair
author.
It's worth checking why g dropped from [g,e,f] -> [g,e]. But even
still, this new use for the flapping counter & reset via C103 is
working. ... Ah, now I understand. The very occasional undefined
socket bug in machi_flu1_client appears to be the cause: g had a
one-time problem talking with f and so decided f was down long enough to
make the shorter UPI. The other participants didn't have any such
problem with f and so kept f in the UPI. This would have been a
deadlock/infinite loop case without someone deciding to reset state.
2015-09-14 15:41:48 +09:00
Scott Lystig Fritchie
23554ffccc
Handle timeout/partition failures in C110
2015-09-14 13:54:47 +09:00
Scott Lystig Fritchie
fdf78bdbbc
Tweak IsRelevantToMe_p in B10 (more)
...
Last night we hit a rare case of failed convergence.
f was out of sync with the rest of the world.
f: upi=[b,g,f] repairing=[a,c]
The "rest of the world" used a larger chain at:
*: upi=[c,b,g,a], repairing=[f]
And f refused to join the larger chain because of the way that
IsRelevantToMe_p was being calculated before this commit.
Hrrrm, though, I'm not convinced that this particular problem
is fixed 100% by this patch. What if the chain lengths were
the same but also UPI incompatible? e.g. if I remove 'a' from
the "real world (in the partition simulator)" example above:
f: upi=[b,g,f] repairing=[c]
*: upi=[c,b,g], repairing=[f]
Hrmmmmm, I may need to reintroduce the my-recent-adopted-projection-
flapping-like-counter thingie to try to break this kind of
incompatible deadlock.
2015-09-14 13:40:34 +09:00
Scott Lystig Fritchie
62186395ed
Hooray! The weekend's CP work hasn't broken AP, I believe.
2015-09-14 00:04:53 +09:00
Scott Lystig Fritchie
f5901c6cd3
Hey, appears to work for CP mode chain len=3, hooray!
2015-09-13 21:51:20 +09:00
Scott Lystig Fritchie
89f57616a8
Avoid some churn when both latest & newprop are none proj
2015-09-13 17:44:23 +09:00
Scott Lystig Fritchie
f3a0ee91cf
WIP: thread P_calc_current all the way to C100 for CP mode assist
2015-09-13 15:58:45 +09:00
Scott Lystig Fritchie
0a20417682
Adjustments for CP mode (still slightly experimental)
2015-09-13 14:56:28 +09:00
Scott Lystig Fritchie
32c4d39156
Bugfix: set consistency_mode at set_chain_members
2015-09-13 14:16:02 +09:00
Scott Lystig Fritchie
b3ce9f9ab8
A bit less verbose output
2015-09-11 23:08:47 +09:00
Scott Lystig Fritchie
5efec1b6cd
Add upi_unanimous annotation to AP mode
2015-09-11 21:47:05 +09:00
Scott Lystig Fritchie
fe8ff6033d
Make better state transition choices in AP mode
2015-09-11 19:14:41 +09:00
Scott Lystig Fritchie
a0c129c16d
Bugfix: wow, a chain state transition sanity check bug
2015-09-11 17:32:52 +09:00
Scott Lystig Fritchie
8df7d58365
Add partition simulator support to fitness service
2015-09-11 16:45:29 +09:00
Scott Lystig Fritchie
efe6ce7894
WIP: small refactoring to prepare for fitness server 'use' of partition simulator
2015-09-11 16:03:49 +09:00
Scott Lystig Fritchie
35e8efeb96
Add timer:sleep() to accommodate machi_chain_manager1_converge_demo
2015-09-11 15:56:02 +09:00
Scott Lystig Fritchie
bbf925d132
Add fault injection method via C100 to test C103 admin down cycle
2015-09-10 18:05:55 +09:00
Scott Lystig Fritchie
41737ae62a
Add delete_admin_down API implementation, oops!
2015-09-10 18:05:18 +09:00
Scott Lystig Fritchie
d45c249e89
Add admin down status API to fitness server
2015-09-10 17:30:11 +09:00
Scott Lystig Fritchie
c14b9ce50f
Minor cleanup, add more partitions to converge demo
2015-09-10 16:39:15 +09:00
Scott Lystig Fritchie
af94d1c1c3
Bugfix: ExpectedUPI error in A40
2015-09-10 02:15:49 +09:00
Scott Lystig Fritchie
daf3a3d65a
Remove some verbose debugging cruft
2015-09-10 01:47:46 +09:00
Scott Lystig Fritchie
329a5e0682
Bugfix: damn, no idea how many problems this 5 month old bug caused
2015-09-10 01:33:55 +09:00
Scott Lystig Fritchie
5943494d54
Add ExpectedUPI to A40's AmHosedP clause
2015-09-10 00:43:37 +09:00
Scott Lystig Fritchie
10c655ebfe
WIP: fix one source of problems, now shift back to 'TODO this clause needs more review'
2015-09-09 23:59:40 +09:00
Scott Lystig Fritchie
b7aa33c617
Yeah, nearly there. AP fails occasionally in multiple-asymmetric-partition sequence
2015-09-09 23:10:39 +09:00
Scott Lystig Fritchie
72141c8ecb
WIP: split A30 into A30/A31 based on AllHosed
2015-09-09 21:06:40 +09:00
Scott Lystig Fritchie
5029911b52
WIP: remove verbose goop
2015-09-09 20:46:52 +09:00
Scott Lystig Fritchie
38ea36fc1c
WIP: Stand back, I'm going to try math! ... It works, {redacted}!
2015-09-09 20:45:57 +09:00
Scott Lystig Fritchie
27891bc5e9
WIP: 'broadcast'/spam works! async reminder ticks remain!
2015-09-09 19:14:52 +09:00
Scott Lystig Fritchie
dd095f117f
Derp, fix smoke_test() for machi_fitness:map_set()
2015-09-09 16:49:27 +09:00
Scott Lystig Fritchie
21015efcbb
WIP: Stand back, I'm going to try CRDTs!
2015-09-08 19:13:03 +09:00
Scott Lystig Fritchie
7af863d840
Add stubs of machi_fitness server
2015-09-08 16:13:07 +09:00
Scott Lystig Fritchie
185c9eb313
WIP: add failing eunit placeholder for spam
2015-09-07 15:38:23 +09:00
Scott Lystig Fritchie
c7684f660c
WIP: Friday evening/Monday morning, laying groundwork for spam "broadcast"
2015-09-07 15:20:10 +09:00
Scott Lystig Fritchie
4376ce9ec1
Remove all flap counting and inner projection stuff
2015-09-04 17:17:49 +09:00
Scott Lystig Fritchie
42aeecd9db
Fix machi_projection_store_test error
2015-09-04 15:24:16 +09:00
Scott Lystig Fritchie
3c1026da28
WIP: too tired to continue tonight
2015-09-01 22:10:45 +09:00
Scott Lystig Fritchie
4378ef7b54
Bugfix: inner->outer proj @ A30
2015-09-01 00:51:46 +09:00
Scott Lystig Fritchie
e79265228e
Bugfix: more correct for inner->outer sanity transition
2015-08-31 22:14:28 +09:00
Scott Lystig Fritchie
1e5d58b22d
Bugfix: more to ignore in make_basic_comparison_stable()
2015-08-31 17:57:37 +09:00
Scott Lystig Fritchie
bce225a200
Bugfix: a30_make_inner_projection() ignore newprop down list if none proj
2015-08-31 17:03:12 +09:00
Scott Lystig Fritchie
a095e0cfc3
Bugfix: ignore creation_time in make_comparison_stable()
2015-08-31 15:40:19 +09:00
Scott Lystig Fritchie
c637939cc2
Bugfix: A29 should trigger if EpochID (not Epoch# alone) differs
2015-08-31 15:21:17 +09:00
Scott Lystig Fritchie
5422dc45c2
Bugfix: derp in A29 revival
2015-08-31 14:44:05 +09:00
Scott Lystig Fritchie
004c686c8c
WIP: remove make_zerf() from calc_projection(); add make_zerf() to resurrected A29. Status: broken, needs work
2015-08-30 20:39:58 +09:00
Scott Lystig Fritchie
a449025e8b
Bugfix: epoch handling around none proj: epoch 0 only at first bootstrap!
2015-08-30 19:53:47 +09:00
Scott Lystig Fritchie
ec2e7b5669
Sunday experiment: all-but-remove A29, feels right but definitely not sure yet
2015-08-30 16:08:14 +09:00
Scott Lystig Fritchie
0dc53274d1
Get more aggressive about AllHosed+down nodes for inner proj
2015-08-30 02:22:59 +09:00
Scott Lystig Fritchie
771164b82f
Bugfix: Flapping manifesto, leaving #2 : only if not me
2015-08-30 00:50:23 +09:00
Scott Lystig Fritchie
4b83893047
Bugfix: minor flap count bookkeeping error
2015-08-30 00:50:03 +09:00
Scott Lystig Fritchie
a7db3a26c6
Bugfix: a30_make_inner_projection() compatible inner if not none proj
2015-08-30 00:04:13 +09:00
Scott Lystig Fritchie
53d865b247
Bugfix: serious derp fix for A30's inner->outer
2015-08-29 23:42:47 +09:00
Scott Lystig Fritchie
5c8b255da9
Bugfix: first new CP experiments with chain len=5
2015-08-29 22:40:18 +09:00
Scott Lystig Fritchie
94394d3429
Bugfix: allow none proj to re-emerge from flapping (more)
...
See comments added in this commit at A40.
So far, I've been doing CP mode testing with a handful of (very useful)
network partition combinations using:
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
Next steps:
* Expand number & types of partitions
* Expand to chain lengths of 5 and beyond
2015-08-29 21:36:53 +09:00
Scott Lystig Fritchie
ee19a0856b
WIP: justincase
2015-08-29 19:59:46 +09:00
Scott Lystig Fritchie
6b84cd6e6a
Reduce poll sleep time when running with partition simulator
2015-08-29 18:30:53 +09:00
Scott Lystig Fritchie
dc5ae4047a
Bugfix: react_to_env_A30 inner->norm fix, make_zerf() none proj derp fix
2015-08-29 18:01:13 +09:00
Scott Lystig Fritchie
c9340a662d
Bugfix: force stable creation_time on inner none proj
2015-08-29 15:06:57 +09:00
Scott Lystig Fritchie
6d9526b379
Add more ?REACT()
2015-08-29 13:13:31 +09:00
Scott Lystig Fritchie
f21fcdd7be
Bugfix: none proj must flap, undo previous commits, which may cause mess later
2015-08-29 13:13:23 +09:00
Scott Lystig Fritchie
af0ade9840
Bugfix: projection checksum fix in A30
2015-08-29 12:33:41 +09:00
Scott Lystig Fritchie
582f9e5eab
Bugfix: fix effectively-none-projection transition to C100. Still buggy
2015-08-28 23:08:38 +09:00
Scott Lystig Fritchie
403cb5b7a6
WIP: improvements, but now flapping inner epoch keeps increasing {sigh}
2015-08-28 21:13:54 +09:00
Scott Lystig Fritchie
9edd91f48e
Bugfixes for a->b column transition & flap dampening
2015-08-28 20:06:09 +09:00
Scott Lystig Fritchie
18aac6e489
WIP: undo AmFlappingNow_p condition added at commit 3dfe5c2
2015-08-28 18:39:18 +09:00
Scott Lystig Fritchie
3dfe5c2677
WIP: fix annotation history on disk
2015-08-28 18:37:11 +09:00