The prior commit wasn't sufficient: the range of transitions is wider than
assumed by that commit. So, we take one of two options, with a TODO task
of researching the other option.
Fix for today: We are going to game the system. We know that
C100 is going to be checking authorship relative to P_current's
UPI's tail. Therefore, we're just going to set it here.
Why??? Because we have been using this projection safely for
the entire flapping period! ... The only other way I see is to
allow C100 to carve out an exception if the repair finished
PLUS author_server check fails PLUS if we came from here, but
that feels a bit fragile to me: if some code factoring happens
in projection_transition_is_saneprojection_transition_is_sane()
or elsewhere that causes the author_server check to be
something-other-than-the-final-thing-checked, then such a
refactoring would likely cause an even harder bug to find &
fix. Conditions tested: 5 FLUs plus alternating partitions of:
[
[{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [],
[{b,a},{d,e}],
[{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], [], [{a,b}], []
].
%% We have a small problem for state transition sanity checking in the
%% case where we are flapping *and* a repair has finished. One of the
%% sanity checks in simple_chain_state_transition_is_sane(() is that
%% the author of P2 in this case must be the tail of P1's UPI: i.e.,
%% it's the tail's responsibility to perform repair, therefore the tail
%% must damn well be the author of any transition that says a repair
%% finished successfully.
%%
%% The problem is that author_server of the inner projection does not
%% reflect the actual author! See the comment with the text
%% "The inner projection will have a fake author" in
%react_to_env_A30().
%%
%% So, there's a special return value that tells us to try to check for
%% the correct authorship here.
Ha, famous last words, amirite?
%% The chain sequence/order checks at the bottom of this function aren't
%% as easy-to-read as they ought to be. However, I'm moderately confident
%% that it isn't buggy. TODO: refactor them for clarity.
So, now machi_chain_manager1:projection_transition_is_sane() is using
newer, far less buggy code to make sanity decisions.
TODO: Add support for Retrospective mode. TODO is it really needed?
Examples of how the old code sucks and the new code sucks less.
138> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxxxx..x.xxxxxx..x.x....x..xx........................................................Failed! After 69 tests.
[a,b,c]
{c,[a,b,c],[c,b],b,[b,a],[b,a,c]}
Old_res ([335,192,166,160,153,139]): true
New_res: false (why line [1936])
Shrinking xxxxxxxxxxxx.xxxxxxx.xxx.xxxxxxxxxxxxxxxxx(3 times)
[a,b,c]
%% {Author1,UPI1, Repair1,Author2,UPI2, Repair2} %%
{c, [a,b,c],[], a, [b,a],[]}
Old_res ([338,185,160,153,147]): true
New_res: false (why line [1936])
false
Old code is wrong: we've swapped order of a & b, which is bad.
139> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxx..x...xx..........xxx..x..............x......x............................................(x10)...(x1)........Failed! After 120 tests.
[b,c,a]
{c,[c,a],[c],a,[a,b],[b,a]}
Old_res ([335,192,185,160,153,123]): true
New_res: false (why line [1936])
Shrinking xx.xxxxxx.x.xxxxxxxx.xxxxxxxxxxx(4 times)
[b,a,c]
%% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
{a, [c], [], c, [c,b],[]}
Old_res ([338,185,160,153,147]): true
New_res: false (why line [1936])
false
Old code is wrong: b wasn't repairing in the previous state.
150> eqc:quickcheck(eqc:testing_time(10, machi_chain_manager1_test:prop_compare_legacy_with_v2_chain_transition_check(whole))).
xxxxxxxxxxx....x...xxxxx..xx.....x.......xxx..x.......xxx...................x................x......(x10).....(x1)........xFailed! After 130 tests.
[c,a,b]
{b,[c],[b,a,c],c,[c,a,b],[b]}
Old_res ([335,214,185,160,153,147]): true
New_res: false (why line [1936])
Shrinking xxxx.x.xxx.xxxxxxx.xxxxxxxxx(4 times)
[c,b,a]
%% {Author1,UPI1,Repair1,Author2,UPI2, Repair2} %%
{c, [c], [a,b], c, [c,b,a],[]}
Old_res ([335,328,185,160,153,111]): true
New_res: false (why line [1981,1679])
false
Old code is wrong: a & b were repairing but UPI2 has a & b in the wrong order.
%% Demo/exploratory hackery to check relative speeds of dealing with
%% checksum data in different ways.
%%
%% Summary:
%%
%% * Use compact binary encoding, with 1 byte header for entry length.
%% * Because the hex-style code is *far* slower just for enc & dec ops.
%% * For 1M entries of enc+dec: 0.215 sec vs. 15.5 sec.
%% * File sorter when sorting binaries as-is is only 30-40% slower
%% than an in-memory split (of huge binary emulated by file:read_file()
%% "big slurp") and sort of the same as-is sortable binaries.
%% * File sorter slows by a factor of about 2.5 if {order, fun compare/2}
%% function must be used, i.e. because the checksum entry lengths differ.
%% * File sorter + {order, fun compare/2} is still *far* faster than external
%% sort by OS X's sort(1) of sortable ASCII hex-style:
%% 4.5 sec vs. 21 sec.
%% * File sorter {order, fun compare/2} is faster than in-memory sort
%% of order-friendly 3-tuple-style: 4.5 sec vs. 15 sec.
So, the PB style encoding of the Mpb_LL_WriteProjectionReq message
is about 35-36 times slower than using Erlang's term_to_binary()
and binary_to_term(). {sigh}
So, there's some cheating going on, because some of the parts of
the #projection_v1{} and #p_srvr{} records aren't fully specified.
Those parts are being specified as "opaque" in the field names, e.g.
optional bytes opaque_flap = 10;
optional bytes opaque_inner = 11;
required bytes opaque_dbg = 12;
required bytes opaque_dbg2 = 13;
The serialization that's being used is erlang term sexprs. That isn't
portable. So if/when we really need to deal with a non-Erlang
language, we'll have to straighten this out further.
This goes to show that mixing implementation and protocol and API
and lots of other stuff ... is cool for the quick hack to do one thing
but really sucks when trying to do more than one thing.
* Proof-of-concept only: add HTTP/1.0'ish 'PUT' interface to be the
rough equivalent of machi_cr_client:append_chunk/3
* Proof-of-concept only: add HTTP/1.0'ish 'GET' interface to be the
rough equivalent of machi_cr_client:read_chunk/4
Example use: `append_chunk`
% curl http://127.0.0.1:4444/foo -0 -T /etc/hosts -v
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 4444 (#0)
> PUT /foo HTTP/1.0
> User-Agent: curl/7.37.1
> Host: 127.0.0.1:4444
> Accept: */*
> Content-Length: 338
>
* We are completely uploaded and fine
* HTTP 1.0, assume close after body
< HTTP/1.0 201 Created
< Location: foo.50EI18AX.21
< X-Offset: 3052
< X-Size: 338
<
* Closing connection 0
Example_use: `read_chunk`
curl 'http://127.0.0.1:4444/foo.50EI18AX.21?offset=3052&size=338' -0 -v
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 4444 (#0)
> GET /foo.50EI18AX.21?offset=3052&size=338 HTTP/1.0
> User-Agent: curl/7.37.1
> Host: 127.0.0.1:4444
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Length: 338
<
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
127.0.0.1 test.localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
# Xxxxxxx Yyyyy
192.168.99.222 zzzzz
127.0.0.1 aaaaaaaa.bb.ccccccccc.com
* Closing connection 0
=INFO REPORT==== 11-May-2015::19:50:09 ===
Chain tail a of [a] starting repair of [c]
=INFO REPORT==== 11-May-2015::19:50:12 ===
Chain tail a of [a]: repair finished in 2.438 seconds: todo_yo
* Set max length of a chain at -define(MAX_CHAIN_LENGTH, 64).
* Perturb tick sleep time of each manager
* If a chain manager L has zero members in its chain, and then its local
public projection store (authored by some remote author R) has a projection
that contains L, then adopt R's projection and start humming consensus.
* Handle "cross-talk" across projection stores, when chain membership
is changed administratively, e.g. chain was [a,b,c] then changed to merely
[a], but that change only happens on a. Servers b & c continue to use
stale projections and scribble their projection suggestions to a, causing
it to flap.
What's really cool about the flapping handling is that it *works*. I
wasn't thinking about this scenario when designing the flapping logic, but
it's really nifty that this extra scenario causes a to flap and then a's
inner projection remains stable, yay!
* Add complaints when "cross-talk" is observed.
* Fix flapping sleep time throttle.
* Fix bug in the machi_projection_store.erl's bookkeeping of the
max epoch number when flapping.
Introduce machi_flu_psup:start_flu_package/4 as a way to start all
related FLU processes
* The projection store
* The chain manager
* The FLU itself
... as well as linked processes.
http://www.snookles.com/scotttmp/flu-tree-20150430.png shows one FLU
running, "a". The process registered "a" is the append server,
"some-prefix" for the sequencer & writer for the current <<"some-prefix">>
file, and a process each for 3 active TCP connections to that FLU.
I'd first thought that having that code there would be a kind of
useful reminder: please move me somewhere else. However, there's
quite a bit there that's "cluster of clusters" stuff and not
appropriate for the current short-term work.