I'd first thought that having that code there would be a kind of
useful reminder: please move me somewhere else. However, there's
quite a bit there that's "cluster of clusters" stuff and not
appropriate for the current short-term work.
Small cleanups
Small cleanups
Refactoring argnames & order for more consistency
Add server-side-calculated MD5 checksum + logging
file:consult() style checksum management, too slow! 513K csums = 105 seconds, ouch
Much faster checksum recording
Add checksum_list. Alas, line-by-line I/O is slow, neh?
Much faster checksum listing
Add file0_verify_checksums.escript and supporting code
Adjust escript +A and -smp flags
Add file0_compare_filelists.escript
First draft of file0_repair_server.escript
First draft of file0_repair_server.escript, part 2
WIP of file0_repair_server.escript, part 3
WIP of file0_repair_server.escript, part 4
Basic repair works, it seems, hooray!
When checksum file ordering is different, try a cheap(?) 'cmp' on sorted results instead
Add README.md
Initial import of szone_chash.erl
Add file0_cc_make_projection.escript and supporting code
Add file0_cc_map_prefix.escript and supporting code
Change think-o: hash output is a chain, silly boy
Add file0_cc_1file_write_redundant.escript and support
Add file0_cc_read_client.escript and supporting code
Add examples/servers.map & file0_start_servers.escript
WIP: working on file0_cc_migrate_files.escript
File migration finished, works, yay!
Add basic 'what am I' docs to each script
Add file0_server_daemon.escript
Minor fixes
Fix broken unit test
Add basho_bench run() commands for append & read ops with projection
Add to examples dir
WIP: erasure coding hack, part 1
Fix broken unit test
WIP: erasure coding hack, part 2
WIP: erasure coding hack, part 3, EC data write is finished!
WIP: erasure coding hack, part 4, EC data read still in progress
WIP: erasure coding hack, part 5, EC data read still in progress
WIP: erasure coding hack, part 5b, EC data read still in progress
WIP: erasure coding hack, EC data read finished!
README update, part 1
README update, part 2
Oops, put back the printed output for file-write-client and 1file-write-redundant-client
README update, part 3
Fix 'user' output bug in list-client
Ugly hacks to get output/no-output from write clients
Clean up minor output bugs
Clean up minor output bugs, part 2
README update, part 4
Clean up minor output bugs, part 3
Clean up minor output bugs, part 5
Clean up minor output bugs, part 6
README update, part 6
README update, part 7
README update, part 7
README update, part 8
Final edits/fixes for demo day
Fix another oops in the README/demo day script
So, it definitely works, in that it stops a low(er) ranking flapping
process from continuing to make new proposals, so then the cycle of
flapping stops. Whenever an up/down state changes and a new/different
proposal is made, then things immediately resume, yay.
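The damping behavior described above can be sketched roughly as follows. This is only an illustration in Python, not the actual chain manager code: the `FlapDamper` class, the integer `rank` (0 meaning highest rank), the `limit` threshold, and the idea of counting consecutive identical proposals are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A proposal is reduced here to (upi, repair, down); this shape is an
# assumption for illustration only.
Proposal = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]


@dataclass
class FlapDamper:
    rank: int                       # hypothetical: 0 = highest-ranked process
    limit: int = 5                  # hypothetical flap threshold
    flaps: int = 0                  # consecutive repeats of the same proposal
    last: Optional[Proposal] = None

    def should_propose(self, proposal: Proposal) -> bool:
        """Return False when a lower-ranked process should stay quiet."""
        if proposal == self.last:
            # Same proposal as last time: count it as another flap.
            self.flaps += 1
        else:
            # Up/down state changed, so the proposal differs: resume at once.
            self.flaps = 0
            self.last = proposal
        # Only lower-ranked processes are damped in this sketch.
        return not (self.rank > 0 and self.flaps >= self.limit)
```

With `limit=3`, a lower-ranked process repeating one proposal is allowed three more attempts after the first and is then suppressed until its proposal changes.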
However, there's still a problem of the chain state at this time,
I believe. Here's a session that's damped by the flap counter:
SET always_last_partitions ON ... we should see convergence to correct chains.
21:23:03.170 d uses: [{epoch,457},{author,a},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:03.270 c uses: [{epoch,457},{author,a},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:03.471 a uses: [{epoch,459},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{repair_airquote_done,{we_agree,457}},{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:03.611 b uses: [{epoch,460},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:03.635 d uses: [{epoch,461},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:03.672 c uses: [{epoch,461},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:03.873 a uses: [{epoch,462},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,3}}}]}]
21:23:04.155 d uses: [{epoch,463},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.198 c uses: [{epoch,463},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.270 b uses: [{epoch,464},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.276 a uses: [{epoch,465},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.652 d uses: [{epoch,466},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.660 c uses: [{epoch,466},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.679 a uses: [{epoch,467},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:04.914 b uses: [{epoch,468},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,4}}}]}]
21:23:05.058 d uses: [{epoch,469},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:05.062 c uses: [{epoch,469},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:05.081 a uses: [{epoch,470},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:05.579 b uses: [{epoch,471},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:05.581 d uses: [{epoch,472},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:05.590 c uses: [{epoch,472},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:05.885 a uses: [{epoch,473},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,5}}}]}]
21:23:06.102 d uses: [{epoch,474},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.159 c uses: [{epoch,474},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.250 b uses: [{epoch,475},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.288 a uses: [{epoch,476},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.612 d uses: [{epoch,477},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.620 c uses: [{epoch,477},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.691 a uses: [{epoch,478},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:06.893 b uses: [{epoch,479},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,6}}}]}]
21:23:07.015 d uses: [{epoch,480},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,7}}}]}]
21:23:07.022 c uses: [{epoch,480},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,7}}}]}]
21:23:07.094 a uses: [{epoch,481},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,7}}}]}]
21:23:07.516 d uses: [{epoch,482},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,7}}}]}]
21:23:07.550 b uses: [{epoch,483},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,7}}}]}]
{FLAP: c flaps 4}!
{FLAP: c flaps 5}!
21:23:07.898 a uses: [{epoch,484},{author,a},{upi,[a,d]},{repair,[c]},{down,[b]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,7}}}]}]
21:23:08.010 d uses: [{epoch,485},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,8}}}]}]
21:23:08.013 c uses: [{epoch,485},{author,d},{upi,[a]},{repair,[b,d,c]},{down,[]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,8}}}]}]
21:23:08.221 b uses: [{epoch,486},{author,b},{upi,[b]},{repair,[c,d]},{down,[a]},{d,[{author_proc,react},{ps,[{a,b}]},{nodes_up,[b,c,d]}]},{d2,[{network_islands,[na_reset_by_always]},{hooray,{v2,{2014,11,8},{21,23,8}}}]}]
{FLAP: a flaps 5}!
{FLAP: a flaps 6}!
SET always_last_partitions OFF ... let loose the dogs of war!
21:23:17.349 b uses: [{epoch,495},{author,b},{upi,[b]},{repair,[c,d,a]},{down,[]},{d,[{author_proc,react},{ps,[]},{nodes_up,[a,b,c,d]}]},{d2,[{network_islands,[islands_not_supported]},{hooray,{v2,{2014,11,8},{21,23,17}}}]}]
So, the state of the chains at 21:23:11.221, three seconds after
the flapping detector finished, is:
epoch=484, UPI=[a,d], repair=[c], nodes_up=[a,c,d]
epoch=485, UPI=[a], repair=[b,d,c], nodes_up=[a,b,c,d]
epoch=486, UPI=[b], repair=[c,d], nodes_up=[b,c,d]
The UPIs are overlapping, derp, that won't work, thanks to the magic
of epoch version # enforcement. However, the clients need to concern
themselves with the repairing members, also. As soon as a client
in epoch=486 sends an op to FLU c or FLU d, those nodes will
wedge themselves because they're in a different epoch. Everyone
will get stuck, and then life sucks.
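To make the failure mode concrete, here is a small Python sketch that takes the three chain states listed above, reports which UPIs overlap, and models a FLU wedging itself on an epoch mismatch. The `Projection` and `Flu` classes and the `overlapping_upis` helper are hypothetical names invented for this illustration. The rule that a wedged FLU then refuses every later op is also an assumption, not something taken from the real server code.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Projection:
    epoch: int
    upi: Tuple[str, ...]
    repair: Tuple[str, ...]


@dataclass
class Flu:
    name: str
    epoch: int            # epoch this FLU currently uses
    wedged: bool = False

    def handle_op(self, client_epoch: int) -> str:
        # Assumption: once wedged, the FLU refuses all further ops.
        if self.wedged:
            return "wedged"
        # Epoch enforcement: a mismatched epoch wedges the FLU.
        if client_epoch != self.epoch:
            self.wedged = True
            return "wedged"
        return "ok"


def overlapping_upis(projs: List[Projection]) -> List[Tuple[int, int, List[str]]]:
    """Return (epoch1, epoch2, shared_members) for each overlapping UPI pair."""
    found = []
    for p, q in combinations(projs, 2):
        shared = sorted(set(p.upi) & set(q.upi))
        if shared:
            found.append((p.epoch, q.epoch, shared))
    return found


# The three chain states from the session above.
projections = [
    Projection(484, ("a", "d"), ("c",)),
    Projection(485, ("a",), ("b", "d", "c")),
    Projection(486, ("b",), ("c", "d")),
]
```

In the log, c and d both last adopted epoch 485, so a client holding the epoch=486 projection that sends an op to `Flu("c", 485)` gets `"wedged"` back, which is the stuck state described above.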
Future work TBD!