Commit graph

310 commits

Author SHA1 Message Date
Mark Allen
4ce7a87d56 Remove merklet 2015-11-20 16:29:17 -06:00
UENISHI Kota
9f6b53fc15 Make bigger offset and a bit code cleanup 2015-11-18 14:38:04 +09:00
UENISHI Kota
e11cdfe95c Add stop and trim command to eqc_statem test on file_proxy 2015-11-18 14:11:25 +09:00
Shunichi Shinohara
ad419ada50 Refactoring, cosmetics, comments 2015-11-17 12:58:54 +09:00
Shunichi Shinohara
049311614f Fix process leak of CR clients and FLU1 proxy clients 2015-11-17 12:58:50 +09:00
UENISHI Kota
6786820401 Merge pull request #35 from basho/ku/making-file-proxy-spec
Add eqc trim tests to machi_file_proxy
2015-11-05 16:27:48 +09:00
UENISHI Kota
81fae32539 Remove unused test function 2015-11-05 16:19:46 +09:00
UENISHI Kota
ce41f9005e Fix machi_file_proxy_eqc:write_post to proper assertion 2015-11-05 14:48:35 +09:00
Shunichi Shinohara
9e4dc83f2a Add missing cleanup tasks, suppress some not-so-useful logs 2015-11-05 11:47:47 +09:00
UENISHI Kota
f56037240e Plan trim commands in eqc tests 2015-11-04 16:32:53 +09:00
UENISHI Kota
c1e5426034 Address PR comments 2015-11-04 16:08:09 +09:00
UENISHI Kota
3b087c0388 Add eqc trim tests to machi_file_proxy
* Add description on high client APIs
* Add notes to rethink high client specification
2015-11-04 16:02:29 +09:00
UENISHI Kota
62c8dacc65 Merge pull request #33 from basho/ss-repair-with-partition-simulator
Add test for append and repair with partition simulator
2015-11-04 14:44:04 +09:00
Scott Lystig Fritchie
30000d6602 Add doc/machi_chain_manager1_converge_demo.md 2015-11-03 00:27:09 +09:00
Mark Allen
3c5a9e6f53 Torture tests for merkle tree
1,000,000 entries - timings and size
2015-11-02 00:12:58 -06:00
Shunichi Shinohara
b5005c3526 Add EQC test case for AP mode repair w/ part. sim. 2015-10-30 10:25:51 +09:00
Mark Allen
7086899941 Reorg merkle tree code into a library
Was a service previously. Now contains both merklet
and the naive implementations.  Put construction
timing stuff into the test.

Tests are not truly meaningful yet.
2015-10-28 16:59:49 -05:00
UENISHI Kota
f7358424e4 Trim command and GC prototype implementation
* maybe_gc/2 is triggered at machi_file_proxy, when chunk is deleted
  and the file is larger than `max_file_size`
* A file is deleted if all chunks except 1024 bytes header are trimmed
* If a file is going to be deleted, file_proxy notifies metadata_mgr
  to remember the filename persistently, whose filename is
  `known_files_<FluName>`
* Such trimmed filenames are stored in a machi_plist file per flu
* machi_file_proxy could not be started if the filename is in the
  manager's list. Consequently, any write, read and trim operations
  cannot happen against deleted file.
* After the file was trimmed, any read request to the file returns
  `{error, trimmed}`
* Disclaimer: no tests written yet and machi_plist does not support
  any recovery from partial writes.
* Add some thoughts as comments for repairing trims.

* State diagram of every byte is as follows:

```
state\action| write/append   | read_chunk       | trim_chunk
------------+----------------+------------------+---------------
 unwritten  |  -> written    | fail (+repair)   | -> trimmed
 written    | noop or repair | return content   | -> trimmed
 trimmed    |  fail          | fail             | noop
```
2015-10-28 12:34:03 +09:00
Mark Allen
5e571f6009 Switch to merklet
Still a WIP
2015-10-27 16:33:18 -05:00
Mark Allen
1b8401e7de Initial smoke test 2015-10-27 11:57:38 -05:00
UENISHI Kota
8a61055f55 Support arbitrary bytes write by using find_(left|right)neighbor/2 2015-10-27 13:43:45 +09:00
UENISHI Kota
3d6d4d8be3 Do the slicing in flu server rather than in CR client 2015-10-23 18:49:49 +09:00
UENISHI Kota
0f688d6279 Update read_chunk() PB protocol to return trimmed chunks 2015-10-22 23:11:43 +09:00
Scott Lystig Fritchie
30d7e592a3 Merge pull request #20 from basho/ku/read-all-chunks
Allow reading multiple chunks at once
2015-10-21 15:28:10 +09:00
Scott Lystig Fritchie
1c8e436a64 Fix race #3 2015-10-21 15:01:11 +09:00
Scott Lystig Fritchie
9d177c6b54 Fix race #2 2015-10-21 14:45:21 +09:00
Scott Lystig Fritchie
976a701e0c Fix timeout problem in test/machi_proxy_flu1_client_test.erl 2015-10-21 14:31:58 +09:00
Scott Lystig Fritchie
981b55c070 Fix race #1 2015-10-21 14:31:41 +09:00
UENISHI Kota
a43397a7b8 Update to review comments 2015-10-21 10:58:00 +09:00
UENISHI Kota
ebb9bc3f5a Allow reading multiple chunks at once
* When repairing multiple chunks at once and any of its repair
  failed, the whole read request and repair work will fail
* Rename read_repair3 and read_repair4 to do_repair_chunks and
  do_repair chunk in machi_file_proxy
* This pull request changes return semantics of read_chunk(), that
  returns any chunk included in requested range
* First and last chunk may be cut to fit the requested range
* In machi_file_proxy, unwritten_bytes are removed and replaced by
  machi_csum_table
2015-10-20 17:59:09 +09:00
Scott Lystig Fritchie
6f9814ffb4 Merge ss/deps-for-debugging (with rebar.config conflict fix) 2015-10-19 16:41:03 +09:00
UENISHI Kota
3e975f53b8 Allow read_chunk() to return partial chunks
This is simply a change of read_chunk() protocol, where a response of
read_chunk() becomes list of written bytes along with checksum. All
related code including repair is changed as such. This is to pass all
tests and not actually supporting partial chunks.
2015-10-19 15:37:17 +09:00
Shunichi Shinohara
208c02853f Add cluster_info to deps and small callback module
For debuging from shell, some functions in machi_cinfo are exported:

- public_projection/1
- private_projection/1
- fitness/1
- chain_manager/1
- flu1/1
2015-10-19 15:36:05 +09:00
Scott Lystig Fritchie
00ac0f4cd3 Reduce compiler warnings and verbose output that clutters eunit test output 2015-10-16 17:41:01 +09:00
UENISHI Kota
6f790527f5 Follow with missing tests and related fix 2015-10-16 10:10:05 +09:00
UENISHI Kota
e45469b5ce Move checksum file related code to machi_csum_table 2015-10-15 11:28:40 +09:00
Mark Allen
baeffbab0b Merge pull request #6 from basho/mra/write-once-clean
Integrate write once invariant into current FLU implementation
2015-10-14 10:15:57 -05:00
Scott Lystig Fritchie
7439a2738d Work-around racy query of wedge_status in machi_cr_client_test 2015-10-14 16:28:01 +09:00
Scott Lystig Fritchie
8cd41a7bf2 Clean up projection-related tests in machi_proxy_flu1_client:api_smoke_test 2015-10-14 12:49:48 +09:00
Mark Allen
ec9682520a Fix tests with bad file names.
Either catch the {error, bad_arg} tuple or modify the file name to
conform to the machi conventions of prefix^uuid^seqno.
2015-10-13 21:13:12 -05:00
UENISHI Kota
e113f6ffdd Reach the trim stub to CR client 2015-10-13 17:25:59 +09:00
UENISHI Kota
dfe953b7d8 Add surface of trim to scrub 2015-10-13 17:14:44 +09:00
Scott Lystig Fritchie
2724960eaf TODO MARK: added clarification to test/machi_flu_psup_test.erl 2015-10-12 15:43:45 +09:00
Scott Lystig Fritchie
5131ebdd16 Change eunit expectations from change to using psup 2015-10-12 15:38:47 +09:00
Scott Lystig Fritchie
cbf773215e TODO MARK add comment for machi_cr_client_test:smoke_test2/0 failure 2015-10-12 15:29:54 +09:00
Scott Lystig Fritchie
8a8c4dcede Adapt machi_cr_client_test:smoke_test2/0 to change in FLU semantics: partial_write -> unwritten 2015-10-12 14:22:47 +09:00
Mark Allen
f3e6d46e36 Fix chain manager failures disabling active mode
The FLU psup starts the chain manager in active mode by default
(as it should for normal run-time operation.) By adding the
{active_mode, false} tuple to the options list, we can
tell the chain manager that it should be explicitly manipulated
during tests.
2015-10-11 23:05:44 -05:00
Mark Allen
da0b331936 WIP 2015-10-11 23:05:27 -05:00
Mark Allen
855f94925c Validate semantics on partial reads 2015-10-11 23:05:00 -05:00
Mark Allen
8187e01fe0 Use psup startup 2015-10-11 23:04:43 -05:00
Mark Allen
5926cef44a Make test start up more reliable 2015-10-08 15:49:22 -05:00
Mark Allen
d3fe7ee181 Pull write-once files over to clean branch
I am treating the original write-once branch as a prototype
which I am now throwing away. I had too much work interleved
in there, so I felt like the best thing to do would be to cut
a new clean branch and pull the files over and start over
against a recent-ish master.

We will have to refactor the other things in FLU in a more
piecemeal fashion.
2015-10-02 16:29:09 -05:00
Scott Lystig Fritchie
6425cca13f Fix broken eunit test 2015-09-21 21:44:03 +09:00
Scott Lystig Fritchie
69a304102e Write public proj in all_members order only 2015-09-21 15:09:16 +09:00
Scott Lystig Fritchie
83e878eb07 More verbosity, whee 2015-09-20 14:06:55 +09:00
Scott Lystig Fritchie
6b4ed1c061 Verbose debugging cruft 2015-09-19 14:25:07 +09:00
Scott Lystig Fritchie
72bfa163ba Small test bugfixes & verbose/debugging cruft 2015-09-19 14:16:54 +09:00
Scott Lystig Fritchie
5001406499 Add proplist-based configuration for TCP port and tmp dir for converge demo 2015-09-15 17:54:27 +09:00
Scott Lystig Fritchie
75c94420e0 Add test_ets_table to give programmatic slowdown 2015-09-14 22:52:41 +09:00
Scott Lystig Fritchie
b4f8bc8058 Add pretty_time(). Add CONFIRM verbose logging for none proj 2015-09-14 17:00:09 +09:00
Scott Lystig Fritchie
fdf78bdbbc Tweak IsRelevantToMe_p in B10 (more)
Last night we hit a rare case of failed convergence.

f was out of sync with the rest of the world.
f: upi=[b,g,f] repairing=[a,c]
The "rest of the world" used a larger chain at:
*: upi=[c,b,g,a], repairing=[f]

And f refused to join the larger chain because of the way that
IsRelevantToMe_p was being calculated before this commit.

Hrrrm, though, I'm not convinced that this particular problem
is fixed 100% by this patch.  What if the chain lengths were
the same but also UPI incompatible?  e.g. if I remove 'a' from
the "real world (in the partition simulator)" example above:

f: upi=[b,g,f] repairing=[c]
*: upi=[c,b,g], repairing=[f]

Hrmmmmm, I may need to reintroduce the my-recent-adopted-projection-
flapping-like-counter thingie to try to break this kind of
incompatible deadlock.
2015-09-14 13:40:34 +09:00
Scott Lystig Fritchie
4fba6c0d33 Adjust converge test conditions slightly 2015-09-13 21:07:54 +09:00
Scott Lystig Fritchie
04369673b0 MaxFiles static file deletion isn't good for make_zerf(). Add some no-partition scenarios 2015-09-13 16:59:08 +09:00
Scott Lystig Fritchie
f3a0ee91cf WIP: thread P_calc_current all the way to C100 for CP mode assist 2015-09-13 15:58:45 +09:00
Scott Lystig Fritchie
5efec1b6cd Add upi_unanimous annotation to AP mode 2015-09-11 21:47:05 +09:00
Scott Lystig Fritchie
68f1ff68ee Bugfix: broken eunit test 2015-09-11 17:52:40 +09:00
Scott Lystig Fritchie
a0c129c16d Bugfix: wow, a chain state transition sanity check bug 2015-09-11 17:32:52 +09:00
Scott Lystig Fritchie
8df7d58365 Add partition simulator support to fitness service 2015-09-11 16:45:29 +09:00
Scott Lystig Fritchie
41737ae62a Add delete_admin_down API implementation, oops! 2015-09-10 18:05:18 +09:00
Scott Lystig Fritchie
d45c249e89 Add admin down status API to fitness server 2015-09-10 17:30:11 +09:00
Scott Lystig Fritchie
c14b9ce50f Minor cleanup, add more partitions to converge demo 2015-09-10 16:39:15 +09:00
Scott Lystig Fritchie
b7aa33c617 Yeah, nearly there. AP fails occasionally in multiple-asymmetric-partition sequence 2015-09-09 23:10:39 +09:00
Scott Lystig Fritchie
7af863d840 Add stubs of machi_fitness server 2015-09-08 16:13:07 +09:00
Scott Lystig Fritchie
185c9eb313 WIP: add failing eunit placeholder for spam 2015-09-07 15:38:23 +09:00
Scott Lystig Fritchie
4376ce9ec1 Remove all flap counting and inner projection stuff 2015-09-04 17:17:49 +09:00
Scott Lystig Fritchie
2e2f5f44c4 Another tweak to private_projections_are_stable() 2015-09-01 00:51:12 +09:00
Scott Lystig Fritchie
823b47bef3 Bugfix: convergence property for CP mode, again 2015-08-30 19:52:31 +09:00
Scott Lystig Fritchie
764708f3ef Fix private_projections_are_stable() for long CP mode chains 2015-08-30 00:03:51 +09:00
Scott Lystig Fritchie
94394d3429 Bugfix: allow none proj to re-emerge from flapping (more)
See comments added in this commit at A40.

So far, I've been doing CP mode testing with a handful of (very useful)
network partition combinations using:

    machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).

Next steps:

* Expand number & types of partitions
* Expand to chain lengths of 5 and beyond
2015-08-29 21:36:53 +09:00
Scott Lystig Fritchie
ee19a0856b WIP: justincase 2015-08-29 19:59:46 +09:00
Scott Lystig Fritchie
dc5ae4047a Bugfix: react_to_env_A30 inner->norm fix, make_zerf() none proj derp fix 2015-08-29 18:01:13 +09:00
Scott Lystig Fritchie
85eb3567a3 Bugfix: convergence property for CP mode 2015-08-29 15:57:23 +09:00
Scott Lystig Fritchie
403cb5b7a6 WIP: improvements, but now flapping inner epoch keeps increasing {sigh} 2015-08-28 21:13:54 +09:00
Scott Lystig Fritchie
3dfe5c2677 WIP: fix annotation history on disk 2015-08-28 18:37:11 +09:00
Scott Lystig Fritchie
12b74a52fd WIP: pre-dinner paranoid checkin 2015-08-27 18:45:27 +09:00
Scott Lystig Fritchie
28335a1310 Add CP mode unwedge. All eunit tests are passing again. 2015-08-26 18:47:39 +09:00
Scott Lystig Fritchie
e8f3ab381d Add set_consistency_mode() to projection store API, use it 2015-08-26 14:57:51 +09:00
Scott Lystig Fritchie
833463f20d Merge branch 'master' into slf/chain-manager/cp-mode4 2015-08-26 14:39:42 +09:00
Scott Lystig Fritchie
27656eafaa Fix (via sleep, egadz) race condition in machi_flu_psup_test 2015-08-26 14:38:56 +09:00
Scott Lystig Fritchie
c12231c7b6 Fix other tests to accomodate new semantics 2015-08-25 19:45:31 +09:00
Scott Lystig Fritchie
c0ee323637 Our new unit test works, yay 2015-08-25 19:42:33 +09:00
Scott Lystig Fritchie
83f49472db WIP: intermediate refactoring 2015-08-25 19:31:05 +09:00
Scott Lystig Fritchie
0a4c0f963e Add failing test case for annotating private projections via dbg2 list 2015-08-25 19:12:23 +09:00
Scott Lystig Fritchie
1c5a17b708 WIP: adjust throttle of flapping 'shut up' 2015-08-25 17:01:14 +09:00
Scott Lystig Fritchie
9a86453753 WIP: half-baked idea, stopping for the night (more)
So, I'm 50% sure this is a good idea for CP mode: if there's
a later public projection than P_current, then who knows what
we might have missed.  So, call make_zerf() to find out the
absolute latest.  Problem: flapping state appears to be lost,
booo.
2015-08-24 21:54:30 +09:00
Scott Lystig Fritchie
2f82fe0487 WIP: cp_mode improvements 2015-08-24 19:04:26 +09:00
Scott Lystig Fritchie
f6e81e6cd0 Add damper check for flapping of *inner* projections, whee! 2015-08-23 20:01:44 +09:00
Scott Lystig Fritchie
2b2facaba2 Add more FLU choices to converge demo 2015-08-22 14:56:26 +09:00
Scott Lystig Fritchie
14fad2d704 End-to-end chain state checking is still broken (more)
If we use verbose output from:

    machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).

And use:

    tail -f typescript_file | egrep --line-buffered 'SET|attempted|CONFIRM'

... then we can clearly see a chain safety violation when moving from
epoch 81 -> 83.  I need to add more smarts to the safety checking,
both at the individual transition sanity check and at the converge_demo
overall rolling sanity check.

Key to output: CONFIRM by epoch {num} {csum} at {UPI} {Repairing}

    SET # of FLUs = 3 members [a,b,c]).
    CONFIRM by epoch 1 <<96,161,96,...>> at [a,b] [c]
    CONFIRM by epoch 5 <<134,243,175,...>> at [b,c] []
    CONFIRM by epoch 7 <<207,93,225,...>> at [b,c] []
    CONFIRM by epoch 47 <<60,142,248,...>> at [b,c] []
    SET partitions = [{c,b},{c,a}] (1 of 2) at {22,3,34}
    CONFIRM by epoch 81 <<223,58,184,...>> at [a,b] []
    SET partitions = [{b,c},{b,a}] (2 of 2) at {22,3,38}
    CONFIRM by epoch 83 <<33,208,224,...>> at [a,c] []
    SET partitions = []
    CONFIRM by epoch 85 <<173,179,149,...>> at [a,c] [b]
2015-08-13 22:16:28 +09:00
Scott Lystig Fritchie
e956c0b534 Fix (yet again) converge demo stable criteria 2015-08-13 21:26:07 +09:00