Add test for append and repair with partition simulator #33

Merged
shino merged 6 commits from ss-repair-with-partition-simulator into master 2015-11-04 05:44:04 +00:00
shino commented 2015-10-30 01:57:57 +00:00 (Migrated from github.com)

This PR adds EQC (statem) test for file operations and repair under
simulated network partition.

The main purpose is to confirm no dataloss, i.e. every chunk that has
been successfully written (ACK received) by append/write opration will
be read after partition heals.


Test sometimes fails, following cases should be investigated respectively.

  1. bad_return_value from machi_cr_client. Probably just code bug.
    Output: https://gist.github.com/shino/6e917e1afc26a83cb01e
  2. machi_cr_client:append_chunk responds ok but {Offset, Length, Key}
    is the same one before witten. Maybe test code bug, maybe around simulator,
    maybe actual code... don't know yet...
    Output: https://gist.github.com/shino/f38140fbe1c4da8e8fa4
    (full version, long. 59a84638cd/gistfile1.txt)
  3. read_chunk for written chunk returns different binary from the one when
    it was written. No idea at the moment
    Output: https://gist.github.com/shino/011f3d3e3f4028093d58#file-gistfile1-txt-L10569-L10580
This PR adds EQC (statem) test for file operations and repair under simulated network partition. The main purpose is to confirm no dataloss, i.e. every chunk that has been successfully written (ACK received) by append/write opration will be read after partition heals. --- Test sometimes fails, following cases should be investigated respectively. 1. `bad_return_value` from `machi_cr_client`. Probably just code bug. Output: https://gist.github.com/shino/6e917e1afc26a83cb01e 2. `machi_cr_client:append_chunk` responds ok but `{Offset, Length, Key}` is the same one before witten. Maybe test code bug, maybe around simulator, maybe actual code... don't know yet... Output: https://gist.github.com/shino/f38140fbe1c4da8e8fa4 (full version, long. https://gist.githubusercontent.com/shino/335e09373da5b46bb2c8/raw/59a84638cd916940f3e67af2c6a4b15eb4e551a9/gistfile1.txt) 3. `read_chunk` for written chunk returns different binary from the one when it was written. No idea at the moment Output: https://gist.github.com/shino/011f3d3e3f4028093d58#file-gistfile1-txt-L10569-L10580
jadeallenx commented 2015-10-30 05:57:35 +00:00 (Migrated from github.com)

This looks very cool, @shino! Looking forward to running it tomorrow. 🍰

This looks very cool, @shino! Looking forward to running it tomorrow. :cake:
slfritchie commented 2015-11-02 02:54:24 +00:00 (Migrated from github.com)

Bug #36 is an important one to fix. My guess is that it addresses Shino's description of number two and number three? I have not looked at the first one yet ... and probably will not have time today, sorry! (Need to finish Ricon-related stuff and pack for travel tomorrow.)

Bug #36 is an important one to fix. My guess is that it addresses Shino's description of number two and number three? I have not looked at the first one yet ... and probably will not have time today, sorry! (Need to finish Ricon-related stuff and pack for travel tomorrow.)
shino commented 2015-11-02 03:03:09 +00:00 (Migrated from github.com)

Agree > #36 will fix 2. and 3.

For the first one bad_return_value, quick fix is at 903e9395c8, just reply tag missing for handle_call return. I will create another PR with small eunit unless too difficult 😅

Agree > #36 will fix 2. and 3. For the first one `bad_return_value`, quick fix is at https://github.com/basho/machi/commit/903e9395c8a32af8ae7440ab8d0fb4d170c93cfd, just `reply` tag missing for `handle_call` return. I will create another PR with small eunit unless too difficult :sweat_smile:
slfritchie commented 2015-11-02 08:44:44 +00:00 (Migrated from github.com)

+1 from me (just one small change request) but I have been in a hurry today. @mrallen1 and/or @kuenishi, could you please review also?

Also/otherwise, I'm thrilled that this is finding bugs even before the PR is merged. ^_^

+1 from me (just one small change request) but I have been in a hurry today. @mrallen1 and/or @kuenishi, could you please review also? Also/otherwise, I'm thrilled that this is finding bugs even before the PR is merged. `^_^`
kuenishi commented 2015-11-02 09:12:57 +00:00 (Migrated from github.com)

I'll try a review on Wednesday unless Mark takes while we're in holiday.

I'll try a review on Wednesday unless Mark takes while we're in holiday.
shino commented 2015-11-02 09:32:16 +00:00 (Migrated from github.com)

@slfritchie Thanks for review! I pushed the commit to address your comment.

For the first one bad_return_value, quick fix is at 903e939, just reply tag missing for handle_call return. I will create another PR with small eunit unless too difficult

My expectation was wrong and somewhat deeper. Opened Separate issue #39.

@slfritchie Thanks for review! I pushed the commit to address your comment. > For the first one `bad_return_value`, quick fix is at 903e939, just reply tag missing for handle_call return. I will create another PR with small eunit unless too difficult My expectation was wrong and somewhat deeper. Opened Separate issue #39.
kuenishi commented 2015-11-04 02:55:38 +00:00 (Migrated from github.com)

I often see (not always) this error:

Reason:
  exception:
    exit({critical_error,
            {doublewrite_diff,
               {1084, 10,
                <<112, 114, 101, 94, 48, 50, 51, 51, 98, 98, 51, 54, 45, 101,
                  54, 100, 101, 45, 52, 53, 55, 97, 45, 98, 98, 50, 50, 45, 50,
                  102, 97, 52, 97, 52, 49, 51, 56, 51, 51, 52, 94, 49>>},
               {<<197, 51, 119, 192, 169, 161, 98, 56, 134, 187>>,
                <<238, 54, 176, 107, 52, 147, 92, 85, 36, 126>>}}})
      in machi_ap_repair_eqc:append/3 (test/machi_ap_repair_eqc.erl:147)
         eqc_statem:run_parallel_commands/2 (../src/eqc_statem.erl:1449)
         machi_ap_repair_eqc:-prop_repair_par/1-fun-2-/3 (test/machi_ap_repair_eqc.erl:260)
machi_ap_repair_eqc:89: prop_repair_par_test_...*failed*
in function machi_ap_repair_eqc:'-prop_repair_par_test_/0-fun-1-'/3 (test/machi_ap_repair_eqc.erl, line 92)

It pretty look like the 3rd issue but not sure.

I often see (not always) this error: ``` erlang Reason: exception: exit({critical_error, {doublewrite_diff, {1084, 10, <<112, 114, 101, 94, 48, 50, 51, 51, 98, 98, 51, 54, 45, 101, 54, 100, 101, 45, 52, 53, 55, 97, 45, 98, 98, 50, 50, 45, 50, 102, 97, 52, 97, 52, 49, 51, 56, 51, 51, 52, 94, 49>>}, {<<197, 51, 119, 192, 169, 161, 98, 56, 134, 187>>, <<238, 54, 176, 107, 52, 147, 92, 85, 36, 126>>}}}) in machi_ap_repair_eqc:append/3 (test/machi_ap_repair_eqc.erl:147) eqc_statem:run_parallel_commands/2 (../src/eqc_statem.erl:1449) machi_ap_repair_eqc:-prop_repair_par/1-fun-2-/3 (test/machi_ap_repair_eqc.erl:260) machi_ap_repair_eqc:89: prop_repair_par_test_...*failed* in function machi_ap_repair_eqc:'-prop_repair_par_test_/0-fun-1-'/3 (test/machi_ap_repair_eqc.erl, line 92) ``` It pretty look like the 3rd issue but not sure.
shino commented 2015-11-04 03:50:19 +00:00 (Migrated from github.com)

I often see (not always) this error:

That is 2nd one. Please refer #36 , which includes temporary(?) patch.

> I often see (not always) this error: That is 2nd one. Please refer #36 , which includes temporary(?) patch.
kuenishi commented 2015-11-04 05:18:20 +00:00 (Migrated from github.com)

+1 but please address one small comment before merge. Great job!

+1 but please address one small comment before merge. Great job!
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: greg/machi#33
No description provided.