225 commits

Author SHA1 Message Date
Scott Lystig Fritchie
d2f93e919e Single chain manager simulation test: no bad projection transitions! 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
057f958bb1 WIP: chain manager simulation test 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
410c8ff7ce WIP: chain manager simulation test 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
b8c87b23ad WIP: chain manager simulation test 2015-03-02 20:20:16 +09:00
Scott Lystig Fritchie
4ebc80dc39 Add src/machi_util.erl 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
a81552ed82 Makefile un-derp'ing 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
a5dc72834f Fix proj0_test for concuerror, yay! 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
4969e019b2 Fix proj0_test for concuerror, yay! 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
e50e669b79 TODO left off here 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
97c5789b44 WIP: eunit tests pass, but Concuerror loops forever then errs on max retries on proj0_test 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
f7447e8953 WIP: done (I hope) adding Lamport clocks 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
ee7bc2645b WIP: in the middle of adding Lamport clocks 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
b443a15542 register op name sanity: write and _read_ 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
921d90a69b WIP: enforce wedging and new projection writes 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
bebce51ab9 WIP: minimal write-once projection store in FLU 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
34c8c6490a WIP: add Name arg to start_link() 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
2d3a29471d Minimal FLU0 single register, plus Concuerror tests 2015-03-02 20:20:15 +09:00
Scott Lystig Fritchie
f378204a91 Add fledgling log implementation based on CORFU papers (corfurl stuff) 2015-03-02 20:20:07 +09:00
Scott Lystig Fritchie
370f70303d Merge branch 'merge/tango-prototype' 2015-03-02 20:07:25 +09:00
Scott Lystig Fritchie
94ebd4bb6f Rename prototype/tango-prototype -> prototype/tango 2015-03-02 20:06:45 +09:00
Scott Lystig Fritchie
3f3f3e4f5d Update README.tango.md with latest checkpoint implementation fix notes 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
c5ed355dac Rename tango readme 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
8da46f78fe BAH! Checkpoint is quite broken, see new README.tango.md 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
7bf98fa648 All tests pass, but checkpointing does not truncate history 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
fed2f43783 WIP: all but queue checkpointing now passes 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
0b3bb3ee7c WIP: tango_oid_test now passes 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
a0bb7ee23d WIP: tango_oid refactoring, all broken: infinite loop 2015-03-02 20:03:46 +09:00
Scott Lystig Fritchie
9a3ac02413 WIP: first round of tango_oid refactoring, all broken horribly 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
03f071316c Gadz, more sequencer cleanup. corfurl_test now passes 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
30fc62ab22 Gadz, more sequencer cleanup. corfurl_sequencer_test now passes 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
b8c051c89f Fix broken sequencer semantics.
It occurred to me today that I implemented the sequencer incorrectly and
hadn't yet noticed because I don't have any tests that are
complex/interleaved/perhaps-non-deterministic to find the problem.
The problem is that the sequencer's current implementation only keeps
track of the last LPN for any Tango stream.

The fix is to do what the paper actually says: the sequencer keeps a
*list* of the last $K$ LPNs for each stream.  Derp.  Yes, that's really
necessary to avoid a pretty simple race condition with 2 actors
simultaneously updating a single Tango stream.

1st commit: fix the implementation and the smoke test.  The
broken-everything-else will be repaired in later commits.
2015-03-02 20:03:45 +09:00
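
The fix described in commit b8c051c89f (the sequencer remembers a list of the last K LPNs per stream rather than only the last one) might be sketched as follows. This is a minimal illustration in Python, not the project's Erlang code: the class name `Sequencer`, the method `get`, and the default value of `k` are assumptions made for the example.

```python
from collections import defaultdict, deque


class Sequencer:
    """Toy sequencer: hands out log page numbers (LPNs) and remembers,
    for each stream, the last K LPNs it issued (newest first)."""

    def __init__(self, k=4):
        self.k = k                      # how many back-pointers to keep (assumed default)
        self.next_lpn = 1
        # stream id -> bounded history of LPNs; the oldest entry falls off
        # the right-hand end once K entries are stored.
        self.backpointers = defaultdict(lambda: deque(maxlen=k))

    def get(self, stream_ids):
        """Assign the next LPN and return it with each stream's previous
        LPNs, then record the new LPN in those streams' histories."""
        lpn = self.next_lpn
        self.next_lpn += 1
        previous = {}
        for sid in stream_ids:
            previous[sid] = list(self.backpointers[sid])
            self.backpointers[sid].appendleft(lpn)
        return lpn, previous
```

With `k=2`, the third request on a stream returns both earlier LPNs, and the fourth request drops the oldest one.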
Scott Lystig Fritchie
940012cef1 Add checkpoint support for tango_dt_map 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
4cf8ac7ed8 Add checkpoint support for tango_dt_queue 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
970eb263db Fix bug in backpointer handling, derp! 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
004a18d948 Add checkpoint support for tango_dt_register 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
7b9c94553c Add skeleton support for single-page checkpointing 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
1c1e1368dd Added src/tango_dt_queue.erl plus test 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
6caeaeb6b5 Ha! Damn quick and easy to add tango_dt_map.erl 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
df53ec0a4e Refactor register DT into tango_dt.erl and tango_dt_register.erl 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
c068057c96 Add missing func corfurl_client:append_page/3, then fix tango_dt_register_test 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
cdeddbb582 Heh, demonstrate a concurrency bug that I knew was there, yay, fixit time! 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
18b38c249e First draft of tango_dt_register 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
6067e26201 Change semantics of OID map, silly me, to match what's needed 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
436c6ac14b Minor type fixup 2015-03-02 20:03:45 +09:00
Scott Lystig Fritchie
4fe4758d7a Generic parameterization of the map, done badly, part 1 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
9c73872d20 Fix TEST vs PULSE tests 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
e9f16d7b1b Dialyzer clean 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
7878d954c1 Add dialyzer targets to Makefile ... time to get type serious 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
be55d98bb5 Tango OID mapper put now passes basic unit test! 2015-03-02 20:03:44 +09:00
Scott Lystig Fritchie
52e2fa2edb Start WIP on tango_oid.erl 2015-03-02 20:03:41 +09:00
Scott Lystig Fritchie
c02d35821e Clean up tango_test.erl debugging cruft 2015-03-02 20:00:37 +09:00
Scott Lystig Fritchie
1184607bce Fix scan_backward with stopping LPN # 2015-03-02 20:00:37 +09:00
Scott Lystig Fritchie
1bb127eb65 Add scan_backward LPN limit + test 2015-03-02 20:00:37 +09:00
Scott Lystig Fritchie
c311a187ac Test refactoring 2 2015-03-02 20:00:37 +09:00
Scott Lystig Fritchie
9d2f494db0 Test refactoring 2015-03-02 20:00:37 +09:00
Scott Lystig Fritchie
c5b4bf8d7b Basic infrastructure and testing for Tango-style streams 2015-03-02 20:00:33 +09:00
Scott Lystig Fritchie
fe79df48b5 Add fledgling log implementation based on CORFU papers (corfurl stuff) 2015-03-02 19:59:01 +09:00
Scott Lystig Fritchie
2bf28122c1 Fix typos in docs/corfurl.md 2015-03-02 18:10:46 +09:00
Scott Lystig Fritchie
22f46c329d Add annoying & verbose TODO reminder for FILL implementation fixing! 2015-03-02 18:10:46 +09:00
Scott Lystig Fritchie
1c5e8d3726 Change env var BITCASK_PULSE -> USE_PULSE 2015-03-02 18:10:46 +09:00
Scott Lystig Fritchie
edd5b62563 del prototype/corfurl/README.old.md 2015-03-02 18:10:46 +09:00
Scott Lystig Fritchie
305cf34a2d Move old README.md -> README.old.md, create new README.md 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
c9764bf5f6 Add new docs/corfurl/notes/README.md stuff
and also:

Add CORFU papers section
Merge corfurl.md and CONCEPTS.md
Add one more CORFU-related paper
Delete prototype/corfurl/docs/CONCEPTS.md
2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
8b105672b1 Bugfix for read-repair (thanks PULSE), model change to handle aborted writes 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
b7b9255f5f Partial fix for bug in last commit, but not good enough 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
6858041c7d See comments added by this commit for append_page() bug found, racing with epoch change 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
40c28b79bb PULSE test now uses corfurl_client (retry logic) for all ops 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
7ac1e7f178 Add retry loop for read_page/2, fill_page/2, trim_page/2 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
1f0e43d33f Fix dumb think-o in corfurl_client:append_page() retry counter 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
04f2105df0 Var renaming in corfurl_client:append_page() 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
8df5326b0c Try to restart the sequencer only if it looks like nobody else has 2015-03-02 18:08:29 +09:00
Scott Lystig Fritchie
0b031bcf0a Change polling constants to deal with PULSE's evil 2015-03-02 18:08:28 +09:00
Scott Lystig Fritchie
fb1216649c Finish very basic PULSE testing of stopping & restarting the sequencer 2015-03-02 18:08:28 +09:00
Scott Lystig Fritchie
63d1c93fc9 Fix silly-dumb errors in seal epoch comparisons 2015-03-02 18:08:28 +09:00
Scott Lystig Fritchie
96b561cde9 Fix broken EUnit tests 2015-03-02 18:08:28 +09:00
Scott Lystig Fritchie
d93572c391 Refactoring to implement stop_sequencer command 2015-03-02 18:08:24 +09:00
Scott Lystig Fritchie
d5091358ff Put the sequencer pid inside the projection 2015-03-02 18:06:52 +09:00
Scott Lystig Fritchie
a64a09338d Fix broken EUnit tests (been in PULSE land too long) 2015-03-02 18:06:48 +09:00
Scott Lystig Fritchie
20a2a51649 Partial fix (#2 of 2) for model problem in honest write-vs-trim race 2015-03-02 18:05:03 +09:00
Scott Lystig Fritchie
638a45e8cb Partial fix for model problem in honest write-vs-trim race 2015-03-02 18:05:03 +09:00
Scott Lystig Fritchie
eabebac6f2 Fix PULSE model difficulty of how to handle races between write & trim.
This trim race is (as far as I can tell) fine -- I see no correctness
problem with CORFU, on the client side or the server side.  However,
this race with a trim causes a model problem that I believe can be
solved this way:

1. We must keep track of the fact that the page write is happening:
someone can notice the write via read-repair or even a regular read by
the tail.  We do this in basically the way that all other writes
are handled in the ValuesR relation.

2. Add new code to client-side writer: if there's a trim race, *and*
if we're using PULSE, then return a special error code that says that
the write was ok *and* that we raced with trim.

2b. If we aren't using pulse, just return {ok, LPN}.

3. For the transition check property, treat the new return code as if
it is a w_tt.  Actually, we use a special marker atom, w_special_trimmed
for that purpose, but it is later treated the same way that w_tt is by the
filter_transition_trimfill_suffixes() filter.
2015-03-02 18:05:02 +09:00
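
Steps 2, 2b, and 3 of commit eabebac6f2 might be sketched roughly as below. This is a hedged Python illustration, not the project's Erlang code: the function names and the `"special_trimmed"` result tag are hypothetical, and only the marker atoms `w_special_trimmed` and `w_tt` come from the commit message.

```python
def client_write_result(lpn, raced_with_trim, using_pulse):
    """Steps 2 and 2b: report a trim race only when running under PULSE."""
    if raced_with_trim and using_pulse:
        # Hypothetical tag meaning "the write was ok AND it raced with a trim".
        return ("special_trimmed", lpn)
    # Outside PULSE the caller just sees an ordinary success.
    return ("ok", lpn)


def normalize_marker(marker):
    """Step 3: the transition check treats w_special_trimmed like w_tt."""
    return "w_tt" if marker == "w_special_trimmed" else marker
```

Keeping the special result behind the PULSE switch means ordinary clients never see a return value that exists only for the model's benefit.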
Scott Lystig Fritchie
13e15e0ecf Add MSC charts to help explain BAD-looking trim race 2015-03-02 18:05:02 +09:00
Scott Lystig Fritchie
d077148b47 Attempt to fix unimplemented corner case, thanks PULSE! 2015-03-02 18:05:02 +09:00
Scott Lystig Fritchie
b7e3f91931 Add ?EVENT_LOG() to add extra trace info to corfurl and corfurl_flu 2015-03-02 18:05:02 +09:00
Scott Lystig Fritchie
479efce0b1 Make PULSE model aware of read-repair for 'error_trimmed' races
The read operation isn't a read-only operation: it can trigger
read-repair in the case where a hole is discovered.  The PULSE
model needs to be aware of this kind of thing.

Imagine that we have a 3-way race, between an append to LPN 1,
a read of LPN 1, and a trim of LPN 1.  There is a single chain
of length 3.  The FLUs in the chain are annotated below with
"F1", "F2", and "F3".  Note also the indentation levels:
F1's indentation is smaller than F2's, which is smaller than F3's.

 2,{call,<0.8748.3>,{append,<<0>>,will_be,1}}},
 4,{call,<0.8746.3>,{read,1}}},
 6,{call,<0.8747.3>,{trim,1,will_fail,error_unwritten}}},

 6, Read has contacted tail of chain, it is unwritten.  Time for repair.
 6,{read_repair,1,[<0.8741.3>,<0.8742.3>,<0.8743.3>]}},

 6,  F1:{flu,write,<0.8741.3>,1,ok}},
 7,  F1:{flu,trim,<0.8741.3>,1,ok}},  % by repair

 9,{read_repair,1,fill,<0.8742.3>}},

 9,          F2:{flu,trim,<0.8742.3>,1,error_unwritten}},

 9,{read_repair,1,<0.8741.3>,trimmed}},

10,{result,<0.8747.3>,error_unwritten}},
   Trim operation from time=6 stops here

10,          F2:{flu,write,<0.8742.3>,1,ok}},
11,          F2:{flu,fill,<0.8742.3>,1,error_overwritten}},

12,                  F3:{flu,write,<0.8743.3>,1,ok}},

12,{read_repair,1,fill,<0.8742.3>,overwritten,try_trim}},

13,{result,<0.8748.3>,{ok,1}}}, % append/write to LPN 1

13,          F2:{flu,trim,<0.8742.3>,1,ok}},

14,{read_repair,1,fill,<0.8743.3>}},
15,                  F3:{flu,fill,<0.8743.3>,1,error_overwritten}},

16,{read_repair,1,fill,<0.8743.3>,overwritten,try_to_trim}},
17,                  F3:{flu,trim,<0.8743.3>,1,ok}},

18,{result,<0.8746.3>,error_trimmed}}]
2015-03-02 18:05:02 +09:00
Scott Lystig Fritchie
a7dd78d8f1 Switch to Lamport clocks for PULSE verifying 2015-03-02 18:04:59 +09:00
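
For readers unfamiliar with the technique named in commit a7dd78d8f1, a Lamport clock can be sketched in a few lines. This is the textbook algorithm written in Python for illustration; it is not taken from the project's code, and the class and method names are assumptions.

```python
class LamportClock:
    """Textbook Lamport logical clock for a single process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message (sending counts as an event)."""
        return self.tick()

    def receive(self, msg_time):
        """Merge the sender's timestamp, then count the receive event."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Events stamped this way are ordered consistently with causality: a receive always carries a larger timestamp than the matching send.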
Scott Lystig Fritchie
5420e9ca1f Bugfix for read repair: if trimmed, try fill first then trim 2015-03-02 18:03:10 +09:00
Scott Lystig Fritchie
88d44722be Fix PULSE model bug of adding multiple same values to orddict 2015-03-02 18:03:10 +09:00
Scott Lystig Fritchie
8ec5f04903 Bug: PULSE found a way to reach a 'left_off_here' corner case, sweet 2015-03-02 18:03:10 +09:00
Scott Lystig Fritchie
e40394a3a7 Bugfix: yet another race in read_repair, sweet 2015-03-02 18:03:10 +09:00
Scott Lystig Fritchie
370c57b78a Bug: corfurl:read_repair_chain() should use trim when it encounters error_trimmed 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
fd32bcb308 Fix PULSE model to accommodate API change from previous commit.
Now 1+ trim & fill transitions are collapsed to a single 'w_t+' atom.
The atom name is a bit odd; think about regexps and it hopefully
makes sense.
2015-03-02 18:03:09 +09:00
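
The collapsing rule in commit fd32bcb308 (one or more trim and fill transitions become a single 'w_t+' atom, as in the regexp "t+") might look like the sketch below. It is written in Python for illustration only: the per-transition names in `TRIM_FILL` and the function name `collapse` are assumptions, and only 'w_t+' comes from the commit message.

```python
from itertools import groupby

# Assumed names for the individual trim and fill transitions.
TRIM_FILL = {"w_tt", "w_ft"}


def collapse(transitions):
    """Replace every run of one or more trim/fill transitions with 'w_t+'."""
    out = []
    for is_trim_fill, run in groupby(transitions, key=lambda t: t in TRIM_FILL):
        if is_trim_fill:
            out.append("w_t+")      # the whole run becomes one atom
        else:
            out.extend(run)         # other transitions pass through unchanged
    return out
```

A history such as one write followed by three trim or fill transitions then compares equal to a history with one write followed by a single trim.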
Scott Lystig Fritchie
431827f65e Allow racing trim/fill and read-repair to simply "win".
This exposes a bug in the PULSE model, now that we can have multiple
successful fill/trim for the same LPN.
2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
5edee3a2cf Don't bother adding 2 when picking an LPN for fill & trim 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
d2562588f2 Move the lists:reverse() in make_chains() to preserve input's order in the output 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
e791876212 Fix silly model error when calculating values 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
f5c4474669 Derp, turn off TRIP_no_append_duplicates 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
b3ed9ef51c Add fill checking to PULSE model, minimal API coverage is complete 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
7a46709c13 Change transition type names to make better invalid transition detection 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
8a56771182 Add better condition for perhaps_trip_fill_page() 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
db6fa3d895 Fix two bugs found by PULSE in corfurl_flu.erl, yay! 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
86d4583aef Add fill support to the PULSE model 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
7dba8beae9 Refactor PULSE test for easier checking, prior to adding fill & trim. 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
78019b402f Refactor the PULSE model testing error 'trip' code 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
c80921de25 Add scan_forward() command, no result checking yet 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
fb6b1cdc3c Fix read_page() model problem: no more false positives! 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
e9851767fc Add read_page() temporal check 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
a7aff2f141 Dumbdumbdumb don't interfere with event_logger:event() duh! 2015-03-02 18:03:09 +09:00
Scott Lystig Fritchie
c14e1facf4 Add read_approx() to the PULSE model, only 5% correctness checks done 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
572d1803d0 Add (mostly) temporal logic checking for exactly-once append_page().
Also, for peace of mind (I hope), I've added this -ifndef to introduce
a bug that should cause the new exactly-once append_page() check to fail.
This should make it easier to change the model and *TEST* the changes,
to avoid breaking the model without ever knowing it.
2015-03-02 18:03:08 +09:00
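
The idea in commit 572d1803d0 (check that append_page() is exactly-once, and keep a deliberately injectable bug so the check itself can be tested) can be illustrated with a much simplified sketch. This Python example is not the project's temporal logic check: the function names, the result tuples, and the `inject_duplicate` flag are all hypothetical stand-ins, with the flag playing the role of the commit's -ifndef switch.

```python
from collections import Counter


def duplicate_lpns(append_results):
    """Return the LPNs handed back by more than one successful append."""
    counts = Counter(lpn for tag, lpn in append_results if tag == "ok")
    return sorted(lpn for lpn, n in counts.items() if n > 1)


def run_appends(n, inject_duplicate=False):
    """Toy stand-in for a test run that performs n appends.

    With inject_duplicate=True the last append reuses the first LPN,
    which the exactly-once check is expected to catch."""
    results = [("ok", lpn) for lpn in range(1, n + 1)]
    if inject_duplicate and n >= 2:
        results[-1] = ("ok", results[0][1])
    return results
```

Running the check once with the fault switched on and once with it off gives some confidence that a later change to the model has not silently disabled the check.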
Scott Lystig Fritchie
25bf64a03c Just in case commit: WIP 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
58ced8d14c Add PULSE control over sequencer handing out duplicate page numbers 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
21a3fd6d07 Use temporal logic for check_trace() 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
e0ec95e8f7 Added small PULSE usage sketch in docs/corfurl.md 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
b430fa479c PULSE condition checking is only 98% embarrassing 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
bcc6cf1e6a PULSE bugfix: race with finish_init message 2015-03-02 18:03:08 +09:00
Scott Lystig Fritchie
a294a0eff0 Skeleton of PULSE test created, first bug (race in sequencer init) is found, huzzah! 2015-03-02 18:03:05 +09:00
Scott Lystig Fritchie
feed231d5e Move EUnit test code to test subdir 2015-03-02 17:59:31 +09:00
Scott Lystig Fritchie
3963ce44f0 More sanity checking for fill() in smoke test 2015-03-02 17:57:31 +09:00
Scott Lystig Fritchie
3d2be7255f Basic smoke test for read repair 2015-03-02 17:57:31 +09:00
Scott Lystig Fritchie
6014b0584e Fix read() response to a prior fill 2015-03-02 17:57:31 +09:00
Scott Lystig Fritchie
c23aeabc20 Read-repair, not tested 2015-03-02 17:57:30 +09:00
Scott Lystig Fritchie
945635f837 Basic scan_forward done 2015-03-02 17:57:30 +09:00
Scott Lystig Fritchie
05a71eebb0 corfurl:read_page() done, no read-repair yet 2015-03-02 17:57:30 +09:00
Scott Lystig Fritchie
72bf329e1c Add fledgling log implementation based on CORFU papers 2015-03-02 17:57:27 +09:00