Commit graph

359 commits

Author SHA1 Message Date
Gregory Burd
af4d2ba4d8 Merge branch 'master' of github.com:basho/hanoidb 2012-06-11 23:41:49 +01:00
Gregory Burd
ee89f93009 Use a uniform method to calculate bytes that can manage lists as well. 2012-06-11 23:40:28 +01:00
Gregory Burd
f4eea4a594 Whitespace and comment cleanup. 2012-06-11 23:40:07 +01:00
Gregory Burd
acbcf9d601 Return properly shaped tuple. 2012-06-11 23:39:04 +01:00
Gregory Burd
0e98543a84 Don't depend on basho_bench. 2012-06-06 18:17:11 +02:00
Kresten Krab Thorup
669f589d0c Implement smaller incremental merge steps
Right now, this is controlled by the macro
INC_MERGE_STEP in hanoidb_nursery; eventually
we should turn this into a configuration option.

Making this small (minimum is 1) hurts average perf
but reduces the 99.9 percentile latency.
2012-05-11 22:31:23 +02:00
Kresten Krab Thorup
1b42172cbe Fix bug when merge result is an empty file
This happens when all entries are expired, or
if all entries would have been tombstones.
2012-05-11 14:58:53 +02:00
Kresten Krab Thorup
61720065d9 Add some doc/spec 2012-05-11 12:33:33 +02:00
Kresten Krab Thorup
fbda7af576 Honor expiry_secs == 0
When this is the case, we use the old on-disk
encoding to shave 4 bytes off every entry.
2012-05-11 12:30:07 +02:00
Kresten Krab Thorup
181b1debb8 Update basho_bench driver to recognize config flags
Also, add default values to template configuration
2012-05-11 12:29:16 +02:00
Kresten Krab Thorup
b6955c9a75 Refactor for expiry_secs option
Tree nodes now hold entries of the form

   {Key, ?TOMBSTONE 
       | BinValue
       | {?TOMBSTONE, TStamp}
       | {BinValue, TStamp}}

We use the form without TStamp when expiry_secs
is unset or set to 0 (i.e., values don't expire).

merger/writer: Move KV count into writer, because
now writer:add determines if a value is expired
and thus whether a value is actually written.  Thus,
writer now has a new API function which returns the
KV count written so far.

reader: lookup/fold API hides the TStamp tuples,
so only the next_node API used by the merger
is exposed to these {Key, {_, TStamp}} entries.

nursery: like reader, the TStamp'ed tuples are
not exposed in the client API; expired values
are simply not returned from fold/lookup.

hanoidb: add config option {expiry_secs, N}.

other modules: Make sure that config is passed
all the way down through (sub) processes to be
able to utilize the config option everywhere.

test: update to work with new option.
2012-05-11 12:00:32 +02:00
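
The entry forms described in the commit above can be sketched as follows. This is an illustrative Python model, not hanoidb's Erlang code: the names `is_expired` and `visible_value` are hypothetical, `TOMBSTONE` stands in for the `?TOMBSTONE` macro, and the rule "expired once `tstamp + expiry_secs <= now`" is an assumption about how the timestamp is interpreted.

```python
import time

TOMBSTONE = object()  # stand-in for the ?TOMBSTONE macro (assumption)


def is_expired(value, expiry_secs, now=None):
    """Return True if a stored value should be treated as expired.

    `value` mirrors the forms in the commit message: a bare value or
    TOMBSTONE (no timestamp), or a (value_or_tombstone, tstamp) pair.
    Treating tstamp as a write time is an assumption of this sketch.
    """
    if not expiry_secs:  # unset (None) or 0: values never expire
        return False
    if not isinstance(value, tuple):
        return False  # no timestamp recorded, so nothing to expire
    _, tstamp = value
    now = time.time() if now is None else now
    return tstamp + expiry_secs <= now


def visible_value(value, expiry_secs, now=None):
    """Hide timestamps, tombstones and expired entries from a lookup."""
    if is_expired(value, expiry_secs, now):
        return None
    inner = value[0] if isinstance(value, tuple) else value
    return None if inner is TOMBSTONE else inner
```

A client-facing lookup in this model only ever sees the bare value or nothing, which matches the commit's note that the reader and nursery APIs hide the timestamped tuples.
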
Kresten Krab Thorup
245d815e4c Add magic "HAN1" to btree file format
BREAKING CHANGE!  This change provides for future
file format changes, but also breaks backwards
compatibility.

Also describe the file format in design_document
2012-05-08 17:27:05 +02:00
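
A leading magic lets a reader reject unknown files before parsing anything else. The sketch below is Python rather than hanoidb's Erlang; the helper name `has_hanoi_magic` is hypothetical, and only the 4-byte `HAN1` prefix is taken from the commit message. Nothing about the rest of the file layout is assumed.

```python
import io

MAGIC = b"HAN1"  # 4-byte magic named in the commit message


def has_hanoi_magic(stream):
    """Return True if the binary stream starts with the HAN1 magic.

    Hypothetical helper: it inspects only the first four bytes and
    makes no assumption about what follows them.
    """
    return stream.read(len(MAGIC)) == MAGIC


if __name__ == "__main__":
    print(has_hanoi_magic(io.BytesIO(b"HAN1" + b"\x00" * 8)))
    print(has_hanoi_magic(io.BytesIO(b"OLDFORMAT")))
```
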
Kresten Krab Thorup
f7681da1db Add some more current design diagrams 2012-05-08 00:02:08 +02:00
Kresten Krab Thorup
7622b4e4b8 Implement concurrent GETs
With this change, GETs will flow concurrently
down through the level controllers, replying
directly to the caller via gen_server:reply.
Very actor-like :-)
2012-05-08 00:00:10 +02:00
Kresten Krab Thorup
c58b627661 First steps towards intelligent CRC error handling
Current code base silently ignores CRC errors,
meaning that KVs that have errors will just
disappear, or may show up as a previously stored
value for the same key.
2012-05-07 23:58:44 +02:00
Kresten Krab Thorup
14dd00ad12 Fix some doc strings to make edoc happy 2012-05-07 22:53:05 +02:00
Kresten Krab Thorup
3abc189680 Remove unused files 2012-05-07 17:27:17 +02:00
Kresten Krab Thorup
4b2c937be9 chmod +x enable-hanoidb 2012-05-07 17:25:54 +02:00
Kresten Krab Thorup
e315b92faf Rename hanoi -> hanoidb 2012-05-07 17:22:55 +02:00
Kresten Krab Thorup
3f5a8a7792 New todo items 2012-05-07 14:57:58 +02:00
Kresten Krab Thorup
ab6b974830 Fix list_buckets
List buckets was blocking riak, because file-level
range folds send too many values. This makes
range fold robust in such cases.
2012-05-07 01:48:48 +02:00
Kresten Krab Thorup
6b6f4417c1 Ensure fold limits are honored 2012-05-07 01:09:08 +02:00
Kresten Krab Thorup
9a7b9eb29f Simplify life cycle for fold workers
With this change, the fold worker does not
link to the receiver; now it simply monitors
the receiving process.  If the receiver dies,
the fold worker dies normally.  

The individual fold processes running on level
files are linked to the fold worker; so between
fold merge worker and those, normal link/kill
applies.
2012-05-07 00:14:33 +02:00
Kresten Krab Thorup
fb67fed456 Rename 'die' message from hanoi -> fold_worker
More appropriately now named shutdown
2012-05-06 22:42:10 +02:00
Kresten Krab Thorup
b26f612a1f Rename plain_rpc:send_cast to just cast
Also introduce plain_rpc:call/3 (last arg is timeout)
2012-05-06 22:41:01 +02:00
Kresten Krab Thorup
0066b19c80 Simplify call sequence for fold
Now hanoi:fold_range/4 creates the fold worker,
which makes the caller, and not the hanoi main
gen_server, the ancestor of the worker.
2012-05-06 22:39:10 +02:00
Kresten Krab Thorup
5b88a71e1d Simplify riak_kv_hanoi_backend:is_empty/1
Do range fold with limit=1, rather than throw
exception.
2012-05-06 19:49:53 +02:00
Kresten Krab Thorup
0b8d035bda Handle premature eof in tree reader 2012-05-06 19:48:38 +02:00
Kresten Krab Thorup
2c195da15e Let the "fast" merge strategy be the default 2012-05-06 11:46:20 +02:00
Kresten Krab Thorup
9bbc6194d9 Remove riak_kv from rebar.config
Needed for running backend_eqc, but makes
rebar go bazongo on cyclic dependencies.
2012-05-05 21:53:33 +02:00
Kresten Krab Thorup
f9b7fcf224 Implement hanoi:destroy/1
Also riak_kv_hanoi_backend:drop/1 (The latter does
hanoi:destroy and then re-opens the same store).
2012-05-05 21:14:15 +02:00
Kresten Krab Thorup
96c5ec74c3 Minor changes to test code 2012-05-05 18:56:03 +02:00
Kresten Krab Thorup
8f1600b41a Fold back pressure, step 2
This makes fold-from-snapshot use the back pressure
model of doing plain_rpc:call to the merge worker
delivering chunks of 100 KVs.

The back pressure is entirely internal to hanoi,
designed to ensure that the process that merges
fold results from the individual levels is not
swamped with fold data.

Folds with a limit < 10 still do "blocking fold"
which is more efficient and uses fewer FDs, but 
blocks concurrent put/get operations.
2012-05-05 18:53:02 +02:00
Kresten Krab Thorup
49afbbc411 Add initial merge work when opening level
When re-opening a hanoi database, issue some
initial merge work to make sure that there is
room for future inserts.
2012-05-05 18:47:42 +02:00
Kresten Krab Thorup
ec55a38c42 fold from file counting
When folding, don't count tombstones towards
the folding limit.  Merge worker needs to be
able to progress to LIMIT number of non-deleted
entries.
2012-05-05 13:46:03 +02:00
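
The counting rule in the commit above can be illustrated with a small Python sketch (not hanoidb's Erlang code). The generator name `fold_level` and the `TOMBSTONE` sentinel are hypothetical. The sketch assumes that tombstones are still passed downstream so that a merger can use them to shadow older values; the commit message only states that they do not count toward the limit.

```python
TOMBSTONE = object()  # stand-in for a deleted-entry marker (assumption)


def fold_level(entries, limit):
    """Yield (key, value) pairs from one level's sorted entries.

    Stops once `limit` live (non-tombstone) entries have been yielded.
    Tombstones are yielded too, but never consume any of the limit.
    """
    live = 0
    for key, value in entries:
        if live >= limit:
            return
        yield key, value
        if value is not TOMBSTONE:
            live += 1
```

With entries `a` (deleted), `b`, `c` (deleted), `d`, `e` and a limit of 2, the fold in this sketch emits `a` through `d`, so the consumer still receives two live entries.
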
Kresten Krab Thorup
f821b38ea2 Fold back pressure, step 1
This first step of the fold back pressure impl
changes fold worker so that it does not get
flooded by messages.  Now, we take messages
and put them in queues (one per fold source),
so we don't have to do selective receive on
bazillions of messages.
2012-05-05 12:53:17 +02:00
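
The idea of one queue per fold source can be sketched in Python as below. This is not hanoidb's Erlang implementation: the class `FoldQueues` and its method names are hypothetical, and the merge rule (pop the smallest head key once every unfinished source has something queued) is an assumption about how the queued messages are consumed. Duplicate keys across sources are not handled here.

```python
from collections import deque


class FoldQueues:
    """One FIFO queue per fold source, filled as messages arrive."""

    def __init__(self, sources):
        self.queues = {source: deque() for source in sources}
        self.finished = set()

    def deliver(self, source, key, value):
        """Queue an incoming entry under the source that sent it."""
        self.queues[source].append((key, value))

    def finish(self, source):
        """Record that a source will send no further entries."""
        self.finished.add(source)

    def pop_smallest(self):
        """Pop the entry with the smallest key, or None if we must wait.

        We must wait while any unfinished source has an empty queue,
        because its next key could still be the smallest.
        """
        heads = []
        for source, queue in self.queues.items():
            if queue:
                heads.append((queue[0][0], source))
            elif source not in self.finished:
                return None
        if not heads:
            return None
        _, source = min(heads, key=lambda head: head[0])
        return self.queues[source].popleft()
```

Because each source has its own queue, the consumer only looks at queue heads instead of scanning one large mailbox for a matching message.
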
Kresten Krab Thorup
bfb8f3f783 Redo fix for exceptions in fold operations 2012-05-02 17:15:13 +02:00
Kresten Krab Thorup
41cc2a9196 Add riak_kv as dependency
Needed to run backend_eqc tests
2012-05-02 17:13:58 +02:00
Kresten Krab Thorup
68114bdbff Fix folding deleted entries
There were a couple of bugs in folding,
found by Triq.
2012-05-02 17:13:03 +02:00
Kresten Krab Thorup
70fc4030f6 Enable quick check for backend tests
Actual QC tests come from riak_kv
2012-05-02 17:11:08 +02:00
Kresten Krab Thorup
5e3417f9d6 Reenable quick check tests 2012-05-01 16:27:39 +02:00
Kresten Krab Thorup
c8964e955c Config option {merge_strategy, fast|predictable}
Both options have same log2(N) upper bound on
latencies, but `fast' fluctuates.
2012-05-01 16:27:06 +02:00
Kresten Krab Thorup
01ea88b67c Implement hibernation for readers too
This enables all open files in a merge worker
to be closed while it is waiting for work to do.
2012-05-01 02:12:02 +02:00
Kresten Krab Thorup
c998e8ca31 Fix merge work for opening a HanoiDB 2012-05-01 02:10:13 +02:00
Kresten Krab Thorup
c8e403af8c Refactor merge work computation
Now merge work computation is close to ideal.
It does not take into account the actual size
of files at each level, but we have not figured
out how to utilize that knowledge.
2012-04-30 23:34:27 +02:00
Kresten Krab Thorup
380a4f9cfc Redo work load computation
The simplistic approach has a race condition.
This works for now, albeit still issuing too
much work.
2012-04-30 22:44:21 +02:00
Kresten Krab Thorup
be507c0e13 Syntax error 2012-04-30 21:59:55 +02:00
Kresten Krab Thorup
0009e17d4f Change delegate work computation
We were delegating too much work.  The original
algorithm description said that for each insert,
"1" unit of merge work has to be done 
*at each level* … implying that if nothing needs
doing at a level, that "not done work" does not 
add to work done elsewhere. This fix gets us back
to that situation (by always subtracting at least
2^TOP_LEVEL from the presented work amount), while
maintaining the (beneficial) effect of chunking
merge work at anything but the last level.

Effectively, this reduces the maximum amount of
merge work done, also reducing our worst case
latency.

Now that we understand this, we can refactor the
algorithm to delegate "DoneWork", because then
each level can determine the total work, and see
if any work is left "for me".  That's next.
2012-04-30 21:38:53 +02:00
Kresten Krab Thorup
74686b1380 Implement merge hibernation for tail scan
When scanning just one file (because all its keys
are after the ones in the other file), we may also
need hibernation to save memory.  Especially
the bloom filters being built take a lot of mem.
2012-04-30 21:28:33 +02:00
Kresten Krab Thorup
6ce7101506 Correct step counts in merger
Merge was progressing too fast. This corrects
the progress housekeeping in processing
merge work.
2012-04-30 19:28:20 +02:00