384 commits

Author SHA1 Message Date
Kresten Krab Thorup
0066b19c80 Simplify call sequence for fold
Now hanoi:fold_range/4 creates the fold worker,
which makes the callee and not the hanoi main
gen_server the ancestor of the worker.
2012-05-06 22:39:10 +02:00
Kresten Krab Thorup
5b88a71e1d Simplify riak_kv_hanoi_backend:is_empty/1
Do range fold with limit=1, rather than throw
exception.
2012-05-06 19:49:53 +02:00
Kresten Krab Thorup
0b8d035bda Handle premature eof in tree reader 2012-05-06 19:48:38 +02:00
Kresten Krab Thorup
2c195da15e Let the "fast" merge strategy be the default 2012-05-06 11:46:20 +02:00
Kresten Krab Thorup
9bbc6194d9 Remove riak_kv from rebar.config
Needed for running backend_eqc, but makes
rebar go bazongo on cyclic dependencies.
2012-05-05 21:53:33 +02:00
Kresten Krab Thorup
f9b7fcf224 Implement hanoi:destroy/1
Also riak_kv_hanoi_backend:drop/1 (The latter does
hanoi:destroy and then re-opens the same store).
2012-05-05 21:14:15 +02:00
Kresten Krab Thorup
96c5ec74c3 Minor changes to test code 2012-05-05 18:56:03 +02:00
Kresten Krab Thorup
8f1600b41a Fold back pressure, step 2
This makes fold-from-snapshot use the back pressure
model of doing plain_rpc:call to the merge worker
delivering chunks of 100 KVs.

The back pressure is entirely internal to hanoi,
designed to ensure that the process that merges
fold results from the individual levels is not
swamped with fold data.

Folds with a limit < 10 still do "blocking fold"
which is more efficient and uses fewer FDs, but 
blocks concurrent put/get operations.
2012-05-05 18:53:02 +02:00
Kresten Krab Thorup
49afbbc411 Add initial merge work when opening level
When re-opening a hanoi database, issue some
initial merge work to make sure that there is
room for future inserts.
2012-05-05 18:47:42 +02:00
Kresten Krab Thorup
ec55a38c42 fold from file counting
When folding, don't count tombstones towards
the folding limit.  Merge worker needs to be
able to progress to LIMIT number of non-deleted
entries.
2012-05-05 13:46:03 +02:00
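The counting rule described in ec55a38c42 can be illustrated with a minimal Python sketch. It is not the hanoi implementation (which is Erlang); `TOMBSTONE` and `fold_from_file` are hypothetical names chosen for this example. The idea shown is that deleted entries are still passed along, so the merging side can see them, but only live entries count towards the limit.

```python
# Hypothetical marker for a deleted entry; not hanoi's actual representation.
TOMBSTONE = object()


def fold_from_file(entries, limit):
    """Return entries in order until `limit` live (non-deleted) ones are seen.

    Tombstones are included in the result but do not count towards the limit.
    """
    out = []
    live = 0
    for key, value in entries:
        if live >= limit:
            break
        out.append((key, value))
        if value is not TOMBSTONE:
            live += 1
    return out


entries = [("a", 1), ("b", TOMBSTONE), ("c", 2), ("d", 3)]
result = fold_from_file(entries, limit=2)
```

With `limit=2` the fold passes over the tombstone at `"b"` and stops only once two live entries (`"a"` and `"c"`) have been produced.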
Kresten Krab Thorup
f821b38ea2 Fold back pressure, step 1
This first step of the fold back pressure impl
changes fold worker so that it does not get
flooded by messages.  Now, we take messages
and put them in queues (one per fold source),
so we don't have to do selective receive on
bazillions of messages.
2012-05-05 12:53:17 +02:00
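The "one queue per fold source" idea from f821b38ea2 might look roughly like the following Python sketch. The class and method names (`FoldInbox`, `deliver`, `pop_smallest`) are assumptions made for illustration, and the sketch leaves out end-of-stream handling and duplicate keys across sources.

```python
from collections import deque


class FoldInbox:
    """Buffers incoming (key, value) items in one FIFO queue per source."""

    def __init__(self, sources):
        self.queues = {source: deque() for source in sources}

    def deliver(self, source, item):
        # Each incoming message is filed under its source right away,
        # so nothing needs to be scanned for later.
        self.queues[source].append(item)

    def ready(self):
        # A merge step is possible once every source has an item queued.
        return all(self.queues.values())

    def pop_smallest(self):
        # Compare the head of each queue and take the smallest key.
        source = min(self.queues, key=lambda s: self.queues[s][0][0])
        return self.queues[source].popleft()


inbox = FoldInbox(["level1", "level2"])
inbox.deliver("level1", ("b", 1))
ready_after_one = inbox.ready()
inbox.deliver("level2", ("a", 2))
ready_after_two = inbox.ready()
first = inbox.pop_smallest()
```

Filing messages by source on arrival means the merge step only ever looks at the head of each queue, rather than searching a single large mailbox.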
Kresten Krab Thorup
bfb8f3f783 Redo fix for exceptions in fold operations 2012-05-02 17:15:13 +02:00
Kresten Krab Thorup
41cc2a9196 Add riak_kv as dependency
Needed to run backend_eqc tests
2012-05-02 17:13:58 +02:00
Kresten Krab Thorup
68114bdbff Fix folding deleted entries
Triq found a couple of bugs that showed up
when folding over deleted entries.
2012-05-02 17:13:03 +02:00
Kresten Krab Thorup
70fc4030f6 Enable quick check for backend tests
Actual QC tests come from riak_kv
2012-05-02 17:11:08 +02:00
Kresten Krab Thorup
5e3417f9d6 Reenable quick check tests 2012-05-01 16:27:39 +02:00
Kresten Krab Thorup
c8964e955c Config option {merge_strategy, fast|predictable}
Both options have same log2(N) upper bound on
latencies, but `fast' fluctuates.
2012-05-01 16:27:06 +02:00
Kresten Krab Thorup
01ea88b67c Implement hibernation for readers too
This enables all open files in a merge worker
to be closed while it is waiting for work to do.
2012-05-01 02:12:02 +02:00
Kresten Krab Thorup
c998e8ca31 Fix merge work for opening a HanoiDB 2012-05-01 02:10:13 +02:00
Kresten Krab Thorup
c8e403af8c Refactor merge work computation
Now merge work computation is close to ideal.
It does not take into account the actual size
of files at each level, but we have not figured
out how to utilize that knowledge.
2012-04-30 23:34:27 +02:00
Kresten Krab Thorup
380a4f9cfc Redo work load computation
The simplistic approach has a race condition.
This works for now, albeit still issuing too
much work.
2012-04-30 22:44:21 +02:00
Kresten Krab Thorup
be507c0e13 Syntax error 2012-04-30 21:59:55 +02:00
Kresten Krab Thorup
0009e17d4f Change delegate work computation
We were delegating too much work.  The original
algorithm description said that for each insert,
"1" unit of merge work has to be done 
*at each level* … implying that if nothing needs
doing at a level, that "not done work" does not 
add to work done elsewhere. This fix gets us back
to that situation (by always subtracting at least
2^TOP_LEVEL from the presented work amount), while
maintaining the (beneficial) effect of chunking
merge work at anything but the last level.

Effectively, this reduces the maximum amount of
merge work done, also reducing our worst case
latency.

Now that we understand this, we can refactor the
algorithm to delegate "DoneWork", because then
each level can determine the total work, and see
if any work is left "for me".  That's next.
2012-04-30 21:38:53 +02:00
Kresten Krab Thorup
74686b1380 Implement merge hibernation for tail scan
When scanning just one file (because all its keys
are after the ones in the other file), we may also
need hibernation to save memory.  Especially
the bloom filters being built take a lot of mem.
2012-04-30 21:28:33 +02:00
Kresten Krab Thorup
6ce7101506 Correct step counts in merger
Merge was progressing too fast. This corrects
the progress housekeeping in processing
merge work.
2012-04-30 19:28:20 +02:00
Kresten Krab Thorup
d6b8491a3d Make step code more explicit
This change has no semantic effect, only
makes the code easier to read
2012-04-30 19:27:13 +02:00
Kresten Krab Thorup
18c197d959 New config: {read|write}_buffer_size
These two parameters (defaulting to 512k) control
the amount of erlang file buffer space to allocate
for delayed_write and read_ahead when merging.

This config parameter is *per merge task* of which 
there can be many for each open HanoiDB; and again
multiplied by number of active vnodes in Riak.

As such, this config parameter is significant
for the memory usage of a Riak with Hanoi, but setting
it too low will kill the performance.
2012-04-30 00:06:42 +02:00
Kresten Krab Thorup
a6952cdb77 Rename _Variables to remove compiler warnings 2012-04-29 23:57:49 +02:00
Kresten Krab Thorup
e63df328ed remove verbose info_msg 2012-04-29 18:43:38 +02:00
Kresten Krab Thorup
77a81499f9 Fix problem in merge hibernation
The merge state includes a bloom reference,
which needed to be properly serialized.
2012-04-29 01:32:02 +02:00
Kresten Krab Thorup
f0833de3fc Set default pagesize to 8k
Also reduce read ahead / delayed write parameters
so we don't need too much memory in merge procs.
2012-04-29 00:33:15 +02:00
Kresten Krab Thorup
15fc05634a Implement hibernation in merge processes
Analysis seems to indicate that merge processes
(from high-numbered levels) tend to be activated
quite infrequently. Thus, we term-to-bin/gzip the
merge process state, and invoke explicit gc
before waiting for a {step, …} message again.
2012-04-29 00:32:50 +02:00
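As a rough analogue of the hibernation step in 15fc05634a, the Python sketch below serializes and compresses an idle worker's state, then asks for a garbage collection. `pickle` and `zlib` stand in for Erlang's term-to-binary and gzip; the function names and the shape of `state` are assumptions for illustration only.

```python
import gc
import pickle
import zlib


def hibernate(state):
    """Serialize and compress `state`, returning a compact blob."""
    return zlib.compress(pickle.dumps(state))


def wake(blob):
    """Restore the state from a blob produced by hibernate()."""
    return pickle.loads(zlib.decompress(blob))


# Hypothetical merge state with a large, highly repetitive buffer.
state = {"position": 42, "buffer": b"x" * 10000}
blob = hibernate(state)
raw_size = len(pickle.dumps(state))

# Drop the live state and collect garbage before waiting for more work.
del state
gc.collect()

restored = wake(blob)
```

The trade-off is CPU time on every sleep and wake in exchange for a smaller resident footprint, which fits workers that are activated only rarely.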
Kresten Krab Thorup
801817cf70 Move tree traversal to separate process
Looks like we're generating a lot of garbage
here.  Moving this to a separate process lets
us avoid a lot of garbage collection work, since
we don't cache these parsed nodes anyway.
2012-04-28 22:40:39 +02:00
Kresten Krab Thorup
b53d6fc3c3 Add riak_core dependency
For unit tests to run
2012-04-28 18:45:53 +02:00
Kresten Krab Thorup
4fbd7d17ed Update README/TODO 2012-04-28 18:42:04 +02:00
Kresten Krab Thorup
682191ce06 Tree writing code was broken
In some cases, inner nodes were not being emitted.
This would sometimes cause queries (get / range_fold)
to only include results in a right-most branch.
2012-04-28 18:35:35 +02:00
Kresten Krab Thorup
940fa823e2 integer index values fixed
The code was assuming index values were binaries.
2012-04-28 18:34:03 +02:00
Kresten Krab Thorup
9d3542c4a0 Enable debug print upon opening a level
Also clean up some variable names to be
more descriptive / correct.
2012-04-28 18:32:50 +02:00
Kresten Krab Thorup
5c717b1ec3 Fix range_fold
Range fold with from_key < first_key would
always return an empty result.
2012-04-28 18:31:19 +02:00
Kresten Krab Thorup
be5db4e4be Update readme on 2i and fast bucket listing 2012-04-27 12:09:19 +02:00
Kresten Krab Thorup
4e354c0379 Implement "fast" fold buckets function
Do repeated limit=1 range queries based on
the sext encoding of {o, Bucket, Key}
2012-04-27 12:08:35 +02:00
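The bucket-listing trick in 4e354c0379 relies on keys being stored in an order where all keys of one bucket sit together. A minimal Python sketch, assuming sorted `(bucket, key)` string tuples as a stand-in for the sext-encoded `{o, Bucket, Key}` terms; `range_fold` and `list_buckets` here are hypothetical helpers, not the hanoi API.

```python
import bisect


def range_fold(sorted_keys, from_key, limit):
    """Return up to `limit` keys that are >= from_key, in order."""
    start = bisect.bisect_left(sorted_keys, from_key)
    return sorted_keys[start:start + limit]


def list_buckets(sorted_keys):
    """List distinct buckets using one limit=1 query per bucket."""
    buckets = []
    probe = ("", "")  # smallest possible (bucket, key) pair
    while True:
        hit = range_fold(sorted_keys, probe, limit=1)
        if not hit:
            return buckets
        bucket = hit[0][0]
        buckets.append(bucket)
        # bucket + "\x00" is the smallest string strictly greater than
        # bucket, so the next probe skips every key left in this bucket.
        probe = (bucket + "\x00", "")


keys = sorted([("a", "1"), ("a", "2"), ("b", "1"), ("c", "9")])
buckets = list_buckets(keys)
```

The number of queries grows with the number of buckets rather than the number of keys, which is what makes the listing fast when buckets hold many keys.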
Kresten Krab Thorup
eb63ce1d04 Fix two more unit tests 2012-04-27 10:23:52 +02:00
Kresten Krab Thorup
9a7e2131a1 Implement 2i
Most code copied from eleveldb backend, except
we can do more precise range folds with hanoi
so no need to throw exceptions from fold functions.
2012-04-27 10:03:19 +02:00
Kresten Krab Thorup
89b04fe4fb Add debug logging 2012-04-27 10:01:53 +02:00
Kresten Krab Thorup
f41aaa265e Fix bug with fold termination 2012-04-27 10:00:36 +02:00
Kresten Krab Thorup
2d928fce73 Fix unit tests 2012-04-27 09:59:09 +02:00
Kresten Krab Thorup
6b47d8dd1e Fix race condition
When merge is completed, and inject-to-next-level
is pending, there is still a B file, but no 
current merge_pid.  In this case, don't try
to do merge work at this level.
2012-04-27 09:47:21 +02:00
Kresten Krab Thorup
b07d16d292 Add hanoi:transact, and CRC checks for nursery.log
This involves some cleanup/reorg of code
in hanoi_util.  Streaming trees and nursery
now use the same CRC checking code.

Future: Keep the CRC-encoded binary around, 
and reuse it when writing trees.  This will reduce
cpu costs involved in re-computing those all the
time.
2012-04-26 17:18:49 +02:00
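To illustrate the kind of CRC check mentioned in b07d16d292, here is a minimal Python sketch of a length-and-CRC framed record. The layout (4-byte length, 4-byte CRC32, then payload) is an assumption made for this example and is not hanoi's actual on-disk format; `encode_record` and `decode_record` are hypothetical names.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # assumed layout: payload length, CRC32


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32 checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_record(buf: bytes) -> bytes:
    """Return the payload, or raise ValueError if it fails the check."""
    length, crc = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("corrupt or truncated record")
    return payload


record = encode_record(b"key=value")
payload = decode_record(record)

# Flip one bit in the payload to simulate on-disk corruption.
damaged = record[:-1] + bytes([record[-1] ^ 0x01])
try:
    decode_record(damaged)
    corruption_detected = False
except ValueError:
    corruption_detected = True
```

Keeping the encoded record around and reusing it when writing trees, as the commit message suggests for the future, would avoid recomputing the checksum for data that has not changed.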
Kresten Krab Thorup
67f1c46b7e Code cleanup
Clean up a little in hanoi_level, avoiding an
extra message send when initiating incremental
merge
2012-04-26 17:13:47 +02:00
Kresten Krab Thorup
eba7f820ef Update README 2012-04-26 17:12:37 +02:00