146 commits

Author SHA1 Message Date
Kresten Krab Thorup
25b4099eec Only compress if compressed size is smaller 2012-04-23 04:23:52 +02:00
Kresten Krab Thorup
d37b227936 Implement compression + block size
option {compression, none|gzip|snappy}

... except right now using snappy is broken,
it seems that it causes bloom filters to
crash. Needs investigation.

option {block_size, 32768} 

... writes data to disk in chunks of ~32k.
2012-04-23 03:49:08 +02:00
Kresten Krab Thorup
14ef03e06a Initial options
Now we're ready to handle some options
2012-04-23 02:20:47 +02:00
Kresten Krab Thorup
0718d33d7a Add {sync_strategy, sync | {seconds, N} | none}
We should add o_sync also like bitcask
2012-04-23 02:20:12 +02:00
Kresten Krab Thorup
3b451d5863 Fix folding
The "blocking range fold" only works for modest
data sets, otherwise it gets prohibitively slow,
so for now we always do "snapshot range fold".
2012-04-23 02:10:18 +02:00
Kresten Krab Thorup
99fb1bee74 When opening a level, enforce just enough merge
When re-opening a Hanoi data store, we need to
reestablish the invariant that there is always
room to inject a data file at the top level.

In a worst case scenario, every level has all of
A, B, and C; and thus needs to merge A+B -> X
fully in order to accommodate what the parent 
will inject. 2*BTREE_SIZE(Level) >= sizeof(A+B)
2012-04-22 23:49:39 +02:00
Kresten Krab Thorup
8694cc118f Specify infinity as gen_server:call timeout 2012-04-22 23:30:25 +02:00
Gregory Burd
43f095b3f0 Rename "lsm-btree" to "hanoi". 2012-04-21 15:20:39 -04:00
Gregory Burd
fec14a1c51 Put the copyright/license header into a few overlooked files. 2012-04-19 18:09:01 -04:00
Gregory Burd
197914939c Removed redundant function. 2012-04-19 18:00:06 -04:00
Gregory Burd
23f6d76a72 Merge branch 'krab-incremental-merge' into gsb-merge-krab-20120419
Conflicts:
	src/lsm_btree.erl
	src/lsm_btree.hrl
2012-04-19 16:54:23 -04:00
Kresten Krab Thorup
6289602045 Improve incremental merge
This change makes incremental merge be concurrent 
with filling up the nursery. So instead of waiting
for an incremental merge to complete before returning
from insert, it

- blocks waiting for a possible previous incremental merge to complete
- issues a new incremental merge.

This improves put latencies, but not throughput.
2012-04-19 22:33:27 +02:00
Kresten Krab Thorup
ee90944c62 Implement incremental insert
This slows down insert to be log2(N), where N is
the total number of objects in the store.  The upside
is that it also removes the terrible worst case
scenarios for insert.
2012-04-19 19:57:39 +02:00
Kresten Krab Thorup
3d80b164d5 Introduce 3rd file in each level to reduce worst-case
Now, each level is comprised of 3 files,

   A=Oldest, B=Older, C=Old

As in [Overmars and Leeuwen, 1983]. As soon as we have A & B,
we initiate a merge (to the M=New file), i.e. we merge more
eagerly than previously.

Next step in this refactoring is to add a scheduler that enforces
some merge activity as part of a PUT.
2012-04-19 16:07:11 +02:00
Gregory Burd
3f02eadc27 Too large a nursery opens up the potential for long (in seconds) merges. 2012-04-18 16:49:57 -04:00
Kresten Krab Thorup
4e53b0a083 Allow fold worker to send {fold_results, PID, KVs}
Not just individual KVs, but lists of KVs
2012-04-18 09:28:59 +02:00
Kresten Krab Thorup
c3f916c350 Make fold_worker link to the consumer
This should ensure proper cleanup if a process
calling fold exits while folding.
2012-04-18 09:25:47 +02:00
Kresten Krab Thorup
5facc3df18 Undo fancy-pancy sext key encoding
Sadly, this didn't work
Have to investigate more…
2012-04-16 21:51:01 -04:00
Kresten Krab Thorup
1cf4805da5 Utilize sext to optimize bucket range queries
Also re-fix the range arguments which were broken
in a previous commit.
2012-04-16 21:51:01 -04:00
Kresten Krab Thorup
e7a621e449 Handle exceptions inside sync_fold_range
Someone tried (tsk, tsk) to terminate a fold operation
by throwing an exception.  Now we also should handle
such situations gracefully.
2012-04-16 21:49:15 -04:00
Steve Vinoski
4dab3a65e5 remove throw statements from fold functions 2012-04-16 21:42:04 -04:00
Kresten Krab Thorup
454a111ad7 Handle exceptions inside sync_fold_range
Someone tried (tsk, tsk) to terminate a fold operation
by throwing an exception.  Now we also should handle
such situations gracefully.
2012-04-17 00:29:28 +02:00
Steve Vinoski
a911734134 lsm_btree:sync_fold_range/5 no longer exists, use /4 instead 2012-04-16 17:46:43 -04:00
Steve Vinoski
79872680da rename temp_riak_kv_backend to lsm_btree_temp_riak_kv_backend
The module temp_riak_kv_backend is already used in another experimental
storage backend. Rename it to avoid collisions when trying to use both
backends in riak together.
2012-04-16 17:25:51 -04:00
Steve Vinoski
eefada16ac restore lookup/2 function, for compatibility 2012-04-16 16:50:55 -04:00
Kresten Krab Thorup
5f9f5c18d6 Add #btree_range to all fold ops
lets you specify this for fold operations

  #btree_range {
     from_key :: binary(),
     from_inclusive = true :: boolean(),
     to_key :: binary() | undefined,
     to_inclusive = false :: boolean(),
     limit = undefined :: pos_integer() | undefined
  }
2012-04-16 16:46:57 -04:00
Kresten Krab Thorup
9a7959ff4c Simplify KEY_IN_RANGE macro 2012-04-16 16:45:25 -04:00
Gregory Burd
5c5934549f The nursery was far too small, increase it from 32 to 8192 objects. 2012-04-15 17:23:42 -04:00
Gregory Burd
e81a3480ab Fix silly oversight with variable names, add better data dir prep. 2012-04-15 16:56:19 -04:00
Gregory Burd
4eaa02ac3f Formatting 2012-04-15 16:55:12 -04:00
Gregory Burd
b4823d3e8f Formatting 2012-04-15 16:54:27 -04:00
Gregory Burd
61d360550e Use not_found uniformly (rather than notfound). 2012-04-15 15:34:42 -04:00
Gregory Burd
5338a07c54 Don't prefix vnode directories with the backend name 2012-04-15 14:36:05 -04:00
Gregory Burd
b325f3e792 * Changed "lookup" to "get" just because
* Added copyright notices to files
* Added Apache 2.0 License file with permission from Kresten/Trifork
* Changed the handle from "Db" to "Tree" because... it made me feel better
* Other minor changes here and there
2012-04-15 10:35:39 -04:00
Gregory Burd
e6e3b55d23 Minor renaming 2012-04-15 07:24:26 -04:00
Gregory Burd
95d18b6cd5 Minor name change 2012-04-15 07:23:44 -04:00
Gregory Burd
e581370242 Minor rename 2012-04-15 07:19:50 -04:00
Steve Vinoski
4bc1eb6e19 add riak kv backend 2012-04-14 20:49:56 -04:00
Kresten Krab Thorup
f0def8231b Introduce btree_range record for range queries
This allows specifying ranges with from/to
being inclusive or not, and providing a result
limit (latter not implemented yet).

This change just makes all current tests pass.
2012-01-23 00:51:31 +01:00
Kresten Krab Thorup
42b353ecfd Implement sequential/random reader API 2012-01-23 00:49:07 +01:00
Jesper Louis Andersen
baa779ddaa Fix a bug in lsm_btree:close/1.
There is a race condition based on the monitor set in a call. We might
get a normal exit from the monitor message deep inside gen_server.
This has to be handled. I've seen this race in my QC tests.
2012-01-21 21:17:33 +01:00
Kresten Krab Thorup
fc024e95b6 Make proper range fold in nursery 2012-01-20 14:08:07 +01:00
Kresten Krab Thorup
ec2fe4ce8c Make close/1 resilient to noproc
Getting noproc exceptions here is ok, we simply
ignore such errors.
2012-01-20 10:14:47 +01:00
Kresten Krab Thorup
30a0bd4b01 Use ?BTREE_SIZE macro everywhere 2012-01-20 10:09:54 +01:00
Kresten Krab Thorup
c26e0695c5 Finish renaming lsm_btree_merger2 2012-01-20 10:06:43 +01:00
Kresten Krab Thorup
07b6b17534 Rename merger2 -> merger 2012-01-20 10:05:08 +01:00
Kresten Krab Thorup
1ad7bb2158 Remove unused merger 2012-01-20 10:04:10 +01:00
Kresten Krab Thorup
30ad1f0794 Use ?TOMBSTONE macro everywhere 2012-01-19 15:19:22 +01:00
Kresten Krab Thorup
f56f530d7a Add both sync and async range fold
Sync blocks insert/lookup while doing a range
query, but still buffers the results in a
per-range query process.  

Async fold runs on a hard link copy of the
underlying tree data.

This commit also fixes a number of bugs
related to folding; it was not taking nursery 
data into account.
2012-01-19 14:25:47 +01:00
Kresten Krab Thorup
ead8d3a41d Make lsm_btree:close/1 stop more processes
Closing a tree did not stop ongoing merge
processes beyond the current top level.
Now close synchronously calls down through all
levels and closes each one.
2012-01-19 14:19:16 +01:00
Kresten Krab Thorup
29d1493415 Rename lsm_btree:range/3 to lsm_btree:async_range/3 2012-01-16 15:13:47 +01:00
Erik Søe Sørensen
49c8d5b06f Clarify level handling in writer. 2012-01-16 07:50:31 +08:00
Kresten Krab Thorup
771d18f9f7 Implement lsm_btree:fold_range/5
First implementation of range fold

Range folding doesn't prohibit insert/lookup or
merge operations, but each level can only have
one range fold operation active.

Thus, worst case active range folds can double
space requirements, because it holds hard-linked
copies of used btree files.
2012-01-16 00:37:52 +01:00
Kresten Krab Thorup
f2629c3fd2 Improve nursery handling
- Code moved to separate module
- Recovery now operational; re-opening a
  tree will actually read the nursery log.

We use a sequential log file for the nursery
and also keep the inserted {K,V} pairs in memory,
only 32 K/V pairs at a time.

NURSERY_SIZE = (1 bsl TOP_LEVEL)
configured in src/lsm_btree.hrl
2012-01-10 18:04:13 -04:00
Kresten Krab Thorup
15736dd82a Rename modules fractal_btree -> lsm_btree 2012-01-07 17:17:48 +01:00
Kresten Krab Thorup
db2399ee4a Rename fractal_btree -> lsm_btree, phase I 2012-01-07 17:14:52 +01:00
Kresten Krab Thorup
99a6985eed Allow put(Key, Binary|'deleted')
Last missing piece to make delete work it seems
2012-01-07 14:49:25 +01:00
Kresten Krab Thorup
5ca4443f04 Remove warning 2012-01-07 14:48:10 +01:00
Kresten Krab Thorup
3d0c36c3bc Add option to evict tombstones in merge 2012-01-07 00:28:26 +01:00
Kresten Krab Thorup
5b4a4551a9 Reduce verbosity 2012-01-07 00:27:20 +01:00
Kresten Krab Thorup
120609f8ac Delete X files so levels can reopen cleanly 2012-01-07 00:26:44 +01:00
Kresten Krab Thorup
61f0aa26d2 Merge branch 'level_mainloop_simplification' of git://github.com/eriksoe/fractal_btree 2012-01-06 23:48:40 +01:00
Erik Søe Sørensen
dac9b31266 Merge branch 'master' of github.com:eriksoe/fractal_btree
Conflicts:
	src/fractal_btree_level.erl
2012-01-06 13:56:32 +01:00
Kresten Krab Thorup
4e8602043f Implement range_fold 2012-01-06 02:06:25 +01:00
Kresten Krab Thorup
9a624f963e Rename read_leaf_node -> next_leaf_node 2012-01-06 02:06:00 +01:00
Kresten Krab Thorup
b21e253324 Store child-refs as {Pos,Size} so we can pread
This allows us to use file:pread to read a
child-node, rather than two separate reads
(one for node block size, and then one for 
the node block itself).

Also, encode the level# in node header, so that
scanning leafs doesn't need to decode the
node contents for inner nodes.
2012-01-06 00:29:05 +01:00
Kresten Krab Thorup
2f985d8576 Fix indentation 2012-01-06 00:02:29 +01:00
Erik Søe Sørensen
90ae581213 level: Calculate the size of the merged file correctly - again. 2012-01-05 21:09:36 +01:00
Erik Søe Sørensen
be91e047ff Merge branch 'master' of github.com:krestenkrab/fractal_btree 2012-01-05 18:32:28 +01:00
Kresten Krab Thorup
cff74ac93a Use proc_lib for spawning 2012-01-05 18:09:00 +01:00
Kresten Krab Thorup
6c0766a433 Allow merger to be local 2012-01-05 18:08:19 +01:00
Kresten Krab Thorup
51f1c13650 Assert nursery file is gone after inject 2012-01-05 18:07:20 +01:00
Erik Søe Sørensen
c43d5464d6 Merge branch 'master' of github.com:krestenkrab/fractal_btree 2012-01-05 17:50:50 +01:00
Erik Søe Sørensen
aca809aa90 level: Handle failure of merger process. Remove EXIT messages from inbox. 2012-01-05 17:50:18 +01:00
Kresten Krab Thorup
22e59b0fcc Make btree_writer:close use infinity 2012-01-05 17:37:15 +01:00
Kresten Krab Thorup
8c6d832f27 Merge pull request #1 from eriksoe/level_mainloop_simplification
Level mainloop simplification
2012-01-05 08:31:10 -08:00
Kresten Krab Thorup
cf54868d1b Implement new merge algorithm
Which does not spawn individual processes,
but rather does a "sequential merge"
2012-01-05 17:07:49 +01:00
Kresten Krab Thorup
00e2fba43a Correct count for merge 2012-01-05 16:46:32 +01:00
Erik Søe Sørensen
b9abf21bac Correct embarrassing typo wrt. use of do_lookup(). 2012-01-05 16:23:21 +01:00
Erik Søe Sørensen
36816e261a Merge remote-tracking branch 'kk/master' into level_mainloop_simplification
Conflicts:
	src/fractal_btree_level.erl
2012-01-05 16:14:54 +01:00
Kresten Krab Thorup
244e3128e9 Handle error case + debugging for that 2012-01-05 23:05:18 +08:00
Kresten Krab Thorup
e09e2b2aa2 Add basho_bench script/driver 2012-01-05 23:05:18 +08:00
Kresten Krab Thorup
3118bd8c62 Remove lots of info_msg 2012-01-05 23:05:18 +08:00
Erik Søe Sørensen
c04c11c67f First compile and test, then commit. 2012-01-05 16:02:29 +01:00
Erik Søe Sørensen
fa43e41c51 Simplify slightly in level:initialize(). 2012-01-05 16:01:14 +01:00
Erik Søe Sørensen
c93505fa58 Simplify main_loop() by collapsing to one clause. 2012-01-05 15:59:45 +01:00
Kresten Krab Thorup
27396c21d1 Handle error case + debugging for that 2012-01-05 15:56:37 +01:00
Erik Søe Sørensen
76c6cbd585 Simplify lookup in main_loop2(). 2012-01-05 15:46:13 +01:00
Erik Søe Sørensen
285b7bc95e Rename main_loop{0,1,2}() to main_loop(). 2012-01-05 15:35:16 +01:00
Erik Søe Sørensen
6184272d95 Remove unused level:size(). 2012-01-05 15:34:50 +01:00
Kresten Krab Thorup
86f28c683f Add basho_bench script/driver 2012-01-05 15:28:39 +01:00
Kresten Krab Thorup
7a0fc6addd Remove lots of info_msg 2012-01-05 15:28:23 +01:00
Kresten Krab Thorup
cdadb88ebf Top-level functionality fractal_btree "works"
There is a single unit test for the aggregate
functionality, so basic interactions work.

[Too many log messages right now]
2012-01-05 11:48:14 +01:00
Kresten Krab Thorup
da65b9abb1 zip stored bloom filter 2012-01-04 15:48:57 +01:00
Kresten Krab Thorup
5af86b9e23 Add bloom filter to btree index format 2012-01-04 15:36:52 +01:00
Kresten Krab Thorup
6e13f55044 Initial work-in-progress 2012-01-04 15:05:31 +01:00