Commit graph

133 commits

Author SHA1 Message Date
Kresten Krab Thorup
3d80b164d5 Introduce 3rd file in each level to reduce worst-case
Now, each level is comprised of 3 files,

   A=Oldest, B=Older, C=Old

As in [Overmars and Leeuwen, 1983]. As soon as we have A & B,
we initiate a merge (to the M=New file), i.e. we merge more
eagerly than previously.

Next step in this refactoring is to add a scheduler that enforces
some merge activity as part of a PUT.
2012-04-19 16:07:11 +02:00
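
The three-file scheme in the commit above can be sketched as a small state machine. This is only an illustration, not the project's code: the module name `level_sketch`, the `#level{}` record, and the rule that C is promoted to A once a merge finishes are all assumptions made for the example. The commit itself only states that a level holds A, B and C and that a merge starts as soon as A and B exist.

```erlang
-module(level_sketch).
-export([new/0, inject/2, merge_done/1]).

%% Hypothetical level state: a = Oldest, b = Older, c = Old.
-record(level, {a, b, c, merging = false}).

new() -> #level{}.

%% Accept a file handed to this level.
%% The first file becomes A; the second becomes B and triggers a merge
%% right away (the "eager" part); a third is parked in C while merging.
inject(File, L = #level{a = undefined}) ->
    {ok, L#level{a = File}};
inject(File, L = #level{b = undefined}) ->
    {start_merge, L#level{b = File, merging = true}};
inject(File, L = #level{c = undefined}) ->
    {ok, L#level{c = File}};
inject(_File, #level{}) ->
    {error, full}.

%% Assumption: when the merge of A and B completes, the result M is passed
%% on to the next level and C (if any) takes the place of A.
merge_done(L = #level{a = A, b = B, c = C, merging = true}) ->
    {{merged, A, B}, L#level{a = C, b = undefined, c = undefined,
                             merging = false}}.
```

With a third slot, a level can still accept one more file while A and B are being merged, which is presumably how the worst case is reduced.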
Gregory Burd
3f02eadc27 Too large a nursery opens up the potential for long (in seconds) merges. 2012-04-18 16:49:57 -04:00
Kresten Krab Thorup
4e53b0a083 Allow fold worker to send {fold_results, PID, KVs}
Not just individual KVs, but lists of KVs
2012-04-18 09:28:59 +02:00
Kresten Krab Thorup
c3f916c350 Make fold_worker link to the consumer
This should ensure proper cleanup if a process
calling fold exits while folding.
2012-04-18 09:25:47 +02:00
Kresten Krab Thorup
5facc3df18 Undo fancy-pancy sext key encoding
Sadly, this didn't work
Have to investigate more…
2012-04-16 21:51:01 -04:00
Kresten Krab Thorup
1cf4805da5 Utilize sext to optimize bucket range queries
Also re-fix the range arguments which were broken
in a previous commit.
2012-04-16 21:51:01 -04:00
Kresten Krab Thorup
e7a621e449 Handle exceptions inside sync_fold_range
Someone tried (tsk, tsk) to terminate a fold operation
by throwing an exception.  Now we also should handle
such situations gracefully.
2012-04-16 21:49:15 -04:00
Steve Vinoski
4dab3a65e5 remove throw statements from fold functions 2012-04-16 21:42:04 -04:00
Kresten Krab Thorup
454a111ad7 Handle exceptions inside sync_fold_range
Someone tried (tsk, tsk) to terminate a fold operation
by throwing an exception.  Now we also should handle
such situations gracefully.
2012-04-17 00:29:28 +02:00
Steve Vinoski
a911734134 lsm_btree:sync_fold_range/5 no longer exists, use /4 instead 2012-04-16 17:46:43 -04:00
Steve Vinoski
79872680da rename temp_riak_kv_backend to lsm_btree_temp_riak_kv_backend
The module temp_riak_kv_backend is already used in another experimental
storage backend. Rename it to avoid collisions when trying to use both
backends in riak together.
2012-04-16 17:25:51 -04:00
Steve Vinoski
eefada16ac restore lookup/2 function, for compatibility 2012-04-16 16:50:55 -04:00
Kresten Krab Thorup
5f9f5c18d6 Add #btree_range to all fold ops
lets you specify this for fold operations

  #btree_range {
     from_key :: binary(),
     from_inclusive = true :: boolean(),
     to_key :: binary() | undefined,
     to_inclusive = false :: boolean(),
     limit = undefined :: pos_integer() | undefined
  }
2012-04-16 16:46:57 -04:00
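
As a rough illustration of how the `#btree_range` fields above could be interpreted, the following sketch checks whether a key falls inside a range. The module `range_sketch` and the function `key_in_range/2` are hypothetical names, the `<<>>` default for `from_key` is an assumption added so the record can be built without it, and the `limit` field is deliberately ignored here. It is not the project's `KEY_IN_RANGE` macro.

```erlang
-module(range_sketch).
-export([key_in_range/2, demo/0]).

%% Fields as listed in the commit message; the <<>> default is an assumption.
-record(btree_range, {
          from_key = <<>>        :: binary(),
          from_inclusive = true  :: boolean(),
          to_key                 :: binary() | undefined,
          to_inclusive = false   :: boolean(),
          limit                  :: pos_integer() | undefined
         }).

%% True if Key lies between from_key and to_key, honouring both
%% inclusive flags. An undefined to_key means "no upper bound".
key_in_range(Key, #btree_range{from_key = From, from_inclusive = FromIncl,
                               to_key = To, to_inclusive = ToIncl}) ->
    above_lower(Key, From, FromIncl) andalso below_upper(Key, To, ToIncl).

above_lower(Key, From, true)  -> Key >= From;
above_lower(Key, From, false) -> Key > From.

below_upper(_Key, undefined, _) -> true;
below_upper(Key, To, true)      -> Key =< To;
below_upper(Key, To, false)     -> Key < To.

%% Filter a few keys with the default flags: from inclusive, to exclusive.
demo() ->
    Range = #btree_range{from_key = <<"b">>, to_key = <<"d">>},
    [K || K <- [<<"a">>, <<"b">>, <<"c">>, <<"d">>], key_in_range(K, Range)].
```

With the default flags, `range_sketch:demo()` keeps `<<"b">>` and `<<"c">>`: the lower bound is included and the upper bound is excluded.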
Kresten Krab Thorup
9a7959ff4c Simplify KEY_IN_RANGE macro 2012-04-16 16:45:25 -04:00
Gregory Burd
5c5934549f The nursery was far too small, increase it from 32 to 8192 objects. 2012-04-15 17:23:42 -04:00
Gregory Burd
e81a3480ab Fix silly oversight with variable names, add better data dir prep. 2012-04-15 16:56:19 -04:00
Gregory Burd
4eaa02ac3f Formatting 2012-04-15 16:55:12 -04:00
Gregory Burd
b4823d3e8f Formatting 2012-04-15 16:54:27 -04:00
Gregory Burd
61d360550e Use not_found uniformly (rather than notfound). 2012-04-15 15:34:42 -04:00
Gregory Burd
5338a07c54 Don't prefix vnode directories with the backend name 2012-04-15 14:36:05 -04:00
Gregory Burd
b325f3e792 * Changed "lookup" to "get" just because
* Added copyright notices to files
* Added Apache 2.0 License file with permission from Kresten/Trifork
* Changed the handle from "Db" to "Tree" because... it made me feel better
* Other minor changes here and there
2012-04-15 10:35:39 -04:00
Gregory Burd
e6e3b55d23 Minor renaming 2012-04-15 07:24:26 -04:00
Gregory Burd
95d18b6cd5 Minor name change 2012-04-15 07:23:44 -04:00
Gregory Burd
e581370242 Minor rename 2012-04-15 07:19:50 -04:00
Steve Vinoski
4bc1eb6e19 add riak kv backend 2012-04-14 20:49:56 -04:00
Kresten Krab Thorup
f0def8231b Introduce btree_range record for range queries
This allows specifying ranges with from/to
being inclusive or not, and providing a result
limit (latter not implemented yet).

This change just makes all current tests pass.
2012-01-23 00:51:31 +01:00
Kresten Krab Thorup
42b353ecfd Implement sequential/random reader API 2012-01-23 00:49:07 +01:00
Jesper Louis Andersen
baa779ddaa Fix a bug in lsm_tree:close/1.
There is a race condition based on the monitor set in a call. We might
get a normal exit from the monitor message deep inside gen_server.
This has to be handled. I've seen this race in my QC tests.
2012-01-21 21:17:33 +01:00
Kresten Krab Thorup
fc024e95b6 Make proper range fold in nursery 2012-01-20 14:08:07 +01:00
Kresten Krab Thorup
ec2fe4ce8c Make close/1 resilient to noproc
Getting noproc exceptions here is ok, we simply
ignore such errors.
2012-01-20 10:14:47 +01:00
Kresten Krab Thorup
30a0bd4b01 Use ?BTREE_SIZE macro everywhere 2012-01-20 10:09:54 +01:00
Kresten Krab Thorup
c26e0695c5 Finish renaming lsm_btree_merger2 2012-01-20 10:06:43 +01:00
Kresten Krab Thorup
07b6b17534 Rename merger2 -> merger 2012-01-20 10:05:08 +01:00
Kresten Krab Thorup
1ad7bb2158 Remove unused merger 2012-01-20 10:04:10 +01:00
Kresten Krab Thorup
30ad1f0794 Use ?TOMBSTONE macro everywhere 2012-01-19 15:19:22 +01:00
Kresten Krab Thorup
f56f530d7a Add both sync and async range fold
Sync blocks insert/lookup while doing a range
query, but still buffers the results in a
per-range query process.  

Async fold runs on a hard link copy of the
underlying tree data.

This commit also fixes a number of bugs
related to folding; it was not taking nursery
data into account.
2012-01-19 14:25:47 +01:00
Kresten Krab Thorup
ead8d3a41d Make lsm_btree:close/1 stop more processes
Closing a tree did not stop ongoing merge
processes beyond the current top level.
Now close synchronously calls down through all
levels and closes each one.
2012-01-19 14:19:16 +01:00
Kresten Krab Thorup
29d1493415 Rename lsm_btree:range/3 to lsm_btree:async_range/3 2012-01-16 15:13:47 +01:00
Erik Søe Sørensen
49c8d5b06f Clarify level handling in writer. 2012-01-16 07:50:31 +08:00
Kresten Krab Thorup
771d18f9f7 Implement lsm_btree:fold_range/5
First implementation of range fold

Range folding doesn't prohibit insert/lookup or
merge operations, but each level can only have
one range fold operation active.

Thus, worst case active range folds can double
space requirements, because it holds hard-linked
copies of used btree files.
2012-01-16 00:37:52 +01:00
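
The hard-link idea mentioned in the commit above can be illustrated with the standard `file:make_link/2` call. This is a minimal sketch with made-up file names and a hypothetical `snapshot/2` helper, not the project's implementation, and it assumes a filesystem that supports hard links.

```erlang
-module(hardlink_sketch).
-export([snapshot/2, demo/0]).

%% Create a second directory entry for the same file contents.
%% No data is copied at this point.
snapshot(DataFile, SnapshotName) ->
    file:make_link(DataFile, SnapshotName).

%% Shows that the snapshot stays readable after the original name is
%% deleted, e.g. because a merge has replaced the file in the meantime.
demo() ->
    Data = "hardlink_sketch.data",
    Snap = "hardlink_sketch.snap",
    ok = file:write_file(Data, <<"level contents">>),
    ok = snapshot(Data, Snap),
    ok = file:delete(Data),
    {ok, Contents} = file:read_file(Snap),
    ok = file:delete(Snap),
    Contents.
```

The extra space cost appears only once the original name is deleted while the link is still held: the blocks stay allocated until the fold releases its link, which matches the worst-case doubling described in the commit.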
Kresten Krab Thorup
f2629c3fd2 Improve nursery handling
- Code moved to separate module
- Recovery now operational; re-opening a
  tree will actually read the nursery log.

We use a sequential log file for the nursery
and also keep the inserted {K,V} pairs in memory,
only 32 K/V pairs at a time.

NURSERY_SIZE = (1 bsl TOP_LEVEL)
configured in src/lsm_btree.hrl
2012-01-10 18:04:13 -04:00
Kresten Krab Thorup
15736dd82a Rename modules fractal_btree -> lsm_btree 2012-01-07 17:17:48 +01:00
Kresten Krab Thorup
db2399ee4a Rename fractal_btree -> lsm_btree, phase I 2012-01-07 17:14:52 +01:00
Kresten Krab Thorup
99a6985eed Allow put(Key, Binary|'deleted')
Last missing piece to make delete work it seems
2012-01-07 14:49:25 +01:00
Kresten Krab Thorup
5ca4443f04 Remove warning 2012-01-07 14:48:10 +01:00
Kresten Krab Thorup
3d0c36c3bc Add option to evict tombstones in merge 2012-01-07 00:28:26 +01:00
Kresten Krab Thorup
5b4a4551a9 Reduce verbosity 2012-01-07 00:27:20 +01:00
Kresten Krab Thorup
120609f8ac Delete X files so levels can reopen cleanly 2012-01-07 00:26:44 +01:00
Kresten Krab Thorup
61f0aa26d2 Merge branch 'level_mainloop_simplification' of git://github.com/eriksoe/fractal_btree 2012-01-06 23:48:40 +01:00
Erik Søe Sørensen
dac9b31266 Merge branch 'master' of github.com:eriksoe/fractal_btree
Conflicts:
	src/fractal_btree_level.erl
2012-01-06 13:56:32 +01:00
Kresten Krab Thorup
4e8602043f Implement range_fold 2012-01-06 02:06:25 +01:00
Kresten Krab Thorup
9a624f963e Rename read_leaf_node -> next_leaf_node 2012-01-06 02:06:00 +01:00
Kresten Krab Thorup
b21e253324 Store child-refs as {Pos,Size} so we can pread
This allows us to use file:pread to read a
child-node, rather than two separate reads
(one for node block size, and then one for 
the node block itself).

Also, encode the level# in node header, so that
scanning leafs doesn't need to decode the
node contents for inner nodes.
2012-01-06 00:29:05 +01:00
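
To illustrate why a `{Pos,Size}` child reference allows a single read, here is a small sketch built on `file:pread/3`. The module `pread_sketch`, the `read_node/2` helper, the four-byte header, and the use of `term_to_binary/1` as the node encoding are all assumptions for the example. The real on-disk format is not described in the commit.

```erlang
-module(pread_sketch).
-export([read_node/2, demo/0]).

%% Fetch a node with one positioned read, given its {Pos, Size} reference.
read_node(File, {Pos, Size}) ->
    case file:pread(File, Pos, Size) of
        {ok, Bin} when byte_size(Bin) =:= Size -> {ok, binary_to_term(Bin)};
        {ok, _Short}                           -> {error, truncated};
        eof                                    -> {error, eof};
        {error, _} = Err                       -> Err
    end.

%% Write a header followed by one encoded node, then read the node back
%% using only its position and size.
demo() ->
    Path = "pread_sketch.tmp",
    Header = <<"HDR!">>,
    Node = term_to_binary({leaf, [{<<"k">>, <<"v">>}]}),
    ok = file:write_file(Path, [Header, Node]),
    {ok, File} = file:open(Path, [read, raw, binary]),
    Result = read_node(File, {byte_size(Header), byte_size(Node)}),
    ok = file:close(File),
    ok = file:delete(Path),
    Result.
```

Without the size in the reference, the reader would first have to read a length prefix at `Pos` and then issue a second read for the node body.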
Kresten Krab Thorup
2f985d8576 Fix indentation 2012-01-06 00:02:29 +01:00
Erik Søe Sørensen
90ae581213 level: Calculate the size of the merged file correctly - again. 2012-01-05 21:09:36 +01:00
Erik Søe Sørensen
be91e047ff Merge branch 'master' of github.com:krestenkrab/fractal_btree 2012-01-05 18:32:28 +01:00
Kresten Krab Thorup
cff74ac93a Use proc_lib for spawning 2012-01-05 18:09:00 +01:00
Kresten Krab Thorup
6c0766a433 Allow merger to be local 2012-01-05 18:08:19 +01:00
Kresten Krab Thorup
51f1c13650 Assert nursery file is gone after inject 2012-01-05 18:07:20 +01:00
Erik Søe Sørensen
c43d5464d6 Merge branch 'master' of github.com:krestenkrab/fractal_btree 2012-01-05 17:50:50 +01:00
Erik Søe Sørensen
aca809aa90 level: Handle failure of merger process. Remove EXIT messages from inbox. 2012-01-05 17:50:18 +01:00
Kresten Krab Thorup
22e59b0fcc Make btree_writer:close use infinity 2012-01-05 17:37:15 +01:00
Kresten Krab Thorup
8c6d832f27 Merge pull request #1 from eriksoe/level_mainloop_simplification
Level mainloop simplification
2012-01-05 08:31:10 -08:00
Kresten Krab Thorup
cf54868d1b Implement new merge algorithm
Which does not spawn individual processes,
but rather does a "sequential merge"
2012-01-05 17:07:49 +01:00
Kresten Krab Thorup
00e2fba43a Correct count for merge 2012-01-05 16:46:32 +01:00
Erik Søe Sørensen
b9abf21bac Correct embarrassing typo wrt. use of do_lookup(). 2012-01-05 16:23:21 +01:00
Erik Søe Sørensen
36816e261a Merge remote-tracking branch 'kk/master' into level_mainloop_simplification
Conflicts:
	src/fractal_btree_level.erl
2012-01-05 16:14:54 +01:00
Kresten Krab Thorup
244e3128e9 Handle error case + debugging for that 2012-01-05 23:05:18 +08:00
Kresten Krab Thorup
e09e2b2aa2 Add basho_bench script/driver 2012-01-05 23:05:18 +08:00
Kresten Krab Thorup
3118bd8c62 Remove lots of info_msg 2012-01-05 23:05:18 +08:00
Erik Søe Sørensen
c04c11c67f First compile and test, then commit. 2012-01-05 16:02:29 +01:00
Erik Søe Sørensen
fa43e41c51 Simplify slightly in level:initialize(). 2012-01-05 16:01:14 +01:00
Erik Søe Sørensen
c93505fa58 Simplify main_loop() by collapsing to one clause. 2012-01-05 15:59:45 +01:00
Kresten Krab Thorup
27396c21d1 Handle error case + debugging for that 2012-01-05 15:56:37 +01:00
Erik Søe Sørensen
76c6cbd585 Simplify lookup in main_loop2(). 2012-01-05 15:46:13 +01:00
Erik Søe Sørensen
285b7bc95e Rename main_loop{0,1,2}() to main_loop(). 2012-01-05 15:35:16 +01:00
Erik Søe Sørensen
6184272d95 Remove unused level:size(). 2012-01-05 15:34:50 +01:00
Kresten Krab Thorup
86f28c683f Add basho_bench script/driver 2012-01-05 15:28:39 +01:00
Kresten Krab Thorup
7a0fc6addd Remove lots of info_msg 2012-01-05 15:28:23 +01:00
Kresten Krab Thorup
cdadb88ebf Top-level functionality fractal_btree "works"
There is a single unit test for the aggregate
functionality, so basic interactions work.

[Too many log messages right now]
2012-01-05 11:48:14 +01:00
Kresten Krab Thorup
da65b9abb1 zip stored bloom filter 2012-01-04 15:48:57 +01:00
Kresten Krab Thorup
5af86b9e23 Add bloom filter to btree index format 2012-01-04 15:36:52 +01:00
Kresten Krab Thorup
6e13f55044 Initial work-in-progress 2012-01-04 15:05:31 +01:00