diff --git a/TODO b/TODO
index 69fe721..df9b0a3 100644
--- a/TODO
+++ b/TODO
@@ -1,21 +1,24 @@
-* hanoi
-  * [cleanup] add @doc strings and and -spec's
-  * [cleanup] check to make sure every error returns with a reason {error, Reason}
-  * [feature] statistics
-  * [feature] use lager for error messages
-  * [enhancement] add crc or something to the files
-  * [feature] add config parameters on open
+* lsm_btree
+  * [2i] secondary index support
+    * atomic multi-commit/recovery
+  * add checkpoint/1 and sync/1 - flush pending writes to stable storage
+    (nursery:finish() and finish/flush any merges)
+  * [config] add config parameters on open
     * {sync, boolean()} fdsync or not on write
     * {cache, bytes(), name} share max(bytes) cache named 'name' via ets
-  * [enhancement] use etc/emmap to access/cache files
-  * [enhancement] adaptive nursery sizing
-  * [feature] support for time based expiry, merge should eliminate expired data
-  * [feature] add truncate/1 - quickly truncates a database to 0 items
-  * [feature] add sync/1 - flush pending writes to disk (aka checkpoint)
-    (nursery:finish() and finish/flush any merges)
-  * [feature] count/1 - return number of items currently in tree
-  * [feature] "group" commit - ability to make many k/v add/update/deletes atomic (for 2i)
-  * [enhancement] backpressure on fold operations
+  * [stats] statistics
+    * For each level {#merges, {merge-time-min, max, average}}
+  * [expiry] support for time-based expiry; merge should eliminate expired data
+  * add @doc strings and -spec's
+  * check to make sure every error returns with a reason {error, Reason}
+  * lager; check for uses of lager:error/2
+  * add version 1, crc to the files
+  * add compression via snappy (https://github.com/fdmanana/snappy-erlang-nif)
+  * add encryption
+  * adaptive nursery sizing
+  * add truncate/1 - quickly truncates a database to 0 items
+  * count/1 - return number of items currently in tree
+  * backpressure on fold operations
     - The "sync_fold" creates a snapshot (hard link to btree files), which
       provides consistent behavior but may use a lot of disk space if
       there is a lot of insertion going on.
@@ -23,31 +26,14 @@
       serviced, then picks up from there again.  So you could see
       intermittent puts in a subsequent batch of results.

-* riak_kv_hanoie_backend
-  * add support for time-based expiry
-  * finish support for 2i
-  * add stats collection
-    - For each level {#merges, {merge-time-min, max, average}}
-
 PHASE 2:

-* hanoi
-
+* lsm_btree
   * Define a standard struct which is the metadata added at the end of the
     file, e.g. [btree-nodes] [meta-data] [offset of meta-data]. This is written
-    in hanoi_writer:flush_nodes, and read in hanoi_reader:open2.
+    in lsm_btree_writer:flush_nodes, and read in lsm_btree_reader:open2.

   * [feature] compression, encryption on disk

-PHASE 3:
-* lsm_ixdb
-  * hanoi{btree, trie, ...} support for sub-databases and associations with
-    different index types
-  * [major change] add more CAPABILITIES such as
-    - test-and-set(Fun, Key, Value) - to compare a vclock quickly, to speed up
-      the get/put patch for every update
-  * [enhancement] change encoding/layout of data on disk using sub-databases
-    and secondary indexes
-    - bucket/key{meta[], data} -> ??

 REVIEW LITERATURE AND OTHER SIMILAR IMPLEMENTATIONS:
@@ -56,7 +42,7 @@ REVIEW LITERATURE AND OTHER SIMILAR IMPLEMENTATIONS:

 * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.2782&rep=rep1&type=pdf

-1: make the "first level" have more thatn 2^5 entries (controlled by the constant TOP_LEVEL in hanoi.hrl); this means a new set of files is opened/closed/merged for every 32 insert/updates/deletes. Setting this higher will just make the nursery correspondingly larger, which should be absolutely fine.
+1: make the "first level" have more than 2^5 entries (controlled by the constant TOP_LEVEL in lsm_btree.hrl); this means a new set of files is opened/closed/merged for every 32 inserts/updates/deletes. Setting this higher will just make the nursery correspondingly larger, which should be absolutely fine.

 2: Right now, the streaming btree writer emits a btree page based on the number of elements. This could be changed to be based on the size of the node (say, some block-size boundary), with padding added at the end so that each node read becomes a clean block transfer. Right now, we're probably issuing way too many reads.
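
A minimal Erlang sketch of the proposed {sync, boolean()} open option from the [config] item above; write_kv/3 and the proplist handling are illustrative assumptions, not existing lsm_btree code. With {sync, true} every write is followed by an fdatasync; with {sync, false} flushing is left to the OS or to an explicit sync/1 call:

    %% Sketch only: gate fdatasync behind the {sync, boolean()} option.
    %% Fd is assumed to come from file:open(Path, [raw, binary, append]).
    write_kv(Fd, Bin, Opts) ->
        ok = file:write(Fd, Bin),
        case proplists:get_value(sync, Opts, false) of
            true  -> file:datasync(Fd);  % fdatasync(2) where available
            false -> ok
        end.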
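
For the PHASE 2 trailer item, one possible byte layout of [btree-nodes] [meta-data] [offset of meta-data], sketched in Erlang. The function names and the fixed 8-byte big-endian footer are assumptions; per the TODO, the real code would sit in lsm_btree_writer:flush_nodes and lsm_btree_reader:open2:

    %% Append the meta-data after the btree nodes, then an 8-byte footer
    %% holding the meta-data offset, so a reader finds it at Eof - 8.
    write_trailer(Fd, MetaBin) ->
        {ok, MetaOffset} = file:position(Fd, cur),
        ok = file:write(Fd, MetaBin),
        ok = file:write(Fd, <<MetaOffset:64/unsigned>>).

    %% Read the footer, then the meta-data block it points at.
    read_trailer(Fd) ->
        {ok, Eof} = file:position(Fd, eof),
        {ok, <<MetaOffset:64/unsigned>>} = file:pread(Fd, Eof - 8, 8),
        file:pread(Fd, MetaOffset, Eof - 8 - MetaOffset).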
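
On footnote 1: assuming TOP_LEVEL is the exponent of the first level's size (the "2^5 entries" wording suggests it is), growing the nursery is a one-line change in lsm_btree.hrl. The values below are examples, not the shipped defaults:

    %% 2^TOP_LEVEL entries per nursery flush: 5 gives the current 32,
    %% 8 would give 256, trading memory for fewer open/close/merge cycles.
    -define(TOP_LEVEL, 8).
    -define(TOP_SIZE, (1 bsl ?TOP_LEVEL)).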
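
On footnote 2: a sketch of the block-boundary padding for the streaming btree writer, assuming nodes are already encoded as binaries; ?BLOCK_SIZE and pad_to_block/1 are hypothetical names:

    -define(BLOCK_SIZE, 4096).

    %% Pad an encoded btree node out to the next ?BLOCK_SIZE boundary so
    %% a node read maps onto whole, aligned disk blocks.
    pad_to_block(NodeBin) ->
        case byte_size(NodeBin) rem ?BLOCK_SIZE of
            0   -> NodeBin;
            Rem -> <<NodeBin/binary, 0:((?BLOCK_SIZE - Rem) * 8)>>
        end.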