hanoidb/TODO

* Phase 1: Minimum viable product (in order of priority)
  * lager; check for uses of lager:error/2
  * configurable TOP_LEVEL size
  * support for future file format changes
    * Define a standard struct which is the metadata added at the end of the
      file, e.g. [btree-nodes] [meta-data] [offset of meta-data]. This is written
      in hanoi_writer:flush_nodes, and read in hanoi_reader:open2.
  * test new snappy compression support
  * Riak/KV secondary index (2i) support
    * atomic multi-commit/recovery
  * support for time based expiry, merge should eliminate expired data
  * statistics
    * for each level {#merges, {merge-time-min, max, average}}
  * add @doc strings and -spec's
  * check to make sure every error returns with a reason {error, Reason}

* Phase 2: Production Ready
  * dual-nursery
  * cache for read-path
    * {cache, bytes(), name} share max(bytes) cache named 'name' via ets

* Phase 3: Wish List
  * add truncate/1 - quickly truncates a database to 0 items
  * count/1 - return number of items currently in tree
  * adaptive nursery sizing
  * backpressure on fold operations
    - The "sync_fold" creates a snapshot (hard link to btree files), which
      provides consistent behavior but may use a lot of disk space if there is
      a lot of insertion going on.
    - The "async_fold" folds a limited number, and remembers the last key
      serviced, then picks up from there again. So you could see intermittent
      puts in a subsequent batch of results.
  * add block-level encryption support
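
A minimal sketch of the trailer layout named under "support for future file format changes" ([btree-nodes] [meta-data] [offset of meta-data]). It is written in Python for brevity and is not the hanoi_writer/hanoi_reader code. The JSON encoding, the 8-byte big-endian offset, and the names `write_file`, `read_meta`, and `format_version` are assumptions made for illustration.

```python
import io
import json
import struct

# Assumed layout: [btree-nodes][meta-data][offset of meta-data]
# The last 8 bytes of the file hold the big-endian offset of the meta-data.
OFFSET_FMT = ">Q"
OFFSET_SIZE = struct.calcsize(OFFSET_FMT)


def write_file(out, node_bytes, meta):
    """Write the nodes, then the meta-data, then the meta-data offset."""
    out.write(node_bytes)
    meta_offset = out.tell()
    out.write(json.dumps(meta).encode("utf-8"))
    out.write(struct.pack(OFFSET_FMT, meta_offset))


def read_meta(f):
    """Read the trailing offset first, then the meta-data it points to."""
    f.seek(-OFFSET_SIZE, io.SEEK_END)
    trailer_pos = f.tell()
    (meta_offset,) = struct.unpack(OFFSET_FMT, f.read(OFFSET_SIZE))
    f.seek(meta_offset)
    raw = f.read(trailer_pos - meta_offset)
    return json.loads(raw.decode("utf-8"))


# Usage: 100 bytes stand in for the serialized btree nodes.
buf = io.BytesIO()
write_file(buf, b"\x00" * 100, {"format_version": 1, "compression": "none"})
meta = read_meta(buf)
```

Because a reader finds the meta-data through the trailing offset, the meta-data block can grow new fields without moving the node data, which is the point of the TODO item.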


## NOTES:

1: Make the "first level" hold more than 2^5 entries (controlled by the constant TOP_LEVEL in hanoi.hrl); at the current setting, a new set of files is opened/closed/merged for every 32 inserts/updates/deletes. Setting this higher will just make the nursery correspondingly larger, which should be absolutely fine.

2: Right now, the streaming btree writer emits a btree page based on the number of elements. This could be changed to be based on the size of the node (say, some block-size boundary), with padding added at the end so that each node read becomes a clean block transfer. Right now, we're probably doing way too many reads.

3: Also, there is no caching of read nodes, so every time a btree node is visited it is read from disk and binary_to_term'ed again. We need a caching system for that to work well; https://github.com/cliffmoon/cherly is a candidate, but it is difficult to build and needs to be rebar-ified.

4: Also, the format for btree nodes could probably be optimized. Right now it's just binary_to_term of a key/value list, as far as I remember. Perhaps we don't have to deserialize the entire thing.

5: It might also be good to employ a scheduler (http://github.com/esl/jobs) for issuing merges, because I think it can be a problem for the OS if there are too many merges going on at the same time.