* Phase 1: Minimum viable product (in order of priority)
  * lager; check for uses of lager:error/2
  * configurable TOP_LEVEL size
  * support for future file format changes
    * Define a standard struct which is the metadata added at the end of the
      file, e.g. [btree-nodes] [meta-data] [offset of meta-data]. This is written
      in hanoi_writer:flush_nodes, and read in hanoi_reader:open2.
  * test new snappy compression support
  * support for time based expiry, merge should eliminate expired data
  * status and statistics
    * for each level {#merges, {merge-time-min, max, average}}
  * add @doc strings and -spec's
  * check to make sure every error returns with a reason {error, Reason}

  * clean up names, btree_range -> key_range, etc.

* Phase 2: Production Ready
  * dual-nursery
  * cache for read-path
    * {cache, bytes(), name} share max(bytes) cache named 'name' via ets

* Phase 3: Wish List
  * add truncate/1 - quickly truncates a database to 0 items
  * count/1 - return number of items currently in tree
  * adaptive nursery sizing
  * backpressure on fold operations
    - The "sync_fold" creates a snapshot (hard link to btree files), which
      provides consistent behavior but may use a lot of disk space if there is
      a lot of insertion going on.
    - The "async_fold" folds a limited number, and remembers the last key
      serviced, then picks up from there again. So you could see intermittent
      puts in a subsequent batch of results.
  * add block-level encryption support

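The "async_fold" behavior described in the Phase 3 list can be modeled in a few lines. This is only an illustrative Python sketch under stated assumptions: `async_fold_batch`, `async_fold`, the in-memory dict standing in for the tree, and the `between_batches` hook are all hypothetical names, not hanoi's API. It shows why a fold that resumes after the last key serviced may return puts that happened between batches.

```python
import bisect

def async_fold_batch(store, last_key, limit):
    """Return up to `limit` (key, value) pairs with key > last_key, in key order."""
    keys = sorted(store)
    start = 0 if last_key is None else bisect.bisect_right(keys, last_key)
    return [(k, store[k]) for k in keys[start:start + limit]]

def async_fold(store, limit, between_batches=None):
    """Fold over `store` in batches, remembering only the last key serviced."""
    results, last_key = [], None
    while True:
        batch = async_fold_batch(store, last_key, limit)
        if not batch:
            return results
        results.extend(batch)
        last_key = batch[-1][0]
        # Hypothetical hook: lets the example simulate writers running
        # between two batches of the same fold.
        if between_batches is not None:
            between_batches(store)

store = {"a": 1, "b": 2, "d": 4}

def concurrent_put(s):
    # Simulated concurrent writer: inserts "c" after the first batch was served.
    s.setdefault("c", 3)

seen = async_fold(store, limit=2, between_batches=concurrent_put)
```

In this model the key "c" shows up in the second batch even though it did not exist when the fold started. A put that lands before the remembered key would be missed instead, which is the consistency a snapshot-based "sync_fold" avoids at the cost of disk space.
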
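The Phase 1 item on file format changes proposes the layout [btree-nodes] [meta-data] [offset of meta-data]. A minimal Python sketch of that layout follows. The 8-byte big-endian offset, the JSON encoding of the metadata, and the names `write_file` and `read_meta` are assumptions made for illustration; they are not what hanoi_writer:flush_nodes or hanoi_reader:open2 actually do.

```python
import io
import json
import struct

OFFSET_FMT = ">Q"  # assumption: offset stored as 8-byte big-endian integer
OFFSET_SIZE = struct.calcsize(OFFSET_FMT)

def write_file(f, node_blobs, meta):
    """Write [btree-nodes] [meta-data] [offset of meta-data] to f."""
    for blob in node_blobs:
        f.write(blob)
    meta_offset = f.tell()
    # JSON is a stand-in for whatever term encoding the real file would use.
    f.write(json.dumps(meta).encode("utf-8"))
    f.write(struct.pack(OFFSET_FMT, meta_offset))

def read_meta(f):
    """Locate the metadata via the trailing offset and decode it."""
    f.seek(-OFFSET_SIZE, io.SEEK_END)
    trailer_pos = f.tell()
    (meta_offset,) = struct.unpack(OFFSET_FMT, f.read(OFFSET_SIZE))
    f.seek(meta_offset)
    return json.loads(f.read(trailer_pos - meta_offset).decode("utf-8"))

buf = io.BytesIO()
write_file(buf, [b"node-1", b"node-2"],
           {"format_version": 2, "compression": "snappy"})
meta = read_meta(buf)
```

Because the reader starts from the fixed-size field at the end of the file, new metadata fields can be added later without changing how the node section is located.
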
## NOTES:

1: Make the "first level" hold more than 2^5 entries (controlled by the constant TOP_LEVEL in hanoi.hrl). At the current size, a new set of files is opened/closed/merged for every 32 inserts/updates/deletes. Setting this higher will just make the nursery correspondingly larger, which should be absolutely fine.

2: Right now, the streaming btree writer emits a btree page based on the number of elements. This could be changed to be based on the size of the node (say, some block-size boundary), with padding added at the end so that each node read becomes a clean block transfer. Right now, we're probably doing way too many reads.

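The size-based emission suggested in note 2 could look roughly like the Python sketch below. The 4096-byte block size and the names `pad_to_block` and `emit_nodes` are assumptions for illustration, and entries are treated as already-encoded byte strings.

```python
BLOCK_SIZE = 4096  # assumption: target block size in bytes

def pad_to_block(node_bytes, block_size=BLOCK_SIZE):
    """Pad a node with zero bytes up to the next block boundary."""
    remainder = len(node_bytes) % block_size
    padding = 0 if remainder == 0 else block_size - remainder
    return node_bytes + b"\x00" * padding

def emit_nodes(entries, block_size=BLOCK_SIZE):
    """Group encoded entries into nodes of at most block_size bytes, padded."""
    nodes, current = [], b""
    for entry in entries:
        if len(entry) > block_size:
            # Simplification: a real writer would let a large entry span blocks.
            raise ValueError("entry larger than one block")
        if current and len(current) + len(entry) > block_size:
            nodes.append(pad_to_block(current, block_size))
            current = b""
        current += entry
    if current:
        nodes.append(pad_to_block(current, block_size))
    return nodes

nodes = emit_nodes([b"x" * 3000, b"y" * 2000, b"z" * 1000])
```

A real format would also need a length prefix (or similar) inside each node so the reader can tell payload from padding; that is left out here.
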
3: Also, there is no caching of read nodes, so every time a btree node is visited it is read from disk and deserialized with binary_to_term again. Fixing this needs a caching system; https://github.com/cliffmoon/cherly is one option, but it is difficult to build and needs to be rebar-ified.

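To make the idea in note 3 concrete, here is a small Python sketch of a least-recently-used cache for decoded nodes. Everything in it is hypothetical: the `NodeCache` class, the `(file_name, offset)` key, the file names, and the entry-count limit (the Phase 2 item talks about a byte budget instead) are assumptions, not a description of cherly or of hanoi.

```python
from collections import OrderedDict

class NodeCache:
    """LRU cache of decoded btree nodes keyed by (file_name, offset)."""

    def __init__(self, max_entries, load_node):
        self.max_entries = max_entries
        self.load_node = load_node  # called on a miss: read from disk and decode
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, file_name, offset):
        key = (file_name, offset)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        node = self.load_node(file_name, offset)
        self.entries[key] = node
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return node

def fake_load(file_name, offset):
    # Stand-in for the disk read plus deserialization step.
    return {"file": file_name, "offset": offset}

cache = NodeCache(max_entries=2, load_node=fake_load)
cache.get("A-1.data", 0)
cache.get("A-1.data", 0)     # served from the cache, no load
cache.get("A-1.data", 4096)
cache.get("B-2.data", 0)     # evicts ("A-1.data", 0)
```

Since btree files are written once and then only read until a merge replaces them, cached nodes never go stale; entries for a file only need to be dropped when that file is deleted.
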
4: Also, the format for btree nodes could probably be optimized. Right now it's just term_to_binary of a key/value list, as far as I remember. Perhaps we don't have to deserialize the entire thing.

5: It might also be good to employ a scheduler (http://github.com/esl/jobs) for issuing merges, because I think it can be a problem for the OS if too many merges are going on at the same time.