Cleanup

parent 86516d4b2d
commit e4d8615a99

3 changed files with 54 additions and 50 deletions
DESIGN.md
@@ -1,6 +1,6 @@
-# Hanoi's Design
+# Hanoi's Design: How this LSM-BTree Works
+### Basics

 If there are N records, there are log<sub>2</sub>(N) levels (each being a plain B-tree in a file named "A-*level*.data"). The file `A-0.data` has 1 record, `A-1.data` has 2 records, `A-2.data` has 4 records, and so on: `A-n.data` has 2<sup>n</sup> records.

 In "stable state", each level file is either full (there) or empty (not there); so if, for example, 20 records are stored, then there is only data in the files `A-2.data` (4 records) and `A-4.data` (16 records).
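A short way to see the "stable state" rule above: the set of full level files is exactly the binary representation of the record count N. This hypothetical Python sketch (not part of the hanoi source) illustrates the arithmetic:

```python
def full_levels(n):
    """Levels whose A-<level>.data file is full when n records are stored.

    Level i holds exactly 2**i records, and each level is either full or
    empty, so the full levels are precisely the set bits of n.
    """
    return [i for i in range(n.bit_length()) if (n >> i) & 1]

# The example from the text: 20 records live in A-2.data and A-4.data.
print(full_levels(20))                        # [2, 4]
print(sum(1 << i for i in full_levels(20)))   # 20
```

The highest set bit of N is at position ⌊log<sub>2</sub>(N)⌋, which is why the number of levels grows logarithmically with the record count.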
@@ -28,7 +28,6 @@ Deletes are the same: they are also done by inserting a tombstone (a special val

 ## Merge Logic

 The really clever thing about this storage mechanism is that merging is guaranteed to be able to "keep up" with insertion. Bitcask, for instance, has a similar merging phase, but it is separated from insertion. This means that there can suddenly be a lot of catching up to do. The flip side is that you can then decide to do all merging at off-peak hours, but it is yet another thing that needs to be configured.

 With LSM B-Trees, back-pressure is provided by the injection mechanism, which only returns when an injection is complete. Thus, every 2nd insert needs to wait for level #0 to finish the required merging; which - assuming merging has linear I/O complexity - is enough to guarantee that the merge mechanism can keep up at higher-numbered levels.
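To make the amortization argument concrete, here is a small, hypothetical Python simulation (not hanoi code) of the merge cascade: each injection at level 0 may trigger a chain of merges upward, but afterwards every level is again either full or empty, exactly like a binary counter carrying bits.

```python
def inject(levels):
    """Inject one record's worth of data at level 0.

    `levels` maps level -> record count; a full level-i file holds 2**i
    records. If a level is already full, its file merges with the incoming
    one into a single file one level up, and the cascade continues.
    """
    carry, i = 1, 0
    while True:
        if levels.get(i, 0) == 0:       # empty slot: the incoming file lands here
            levels[i] = carry
            return
        carry += levels.pop(i)          # full: merge, push the combined file up
        i += 1

levels = {}
for _ in range(20):
    inject(levels)

# Stable state after 20 inserts: levels 2 and 4 are full, all others empty.
print(sorted(levels.items()))  # [(2, 4), (4, 16)]
```

Because the cascade is a binary-counter carry, a long chain of merges is rare (a merge reaching level i happens only once per 2<sup>i</sup> inserts), which is the intuition behind merging keeping up with insertion.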
@@ -71,6 +70,4 @@ When X is closed and clean, it is actually intermittently renamed M so that if t

 ABC files have 2^level KVs in them, regardless of the size of those KVs. XM files have approximately 2^(level+1), since tombstone merges or repeated PUTs may of course reduce that number.

 ### File Descriptors

 Hanoi needs a lot of file descriptors, currently 6*⌈log<sub>2</sub>(N)-TOP_LEVEL⌉ of them, with a nursery of size 2<sup>TOP_LEVEL</sup> and N key/value pairs in the store. Thus, storing 1,000,000 KVs needs 72 file descriptors, storing 1,000,000,000 records needs 132, and 1,000,000,000,000 records needs 192.
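The descriptor counts quoted above can be checked directly from the formula. A hypothetical Python sketch follows; the formula is from the text, but the concrete TOP_LEVEL value below is an assumption chosen because it reproduces the quoted numbers, not something stated here:

```python
import math

TOP_LEVEL = 8  # assumed value; it reproduces the figures quoted in the text

def file_descriptors(n, top_level=TOP_LEVEL):
    """6 * ceil(log2(N) - TOP_LEVEL) descriptors for N key/value pairs."""
    return 6 * math.ceil(math.log2(n) - top_level)

for n in (10**6, 10**9, 10**12):
    print(n, file_descriptors(n))
# 1000000 72
# 1000000000 132
# 1000000000000 192
```

Each factor-of-1000 increase in N adds about 10 levels (2<sup>10</sup> = 1024), hence the steady 60-descriptor increments between the three quoted figures.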
README.md: 47 changed lines
@@ -1,29 +1,42 @@
 # Hanoi Key/Value Storage Engine

-This Erlang-based storage engine implements a structure somewhat like LSM-trees (Log-Structured Merge Trees, see docs/10.1.1.44.2782.pdf). The notes below describe how this storage engine work; I have not done extensive studies as how it differs from other storage mechanisms, but a brief brows through available online resources on LSM-trees indicates that this storage engine is quite different in several respects.
-
-The storage engine can function as an alternative backend for Basho's Riak/KV.
+This storage engine implements a structure somewhat like LSM-trees
+(Log-Structured Merge Trees, see docs/10.1.1.44.2782.pdf). The notes in
+DESIGN.md describe how this storage engine works; I have not done extensive
+studies as to how it differs from other storage mechanisms, but a brief review
+of available research on LSM-trees indicates that this storage engine is quite
+different in several respects.
 Here's the bullet list:

 - Insert, Delete and Read all have worst case log<sub>2</sub>(N) complexity.
-- The cost of evicting stale key/values is amortized into insertion, so you don't need to schedule merge to happen at off-peak hours.
-- Operations-friendly "append-only" storage (allows you to backup live system, and crash-recovery is very fast)
-- Supports range queries (and thus eventually Riak 2i.)
-- Doesn't need much RAM, but does need a lot of file descriptors
-- All around 3000 lines of pure Erlang code
+- The cost of evicting stale key/values is amortized into insertion
+  - you don't need a separate eviction thread to keep memory use low
+  - you don't need to schedule merges to happen at off-peak hours
+- Operations-friendly "append-only" storage
+  - allows you to backup a live system
+  - crash-recovery is very fast and the logic is straightforward
+- Supports efficient range queries
+  - Uses bloom filters to avoid unnecessary lookups on disk
+- Efficient resource utilization
+  - Doesn't store all keys in memory
+  - Uses a modest number of file descriptors, proportional to the number of levels
+  - IO is generally balanced between random and sequential
+  - Low CPU overhead
+- ~2000 lines of pure Erlang code in src/*.erl
-### Deploying the hanoi for testing with Riak/KV
+### How to deploy Hanoi as a Riak/KV backend

-You can deploy `hanoi` into a Riak devrel cluster using the
-`enable-hanoi` script. Clone the `riak` repo, change your working directory
-to it, and then execute the `enable-hanoi` script. It adds `hanoi` as a
-dependency, runs `make all devrel`, and then modifies the configuration
-settings of the resulting dev nodes to use the hanoi storage backend.
+This storage engine can function as an alternative backend for Basho's Riak/KV.
+
+You can deploy `hanoi` into a Riak devrel cluster using the `enable-hanoi`
+script. Clone the `riak` repo, change your working directory to it, and then
+execute the `enable-hanoi` script. It adds `hanoi` as a dependency, runs `make
+all devrel`, and then modifies the configuration settings of the resulting dev
+nodes to use the hanoi storage backend.
 1. `git clone git://github.com/basho/riak.git`
 1. `cd riak/deps`
 1. `git clone git://github.com/basho/hanoi.git`
 1. `cd ..`
-1. `./deps/hanoi/enable-hanoi` # which does `make all devrel`
+1. `./deps/hanoi/enable-hanoi`
TODO: 50 changed lines
@@ -1,22 +1,28 @@
-* hanoi (in order of priority)
-  * [2i] secondary index support
+* Phase 1: Minimum viable product (in order of priority)
+  * lager; check for uses of lager:error/2
+  * configurable TOP_LEVEL size
+  * support for future file format changes
+  * Define a standard struct which is the metadata added at the end of the
+    file, e.g. [btree-nodes] [meta-data] [offset of meta-data]. This is written
+    in hanoi_writer:flush_nodes, and read in hanoi_reader:open2.
+  * test new snappy compression support
+  * Riak/KV secondary index (2i) support
   * atomic multi-commit/recovery
-  * add checkpoint/1 and sync/1 - flush pending writes to stable storage
-    (nursery:finish() and finish/flush any merges)
-  * [config] add config parameters on open
-    * {cache, bytes(), name} share max(bytes) cache named 'name' via etc
-  * [stats] statistics
-    * For each level {#merges, {merge-time-min, max, average}}
-  * [expiry] support for time based expiry, merge should eliminate expired data
+  * support for time based expiry, merge should eliminate expired data
+  * statistics
+    * for each level {#merges, {merge-time-min, max, average}}
   * add @doc strings and -spec's
   * check to make sure every error returns with a reason {error, Reason}
-  * lager; check for uses of lager:error/2
-  * add version 1, crc to the files
-  * add compression via snappy (https://github.com/fdmanana/snappy-erlang-nif)
-  * add encryption
-  * adaptive nursery sizing
+* Phase 2: Production Ready
+  * dual-nursery
+  * cache for read-path
+  * {cache, bytes(), name} share max(bytes) cache named 'name' via etc
+* Phase 3: Wish List
   * add truncate/1 - quickly truncates a database to 0 items
   * count/1 - return number of items currently in tree
+  * adaptive nursery sizing
   * backpressure on fold operations
     - The "sync_fold" creates a snapshot (hard link to btree files), which
       provides consistent behavior but may use a lot of disk space if there is
@@ -24,22 +30,10 @@
     - The "async_fold" folds a limited number, and remembers the last key
       serviced, then picks up from there again. So you could see intermittent
       puts in a subsequent batch of results.
+  * add block-level encryption support

-PHASE 2:
-  * hanoi
-    * Define a standard struct which is the metadata added at the end of the
-      file, e.g. [btree-nodes] [meta-data] [offset of meta-data]. This is written
-      in hanoi_writer:flush_nodes, and read in hanoi_reader:open2.
-    * [feature] compression, encryption on disk
-
-REVIEW LITERATURE AND OTHER SIMILAR IMPLEMENTATIONS:
-  * nessdb https://code.google.com/p/nessdb/source/browse/LSM-BTREE?r=3a1df166a19505a2369dd954e8fc6d0a545f3d7b
-  * http://tokutek.com/downloads/mysqluc-2010-fractal-trees.pdf page 14+
-  * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.2782&rep=rep1&type=pdf
+## NOTES:

 1: make the "first level" have more than 2^5 entries (controlled by the constant TOP_LEVEL in hanoi.hrl); this means a new set of files is opened/closed/merged for every 32 inserts/updates/deletes. Setting this higher will just make the nursery correspondingly larger, which should be absolutely fine.
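The TOP_LEVEL note above can be illustrated with a tiny, hypothetical Python sketch (not hanoi code): writes are batched in a nursery, and a whole open/close/merge cycle happens only once per 2^TOP_LEVEL updates.

```python
TOP_LEVEL = 5  # as in the note above: the nursery holds 2**5 = 32 entries

class Nursery:
    """Hypothetical sketch: batch writes until the nursery is full,
    then inject the whole batch into the level files at once."""

    def __init__(self):
        self.entries = {}
        self.injections = 0

    def put(self, key, value):
        self.entries[key] = value
        if len(self.entries) == 2 ** TOP_LEVEL:
            # one open/close/merge cycle per 32 distinct updates
            self.injections += 1
            self.entries.clear()

n = Nursery()
for i in range(100):
    n.put(i, i)
print(n.injections)  # 3 (injections after 32, 64 and 96 inserts)
```

Raising TOP_LEVEL simply makes each batch (and the nursery's memory footprint) larger while making injections proportionally rarer, which is why the note says a higher setting "should be absolutely fine".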