machi/prototype/tango/README.md

102 lines
3.8 KiB
Markdown
Raw Normal View History

# Tango prototype
This is a quick hack, just to see how quick & easy it might be to
build Tango on top of corfurl. It turned out to be pretty quick and
easy.
This prototype does not include any datatype-specific APIs, such as an
HTTP REST interface for manipulating a queue. The current API is
native Erlang only. However, because the Tango client communicates to
the underlying CORFU log via the `corfurl` interface, this
implementation is powerful enough to run concurrently on multiple
Erlang nodes.
This implementation does not follow the same structure as described in
the Tango paper. I made some changes, based on some guesses/partial
understanding of the paper. If I were to start over again, I'd try to
use the exact same naming scheme & structure suggested by the paper.
## Testing environment
Tested using Erlang/OTP R16B and Erlang/OTP 17, both on OS X.
It ought to "just work" on other versions of Erlang and on other OS
platforms, but sorry, I haven't tested it.
Use `make` and `make test` to compile and run unit tests.
## Data types implemented
* OID mapper
* Simple single-value register
* Map (i.e., multi-value register or basic key-value store)
* Queue
* Used the Erlang/OTP `queue.erl` library for rough inspiration
* Operations: is_empty, length, peek, to_list, member, in, out,
reverse, filter.
* Queue mutation operations are not idempotent with respect to
multiple writes in the underlying CORFU log, e.g., due to CORFU
log reconfiguration or partial write error/timeouts.
## Experimental idea: built-in OID checkpointing
I was toying with the idea of adding a Tango "history splicing"
operation that could make the implementation per-OID checkpoint &
garbage collection (and CORFU-level trimming) operations much easier.
I think that this might be a very good idea and that it deserves more
research & work.
The implementation of the checkpointing & splicing as it is today is
flawed. See the TODO list below for more details.
## Information about the Tango paper
"Tango: Distributed Data Structures over a Shared Log"
Balakrishnan, Malkhi, Wobber, Wu, Brabhakaran, Wei, Davis, Rao, Zou, Zuck
Describes a framework for developing data structures that reside
persistently within a CORFU log: the log *is* the database/data
structure store.
http://www.snookles.com/scottmp/corfu/Tango.pdf
See also, `../corfu/docs/corfurl.md` for more information on CORFU
research papers.
## TODO list
2015-03-03 07:22:05 +00:00
__ The src/corfu* files in this sub-repo differ from the original
prototype source files in the ../corfu sub-repo, sorry!
__ The current checkpoint implementation is fundamentally broken and
needs a rewrite, or else.
This issue is not mentioned at all in the Tango paper.
option 1: fix checkpoint to be 100% correct
option 2: checkpointing is for the weak and the memory-constrained, so
don't bother. Instead, rip out the current checkpoint code,
period.
option 3: other
xx Checkpoint fix option #1: history splicing within the same OID?
xx Checkpoint fix option #2: checkpoint to a new OID, history writes to both
OIDs during the CP, then a marker in the old OID
to switch over to the new OID?
History splicing has a flaw that I belive just won't work. The switch to a
new OID has problems with updates written to the old OID before and before the
new checkpoint has finished.
I believe that a checkpoint where:
* all Tango writes, checkpoint and non-checkpoint alike, are noted with
a checkpoint number.
* that checkpoint number is strictly increasing
* a new checkpoint has a new checkpoint number
* scans ignore blocks with checkpoint numbers larger than the current
active checkpoint #, until the checkpoint is complete.
... ought to work correctly.