Round 1 of doc updates

Scott Lystig Fritchie 2015-03-03 17:59:04 +09:00
parent 26f08e62ec
commit 7c0e174a3d
3 changed files with 78 additions and 5 deletions


@ -48,6 +48,8 @@ the simulator.
See [[https://tools.ietf.org/html/rfc7282][On Consensus and Humming in the IETF]], RFC 7282.
See also: [[http://www.snookles.com/slf-blog/2015/03/01/on-humming-consensus-an-allegory/][On “Humming Consensus”, an allegory]].
** Tunesmith?
A mix of orchestral conducting, music composition, humming?
@ -365,7 +367,8 @@ document presents a detailed example.)
* Sketch of the self-management algorithm
** Introduction
See also, the diagram (((Diagram1.eps))), a flowchart of the
Refer to the diagram `chain-self-management-sketch.Diagram1.pdf`, a
flowchart of the
algorithm. The code is structured as a state machine in which the
function that executes each flowchart state is named after the approximate
location of that state within the flowchart. The flowchart has three


@ -23,7 +23,7 @@ eunit:
pulse: compile
env USE_PULSE=1 $(REBAR_BIN) skip_deps=true clean compile
env USE_PULSE=1 $(REBAR_BIN) skip_deps=true -D PULSE eunit
env USE_PULSE=1 $(REBAR_BIN) skip_deps=true -D PULSE -v eunit
CONC_ARGS = --pz ./.eunit --treat_as_normal shutdown --after_timeout 1000


@ -1,9 +1,51 @@
# The chain-manager prototype
# The chain manager prototype
This is a very early experiment to try to create a distributed "rough
consensus" algorithm that is sufficient & safe for managing the order
of a Chain Replication chain, its members, and its chain order.
of a Chain Replication chain, its members, and its chain order. A
name hasn't been chosen yet, though the following are contenders:
* chain self-management
* rough consensus
* humming consensus
* foggy consensus
## Code status: active!
Unlike the other code projects in this repository's `prototype`
directory, the chain management code is still under active
development. It is quite likely (as of early March 2015) that this
code will be robust enough to move to the "real" Machi code base soon.
The most up-to-date documentation for this prototype will **not** be
found in this subdirectory. Rather, please see the `doc` directory at
the top of the Machi source repository.
## Testing, testing, testing
It's important to implement any Chain Replication chain manager as
close to 100% bug-free as possible. Any bug can introduce the
possibility of data loss, which is something we must avoid.
Therefore, we will spend a large amount of effort to use as many
robust testing tools and methods as feasible to test this code.
* [Concuerror](http://concuerror.com), a DPOR-based full state space
exploration tool. Some preliminary Concuerror tests can be found in the
`test/machi_flu0_test.erl` module.
* [QuickCheck](http://www.quviq.com/products/erlang-quickcheck/), a
property-based testing tool for Erlang. QuickCheck doesn't provide
the reassurance of 100% state exploration, but it has proven quite
effective at Basho for finding numerous subtle bugs.
* Automatic simulation of arbitrary network partition failures. This
code is already in progress and is used, for example, by the
`test/machi_chain_manager1_test.erl` module.
* TLA+ (future work), to try to create a rigorous model of the
algorithm and its behavior.
If you'd like to work on additional testing of this component, please
[open a new GitHub Issue ticket](https://github.com/basho/machi) with
any questions you have. Or just open a GitHub pull request. <tt>^_^</tt>
## Compilation & unit testing
Use `make` and `make test`. Note that the Makefile assumes that the
@ -11,5 +53,33 @@ Use `make` and `make test`. Note that the Makefile assumes that the
Tested using Erlang/OTP R16B and Erlang/OTP 17, both on OS X.
It ought to "just work" on other versions of Erlang and on other OS
If you wish to run the PULSE test in
the `test/machi_chain_manager1_pulse.erl` module, you must use Erlang
R16B and Quviq QuickCheck 1.30.2 -- there is a known problem with
QuickCheck 1.33.2, sorry!
Otherwise, it ought to "just work" on other versions of Erlang and on other OS
platforms, but sorry, I haven't tested it.
### Testing with simulated network partitions
One of the unit tests spits out **a tremendous amount** of verbose
logging information to the console. This test, the
`machi_chain_manager1_test:convergence_demo_test()`, isn't the typical
small unit test. Rather, it (ab)uses the EUnit framework to
automatically run this quite large test together with all of the other
tiny unit tests.
See the `doc/chain-self-management-sketch.org` file for details of how
the simulator works.
In summary, the simulator tries to emulate the effect of arbitrary
asymmetric network partitions. For example, for two simulated nodes A
and B, it's possible to have node A send messages to B, but B cannot
send messages to A.
This kind of one-way message passing is nearly impossible to do with
distributed Erlang, because disterl uses TCP. If a network partition
happens at OSI Layer 2 (for example, due to a bad Ethernet cable that
has a faulty receive wire), the entire TCP connection will hang rather
than deliver disterl messages in only one direction.
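
The one-way partitions described above can be modeled as a set of blocked, directed (sender, receiver) pairs. The following is a minimal sketch in Python, not the prototype's actual Erlang simulator; the class `PartitionSimulator` and its methods are hypothetical names chosen for illustration only.

```python
class PartitionSimulator:
    """Hypothetical model of asymmetric network partitions.

    A partition is a directed (sender, receiver) pair, so blocking
    B -> A leaves A -> B untouched.
    """

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.blocked = set()  # directed (sender, receiver) pairs

    def block(self, sender, receiver):
        """Drop all future messages from sender to receiver."""
        self.blocked.add((sender, receiver))

    def heal(self):
        """Remove every partition."""
        self.blocked.clear()

    def send(self, sender, receiver, msg):
        """Return msg if it would be delivered, or None if it is dropped."""
        if (sender, receiver) in self.blocked:
            return None
        return msg


sim = PartitionSimulator(["a", "b"])
sim.block("b", "a")          # b can no longer reach a
sim.send("a", "b", "ping")   # delivered
sim.send("b", "a", "pong")   # dropped
```

Because each direction is tracked separately, a test can block B to A while leaving A to B open, which is the case that is hard to reproduce over a real TCP connection.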