# Table of contents
- Hands-on experiments with Machi and Humming Consensus
- Using the network partition simulator and convergence demo test code
## Prerequisites
Please refer to the Machi development environment prerequisites doc for the software needed to set up a Machi developer environment.
If you do not have an Erlang/OTP runtime system available, but you do have the Vagrant virtual machine manager available, then please refer to the instructions in the prerequisites doc for using Vagrant.
## Clone and compile the code

Please see the Machi 'clone and compile' doc for the short list of steps required to fetch the Machi source code from GitHub and to compile & test Machi.
## Running three Machi instances on a single machine
All of the commands that should be run at your login shell (e.g. Bash, c-shell) can be cut-and-pasted from this document directly to your login shell prompt.
Run the following command:
```
make stagedevrel
```
This will create a directory structure like this:

```
      |-dev1-|... stand-alone Machi app + subdirectories
|-dev-|-dev2-|... stand-alone Machi app + subdirectories
      |-dev3-|... stand-alone Machi app + subdirectories
```
Each of the `dev/dev1`, `dev/dev2`, and `dev/dev3` directories is a stand-alone application instance of Machi and can be run independently of the others on the same machine. This demo will use all three.
The lifecycle management utilities for Machi are currently a bit immature. They assume that each Machi server runs on a host with a unique hostname; there is no flexibility built in yet to easily run multiple Machi instances on the same machine. To continue with the demo, we need to use `sudo` or `su` to obtain superuser privileges to edit the `/etc/hosts` file.
Please add the following line to `/etc/hosts`, using this command:

```
sudo sh -c 'echo "127.0.0.1 machi1 machi2 machi3" >> /etc/hosts'
```
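To sanity-check the new entry before continuing, `grep` can confirm that a single line maps all three pseudo-hostnames to the loopback address. This is a minimal sketch that runs against a sample copy of the file, so it needs no root privileges (the path `/tmp/hosts.demo` is illustrative):

```shell
# Build a sample hosts file standing in for /etc/hosts, then check
# that the demo entry is present on a single line.
HOSTS=/tmp/hosts.demo
printf '127.0.0.1 localhost\n127.0.0.1 machi1 machi2 machi3\n' > "$HOSTS"

if grep -qE '^127\.0\.0\.1[[:space:]]+machi1 machi2 machi3' "$HOSTS"; then
  echo "hosts entry OK"
else
  echo "hosts entry MISSING"
fi
```

To check the real file, point `HOSTS` at `/etc/hosts` instead.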
Next, we will use a shell script to finish setting up our cluster. It will do the following for us:

- Verify that the new line that was added to `/etc/hosts` is correct.
- Modify the `etc/app.config` files so that the Humming Consensus chain manager's actions are logged to the `log/console.log` file.
- Start the three application instances.
- Verify that the three instances are running correctly.
- Configure a single chain, with one FLU server per application instance.

Please run this script using this command:

```
./priv/humming-consensus-demo.setup.sh
```
If the output looks like this (and exits with status zero), then the script was successful.

```
Step: Verify that the required entries in /etc/hosts are present
Step: add a verbose logging option to app.config
Step: start three Machi application instances
pong
pong
pong
Step: configure one chain to start a Humming Consensus group with three members
Result: ok
Result: ok
Result: ok
```
We have now created a single replica chain, called `c1`, that has three file servers participating in the chain. Thanks to the hostnames that we added to `/etc/hosts`, all are using the `localhost` network interface.
| App instance directory | Pseudo hostname | FLU name | TCP port number |
|------------------------|-----------------|----------|-----------------|
| dev1                   | machi1          | flu1     | 20401           |
| dev2                   | machi2          | flu2     | 20402           |
| dev3                   | machi3          | flu3     | 20403           |
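The mapping follows a simple numbering convention: instance `N` uses pseudo-hostname `machiN`, FLU name `fluN`, and TCP port `20400 + N`. A small shell sketch generates the same mapping (the base port 20400 is inferred from the table above):

```shell
# Generate the instance -> hostname -> FLU -> port mapping for N = 1..3.
for N in 1 2 3; do
  printf 'dev%d  machi%d  flu%d  %d\n' "$N" "$N" "$N" "$((20400 + N))"
done
```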
The log files for each application instance can be found in the `./dev/devN/log/console.log` file, where `N` is the instance number: 1, 2, or 3.
## Understanding the chain manager's log file output
After running the `./priv/humming-consensus-demo.setup.sh` script, let's look at the last few lines of the `./dev/dev1/log/console.log` log file for Erlang VM process #1.
```
2016-03-09 10:16:35.676 [info] <0.105.0>@machi_lifecycle_mgr:process_pending_flu:422 Started FLU f1 with supervisor pid <0.128.0>
2016-03-09 10:16:35.676 [info] <0.105.0>@machi_lifecycle_mgr:move_to_flu_config:540 Creating FLU config file f1
2016-03-09 10:16:35.790 [info] <0.105.0>@machi_lifecycle_mgr:bootstrap_chain2:312 Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]
2016-03-09 10:16:35.790 [info] <0.105.0>@machi_lifecycle_mgr:move_to_chain_config:546 Creating chain config file c1
2016-03-09 10:16:44.139 [info] <0.132.0> CONFIRM epoch 1141 <<155,42,7,221>> upi [] rep [] auth f1 by f1
2016-03-09 10:16:44.271 [info] <0.132.0> CONFIRM epoch 1148 <<57,213,154,16>> upi [f1] rep [] auth f1 by f1
2016-03-09 10:16:44.864 [info] <0.132.0> CONFIRM epoch 1151 <<239,29,39,70>> upi [f1] rep [f3] auth f1 by f1
2016-03-09 10:16:45.235 [info] <0.132.0> CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1
2016-03-09 10:16:47.343 [info] <0.132.0> CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1
```
Let's pick apart some of these lines. We have started all three servers at about the same time. We see some race conditions happen, and some jostling and readjustment happens pretty quickly in the first few seconds.
`Started FLU f1 with supervisor pid <0.128.0>`

- This VM, #1, started a FLU (Machi data server) with the name `f1`.
- In the Erlang process supervisor hierarchy, the process ID of the top supervisor is `<0.128.0>`.
`Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]`

- A bootstrap configuration for a chain named `c1` has been created.
- The FLUs/data servers that are eligible for participation in the chain have names `f1`, `f2`, and `f3`.
- The chain will operate in eventual consistency mode (`ap_mode`).
- The witness server list is empty. Witness servers are never used in eventual consistency mode.
`CONFIRM epoch 1141 <<155,42,7,221>> upi [] rep [] auth f1 by f1`

- All participants in epoch 1141 are unanimous in adopting epoch 1141's projection. All active membership lists are empty, so there is no functional chain replication yet, at least as far as server `f1` knows.
- The epoch's abbreviated checksum is `<<155,42,7,221>>`.
- The UPI list, i.e., the replicas whose data is 100% in sync, is `[]`, the empty list. (UPI = Update Propagation Invariant)
- The list of servers that are under data repair (`rep`) is also empty, `[]`.
- This projection was authored by server `f1`.
- The log message was generated by server `f1`.
`CONFIRM epoch 1148 <<57,213,154,16>> upi [f1] rep [] auth f1 by f1`

- Now the server `f1` has created a chain of length 1, `[f1]`.
- Chain repair/file re-sync is not required when the UPI server list changes from length 0 -> 1.
`CONFIRM epoch 1151 <<239,29,39,70>> upi [f1] rep [f3] auth f1 by f1`

- Server `f1` has noticed that server `f3` is alive. Apparently it has not yet noticed that server `f2` is also running.
- Server `f3` is in the repair list.
`CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1`

- Server `f2` is apparently now aware that all three servers are running.
- The previous configuration used by `f2` was `upi [f2]`, i.e., `f2` was running in a chain of one. `f2` noticed that `f1` and `f3` were now available and has started adding them to the chain.
- All new servers are always added to the tail of the chain in the repair list.
- In eventual consistency mode, a UPI change like this is OK.
- When performing a read, a client must read from both the tail of the UPI list and also from all repairing servers.
- When performing a write, the client writes to both the UPI server list and also the repairing list, in that order.
  - I.e., the client concatenates both lists, `UPI ++ Repairing`, for its chain configuration for the write.
- Server `f2` will trigger file repair/re-sync shortly.
  - The waiting time for starting repair has been configured to be extremely short, 1 second. The default waiting time is 10 seconds, in case Humming Consensus remains unstable.
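The read/write rule above amounts to list concatenation: the write set is `UPI ++ Repairing`, in that order, and reads consult the UPI tail plus every repairing server. A shell sketch using epoch 1152's membership (the variable names are illustrative, not Machi API):

```shell
# Epoch 1152: UPI=[f2], Repairing=[f1,f3].
UPI="f2"
REPAIRING="f1 f3"

UPI_TAIL="${UPI##* }"                  # last element of the UPI list

WRITE_TARGETS="$UPI $REPAIRING"        # UPI ++ Repairing, head to tail
READ_TARGETS="$UPI_TAIL $REPAIRING"    # UPI tail plus all repairing servers

echo "write to: $WRITE_TARGETS"
echo "read from: $READ_TARGETS"
```

With a UPI of length one, the write set and read set happen to coincide; once repair finishes and the chain grows, reads consult only the UPI tail.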
`CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1`

- File repair/re-sync has finished. All file data on all servers is now in sync.
- The UPI/in-sync part of the chain is now `[f2,f1,f3]`, and there are no servers under repair.
## Let's create some failures
Here are some suggestions for creating failures.
- Use the `./dev/devN/bin/machi stop` and `./dev/devN/bin/machi start` commands to stop & start VM #`N`.
- Stop a VM abnormally by using `kill`. The OS process name to look for is `beam.smp`.
- Suspend and resume a VM, using the `SIGSTOP` and `SIGCONT` signals.
  - E.g., `kill -STOP 9823` and `kill -CONT 9823`.
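Before doing this to a Machi VM, the effect of `SIGSTOP`/`SIGCONT` can be tried safely on a throwaway process (a sketch using `sleep` as a stand-in; the PID will differ on your machine):

```shell
# Suspend and resume a throwaway process with SIGSTOP / SIGCONT.
sleep 60 &
PID=$!

kill -STOP "$PID"            # suspend; process state becomes 'T' (stopped)
sleep 1
STOPPED=$(ps -o stat= -p "$PID" | tr -d ' ' | cut -c1)

kill -CONT "$PID"            # resume; state returns to 'S' (sleeping)
sleep 1
RESUMED=$(ps -o stat= -p "$PID" | tr -d ' ' | cut -c1)

echo "after STOP: $STOPPED, after CONT: $RESUMED"
kill "$PID"                  # clean up
```

A suspended `beam.smp` looks to the other chain members exactly like a partitioned or very slow server, which is what makes this a useful failure to inject.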
The network partition simulator is not (yet) available when running Machi in this mode. Please see the next section for instructions on how to use the partition simulator.
# Using the network partition simulator and convergence demo test code

This is the demo code mentioned in the presentation that Scott Lystig Fritchie gave at the RICON 2015 conference.
## A complete example of all input and output
If you don't have an Erlang/OTP 17 runtime environment available, please see this file for the full input and output of a strong consistency length=3 chain test: https://gist.github.com/slfritchie/8352efc88cc18e62c72c
This file contains all command input and all simulator output from a sample run of the simulator.
To help interpret the output of the test, please skip ahead to the "The test output is very verbose" section.
## Prerequisites
If you don't have `git` and/or the Erlang 17 runtime system available on your OS X, FreeBSD, Linux, or Solaris machine, please take a look at the Prerequisites section first. When you have installed the prerequisite software, please return here.
## Clone and compile the code
Please briefly visit the Clone and compile the code section. When finished, please return here.
## Run an interactive Erlang CLI shell
Run the following command at your login shell:

```
erl -pz .eunit ebin deps/*/ebin
```
If you are using Erlang/OTP version 17, you should see some CLI output that looks like this:

```
Erlang/OTP 17 [erts-6.4] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V6.4  (abort with ^G)
1>
```
## The test output is very verbose ... what are the important parts?
The output of the Erlang command `machi_chain_manager1_converge_demo:help()` will display the following guide to the output of the tests.

```
A visualization of the convergence behavior of the chain self-management
algorithm for Machi.

1. Set up some server and chain manager pairs.
2. Create a number of different network partition scenarios, where
   (simulated) partitions may be symmetric or asymmetric.  Then stop changing
   the partitions and keep the simulated network stable (and perhaps broken).
3. Run a number of iterations of the algorithm in parallel by poking each
   of the manager processes on a random'ish basis.
4. Afterward, fetch the chain transition changes made by each FLU and
   verify that no transition was unsafe.

During the iteration periods, the following is a cheatsheet for the output.
See the internal source for interpreting the rest of the output.

'SET partitions = '
    A pair-wise list of actors which cannot send messages.  The
    list is uni-directional.  If there are three servers (a,b,c),
    and if the partitions list is '[{a,b},{b,c}]' then all
    messages from a->b and b->c will be dropped, but any other
    sender->recipient messages will be delivered successfully.

'x uses:'
    The FLU x has made an internal state transition and is using
    this epoch's projection as operating chain configuration.  The
    rest of the line is a summary of the projection.

'CONFIRM epoch {N}'
    This message confirms that all of the servers listed in the
    UPI and repairing lists of the projection at epoch {N} have
    agreed to use this projection because they all have written
    this projection to their respective private projection stores.
    The chain is now usable by/available to all clients.

'Sweet, private projections are stable'
    This report announces that this iteration of the test cycle
    has passed successfully.  The report that follows briefly
    summarizes the latest private projection used by each
    participating server.  For example, when in strong consistency
    mode with 'a' as a witness and 'b' and 'c' as real servers:

    %% Legend:
    %% server name, epoch ID, UPI list, repairing list, down list, ...
    %% ... witness list, 'false' (a constant value)

    [{a,{{1116,<<23,143,246,55>>},[a,b],[],[c],[a],false}},
     {b,{{1116,<<23,143,246,55>>},[a,b],[],[c],[a],false}}]

    Both servers 'a' and 'b' agree on epoch 1116 with epoch ID
    {1116,<<23,143,246,55>>} where UPI=[a,b], repairing=[],
    down=[c], and witnesses=[a].

    Server 'c' is not shown because 'c' has wedged itself OOS (out
    of service) by configuring a chain length of zero.

    If no servers are listed in the report (i.e. only '[]' is
    displayed), then all servers have wedged themselves OOS, and
    the chain is unavailable.

'DoIt,'
    This marks a group of tick events which trigger the manager
    processes to evaluate their environment and perhaps make a
    state transition.

A long chain of 'DoIt,DoIt,DoIt,' means that the chain state has
(probably) settled to a stable configuration, which is the goal of the
algorithm.

Press control-c to interrupt the test....
```
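The uni-directional drop rule for `SET partitions` can be sketched outside the simulator. The helper below (in shell; `is_dropped` is an illustrative name, not part of the test code) models the example list `[{a,b},{b,c}]`:

```shell
# A partition list is uni-directional: a message is dropped only when
# its exact sender->recipient pair appears in the list.
PARTITIONS="a->b b->c"        # models the Erlang list [{a,b},{b,c}]

is_dropped() {                # usage: is_dropped <sender> <recipient>
  case " $PARTITIONS " in
    *" $1->$2 "*) echo "drop $1->$2" ;;
    *)            echo "deliver $1->$2" ;;
  esac
}

is_dropped a b   # in the list: dropped
is_dropped b a   # reverse direction is not listed: delivered
is_dropped b c   # in the list: dropped
is_dropped c a   # delivered
```

Note that `b->a` is delivered even though `a->b` is dropped; asymmetric partitions like this are exactly what the simulator exercises.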
## Run a test in eventual consistency mode
Run the following command at the Erlang CLI prompt:

```
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}]).
```
The first argument, `3`, is the number of servers to participate in the chain. Please note:
- Chain lengths as short as 1 or 2 are valid, but the results are a bit boring.
- Chain lengths as long as 7 or 9 can be used, but they may suffer from longer periods of churn/instability before all chain managers reach agreement via humming consensus. (It is future work to shorten the worst of the unstable churn latencies.)
- In eventual consistency mode, chain lengths may be even numbers, e.g. 2, 4, or 6.
- The simulator will choose partition events from the permutations of all 1-, 2-, and 3-node partition pairs. The total runtime will increase dramatically with chain length.
  - Chain length 2: about 3 partition cases
  - Chain length 3: about 35 partition cases
  - Chain length 4: about 230 partition cases
  - Chain length 5: about 1100 partition cases
## Run a test in strong consistency mode (with witnesses)
NOTE: Due to a bug in the test code, please do not try to run the convergence test in strong consistency mode without the correct minority number of witness servers! If in doubt, please run the commands shown below exactly.
Run the following command at the Erlang CLI prompt:

```
machi_chain_manager1_converge_demo:t(3, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a]}]).
```
The first argument, `3`, is the number of servers to participate in the chain. Chain lengths as long as 7 or 9 can be used, but they may suffer from longer periods of churn/instability before all chain managers reach agreement via humming consensus.
Due to the bug mentioned above, please use the following commands when running with chain lengths of 5 or 7, respectively.

```
machi_chain_manager1_converge_demo:t(5, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a,b]}]).
machi_chain_manager1_converge_demo:t(7, [{private_write_verbose,true}, {consistency_mode, cp_mode}, {witnesses, [a,b,c]}]).
```