This commit is contained in:
Scott Lystig Fritchie 2016-03-09 10:48:00 -08:00
parent 4e5c16f5e2
commit cd166361aa

View file

@ -118,14 +118,18 @@ log file for Erlang VM process #1.
2016-03-09 10:16:45.235 [info] <0.132.0> CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1 2016-03-09 10:16:45.235 [info] <0.132.0> CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1
2016-03-09 10:16:47.343 [info] <0.132.0> CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1 2016-03-09 10:16:47.343 [info] <0.132.0> CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1
Let's pick apart some of these lines. Let's pick apart some of these lines. We have started all three
servers at about the same time. We see some race conditions happen,
and some jostling and readjustment happens pretty quickly in the first
few seconds.
* `Started FLU f1 with supervisor pid <0.128.0>` ; This VM, #1, * `Started FLU f1 with supervisor pid <0.128.0>`
* This VM, #1,
started a FLU (Machi data server) with the name `f1`. In the Erlang started a FLU (Machi data server) with the name `f1`. In the Erlang
process supervisor hierarchy, the process ID of the top supervisor process supervisor hierarchy, the process ID of the top supervisor
is `<0.128.0>`. is `<0.128.0>`.
* `Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]` * `Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]`
A bootstrap configuration for a chain named `c1` has been created. * A bootstrap configuration for a chain named `c1` has been created.
* The FLUs/data servers that are eligible for participation in the * The FLUs/data servers that are eligible for participation in the
chain have names `f1`, `f2`, and `f3`. chain have names `f1`, `f2`, and `f3`.
* The chain will operate in eventual consistency mode (`ap_mode`) * The chain will operate in eventual consistency mode (`ap_mode`)
@ -143,6 +147,49 @@ Let's pick apart some of these lines.
empty, `[]`. empty, `[]`.
* This projection was authored by server `f1`. * This projection was authored by server `f1`.
* The log message was generated by server `f1`. * The log message was generated by server `f1`.
* `CONFIRM epoch 1148 <<57,213,154,16>> upi [f1] rep [] auth f1 by f1`
* Now the server `f1` has created a chain of length 1, `[f1]`.
* Chain repair/file re-sync is not required when the UPI server list
changes from length 0 -> 1.
* `CONFIRM epoch 1151 <<239,29,39,70>> upi [f1] rep [f3] auth f1 by f1`
* Server `f1` has noticed that server `f3` is alive. Apparently it
has not yet noticed that server `f2` is also running.
* Server `f3` is in the repair list.
* `CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1`
* Server `f2` is apparently now aware that all three servers are running.
* The previous configuration used by `f2` was `upi [f2]`, i.e., `f2`
was running in a chain of one. `f2` noticed that `f1` and `f3`
were now available and has started adding them to the chain.
* All new servers are always added to the tail of the chain.
* In eventual consistency mode, a UPI change like this is OK.
* When performing a read, a client must read from both tail of the
UPI list and also from all repairing servers.
* When performing a write, the client writes to both the UPI
server list and also the repairing list, in that order.
* Server `f2` will trigger file repair/re-sync shortly.
* The waiting time for starting repair has been configured to be
extremely short, 1 second. The default waiting time is 10
seconds, in case Humming Consensus remains unstable.
* `CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1`
* File repair/re-sync has finished. All file data on all servers
are now in sync.
* The UPI/in-sync part of the chain is now `[f2,f1,f3]`, and there
are no servers under repair.
## Let's create some failures
Here are some suggestions for creating failures.
* Use the `./dev/devN/bin/machi stop` and ``./dev/devN/bin/machi start`
commands to stop & start VM #`N`.
* Stop a VM abnormally by using `kill`. The OS process name to look
for is `beam.smp`.
* Suspend and resume a VM, using the `SIGSTOP` and `SIGCONT` signals.
* E.g. `kill -STOP 9823` and `kill -CONT 9823`
The network partition simulator is not (yet) available when running
Machi in this mode. Please see the next section for instructions on
how to use partition simulator.
<a name="partition-simulator"> <a name="partition-simulator">