WIP
This commit is contained in:
parent
4e5c16f5e2
commit
cd166361aa
1 changed files with 50 additions and 3 deletions
|
@ -118,14 +118,18 @@ log file for Erlang VM process #1.
|
||||||
2016-03-09 10:16:45.235 [info] <0.132.0> CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1
|
2016-03-09 10:16:45.235 [info] <0.132.0> CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1
|
||||||
2016-03-09 10:16:47.343 [info] <0.132.0> CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1
|
2016-03-09 10:16:47.343 [info] <0.132.0> CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1
|
||||||
|
|
||||||
Let's pick apart some of these lines.
|
Let's pick apart some of these lines. We have started all three
|
||||||
|
servers at about the same time. We see some race conditions happen,
|
||||||
|
and some jostling and readjustment happen pretty quickly in the first
|
||||||
|
few seconds.
|
||||||
|
|
||||||
* `Started FLU f1 with supervisor pid <0.128.0>` ; This VM, #1,
|
* `Started FLU f1 with supervisor pid <0.128.0>`
|
||||||
|
* This VM, #1,
|
||||||
started a FLU (Machi data server) with the name `f1`. In the Erlang
|
started a FLU (Machi data server) with the name `f1`. In the Erlang
|
||||||
process supervisor hierarchy, the process ID of the top supervisor
|
process supervisor hierarchy, the process ID of the top supervisor
|
||||||
is `<0.128.0>`.
|
is `<0.128.0>`.
|
||||||
* `Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]`
|
* `Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]`
|
||||||
A bootstrap configuration for a chain named `c1` has been created.
|
* A bootstrap configuration for a chain named `c1` has been created.
|
||||||
* The FLUs/data servers that are eligible for participation in the
|
* The FLUs/data servers that are eligible for participation in the
|
||||||
chain have names `f1`, `f2`, and `f3`.
|
chain have names `f1`, `f2`, and `f3`.
|
||||||
* The chain will operate in eventual consistency mode (`ap_mode`)
|
* The chain will operate in eventual consistency mode (`ap_mode`)
|
||||||
|
@ -143,6 +147,49 @@ Let's pick apart some of these lines.
|
||||||
empty, `[]`.
|
empty, `[]`.
|
||||||
* This projection was authored by server `f1`.
|
* This projection was authored by server `f1`.
|
||||||
* The log message was generated by server `f1`.
|
* The log message was generated by server `f1`.
|
||||||
|
* `CONFIRM epoch 1148 <<57,213,154,16>> upi [f1] rep [] auth f1 by f1`
|
||||||
|
* Now the server `f1` has created a chain of length 1, `[f1]`.
|
||||||
|
* Chain repair/file re-sync is not required when the UPI server list
|
||||||
|
changes from length 0 -> 1.
|
||||||
|
* `CONFIRM epoch 1151 <<239,29,39,70>> upi [f1] rep [f3] auth f1 by f1`
|
||||||
|
* Server `f1` has noticed that server `f3` is alive. Apparently it
|
||||||
|
has not yet noticed that server `f2` is also running.
|
||||||
|
* Server `f3` is in the repair list.
|
||||||
|
* `CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1`
|
||||||
|
* Server `f2` is apparently now aware that all three servers are running.
|
||||||
|
* The previous configuration used by `f2` was `upi [f2]`, i.e., `f2`
|
||||||
|
was running in a chain of one. `f2` noticed that `f1` and `f3`
|
||||||
|
were now available and has started adding them to the chain.
|
||||||
|
* All new servers are always added to the tail of the chain.
|
||||||
|
* In eventual consistency mode, a UPI change like this is OK.
|
||||||
|
* When performing a read, a client must read from both the tail of the
|
||||||
|
UPI list and also from all repairing servers.
|
||||||
|
* When performing a write, the client writes to both the UPI
|
||||||
|
server list and also the repairing list, in that order.
|
||||||
|
* Server `f2` will trigger file repair/re-sync shortly.
|
||||||
|
* The waiting time for starting repair has been configured to be
|
||||||
|
extremely short, 1 second. The default waiting time is 10
|
||||||
|
seconds, in case Humming Consensus remains unstable.
|
||||||
|
* `CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1`
|
||||||
|
* File repair/re-sync has finished. All file data on all servers
|
||||||
|
are now in sync.
|
||||||
|
* The UPI/in-sync part of the chain is now `[f2,f1,f3]`, and there
|
||||||
|
are no servers under repair.
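
To make the field layout of these `CONFIRM` lines concrete, here is a minimal Python sketch that parses one line and derives the read and write targets described above for eventual consistency mode. It is not part of Machi: the names `Confirm`, `parse_confirm`, `read_targets`, and `write_targets` are hypothetical, the regular expression is inferred only from the sample log lines shown here, and the bytes between `<<` and `>>` are kept as plain integers without interpretation.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the sample log lines in this document (an assumption,
# not an official Machi log grammar).
CONFIRM_RE = re.compile(
    r"CONFIRM epoch (?P<epoch>\d+) <<(?P<tag>[\d,]*)>> "
    r"upi \[(?P<upi>[^\]]*)\] rep \[(?P<rep>[^\]]*)\] "
    r"auth (?P<author>\S+) by (?P<reporter>\S+)"
)


def _split(items):
    """Split a comma-separated list body such as 'f1,f3' into names."""
    return [item.strip() for item in items.split(",") if item.strip()]


@dataclass
class Confirm:
    epoch: int        # epoch number of the projection
    tag: list         # byte values printed between << and >>
    upi: list         # UPI (in-sync) servers, head first
    repairing: list   # servers currently under repair
    author: str       # server that authored the projection ("auth")
    reporter: str     # server that generated the log message ("by")


def parse_confirm(line):
    """Return a Confirm for a CONFIRM log line, or None if it does not match."""
    m = CONFIRM_RE.search(line)
    if m is None:
        return None
    return Confirm(
        epoch=int(m["epoch"]),
        tag=[int(b) for b in _split(m["tag"])],
        upi=_split(m["upi"]),
        repairing=_split(m["rep"]),
        author=m["author"],
        reporter=m["reporter"],
    )


def read_targets(c):
    """Servers a client reads from: the UPI tail plus all repairing servers."""
    return c.upi[-1:] + c.repairing


def write_targets(c):
    """Servers a client writes to: the UPI list, then the repairing list."""
    return c.upi + c.repairing
```

For the epoch 1152 line above, `read_targets` yields `f2` (the UPI tail) followed by `f1` and `f3`; after repair finishes at epoch 1154, a read only needs the tail, `f3`.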
|
||||||
|
|
||||||
|
## Let's create some failures
|
||||||
|
|
||||||
|
Here are some suggestions for creating failures.
|
||||||
|
|
||||||
|
* Use the `./dev/devN/bin/machi stop` and `./dev/devN/bin/machi start`
|
||||||
|
commands to stop & start VM #`N`.
|
||||||
|
* Stop a VM abnormally by using `kill`. The OS process name to look
|
||||||
|
for is `beam.smp`.
|
||||||
|
* Suspend and resume a VM, using the `SIGSTOP` and `SIGCONT` signals.
|
||||||
|
* E.g. `kill -STOP 9823` and `kill -CONT 9823`
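
The suspend/resume suggestion can also be scripted. The sketch below is only an illustration for POSIX systems: it sends the same `SIGSTOP` and `SIGCONT` signals as the `kill` commands above, and the helper names `suspend_vm`, `resume_vm`, and `pause_for` are hypothetical, not part of Machi.

```python
import os
import signal
import time


def suspend_vm(pid):
    """Suspend a process, equivalent to `kill -STOP <pid>`."""
    os.kill(pid, signal.SIGSTOP)


def resume_vm(pid):
    """Resume a suspended process, equivalent to `kill -CONT <pid>`."""
    os.kill(pid, signal.SIGCONT)


def pause_for(pid, seconds):
    """Suspend a process for `seconds`, then resume it.

    The resume happens in a `finally` block so the process is not left
    stopped if the sleep is interrupted (e.g. by Ctrl-C).
    """
    suspend_vm(pid)
    try:
        time.sleep(seconds)
    finally:
        resume_vm(pid)
```

For example, `pause_for(9823, 5.0)` would freeze the `beam.smp` process with PID 9823 for five seconds, assuming that PID belongs to one of the VMs.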
|
||||||
|
|
||||||
|
The network partition simulator is not (yet) available when running
|
||||||
|
Machi in this mode. Please see the next section for instructions on
|
||||||
|
how to use the partition simulator.
|
||||||
|
|
||||||
|
|
||||||
<a name="partition-simulator">
|
<a name="partition-simulator">
|
||||||
|
|