From cd166361aa61b156fbe344dea134ecc5aa67bfc6 Mon Sep 17 00:00:00 2001 From: Scott Lystig Fritchie Date: Wed, 9 Mar 2016 10:48:00 -0800 Subject: [PATCH] WIP --- doc/humming-consensus-demo.md | 53 +++++++++++++++++++++++++++++++++-- 1 file changed, 50 insertions(+), 3 deletions(-) diff --git a/doc/humming-consensus-demo.md b/doc/humming-consensus-demo.md index d00637c..5ef8e8d 100644 --- a/doc/humming-consensus-demo.md +++ b/doc/humming-consensus-demo.md @@ -118,14 +118,18 @@ log file for Erlang VM process #1. 2016-03-09 10:16:45.235 [info] <0.132.0> CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1 2016-03-09 10:16:47.343 [info] <0.132.0> CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1 -Let's pick apart some of these lines. +Let's pick apart some of these lines. We have started all three +servers at about the same time. We see some race conditions happen, +and some jostling and readjustment happens pretty quickly in the first +few seconds. -* `Started FLU f1 with supervisor pid <0.128.0>` ; This VM, #1, +* `Started FLU f1 with supervisor pid <0.128.0>` + * This VM, #1, started a FLU (Machi data server) with the name `f1`. In the Erlang process supervisor hierarchy, the process ID of the top supervisor is `<0.128.0>`. * `Configured chain c1 via FLU f1 to mode=ap_mode all=[f1,f2,f3] witnesses=[]` - A bootstrap configuration for a chain named `c1` has been created. + * A bootstrap configuration for a chain named `c1` has been created. * The FLUs/data servers that are eligible for participation in the chain have names `f1`, `f2`, and `f3`. * The chain will operate in eventual consistency mode (`ap_mode`) @@ -143,6 +147,49 @@ Let's pick apart some of these lines. empty, `[]`. * This projection was authored by server `f1`. * The log message was generated by server `f1`. +* `CONFIRM epoch 1148 <<57,213,154,16>> upi [f1] rep [] auth f1 by f1` + * Now the server `f1` has created a chain of length 1, `[f1]`. 
+  * Chain repair/file re-sync is not required when the UPI server list
+    changes from length 0 -> 1.
+* `CONFIRM epoch 1151 <<239,29,39,70>> upi [f1] rep [f3] auth f1 by f1`
+  * Server `f1` has noticed that server `f3` is alive. Apparently it
+    has not yet noticed that server `f2` is also running.
+  * Server `f3` is in the repair list.
+* `CONFIRM epoch 1152 <<173,17,66,225>> upi [f2] rep [f1,f3] auth f2 by f1`
+  * Server `f2` is apparently now aware that all three servers are running.
+  * The previous configuration used by `f2` was `upi [f2]`, i.e., `f2`
+    was running in a chain of one. `f2` noticed that `f1` and `f3`
+    were now available and has started adding them to the chain.
+  * All new servers are always added to the tail of the chain.
+  * In eventual consistency mode, a UPI change like this is OK.
+  * When performing a read, a client must read from both the tail of the
+    UPI list and also from all repairing servers.
+  * When performing a write, the client writes to both the UPI
+    server list and also the repairing list, in that order.
+  * Server `f2` will trigger file repair/re-sync shortly.
+  * The waiting time before starting repair has been configured to be
+    extremely short, 1 second. The default waiting time is 10
+    seconds, in case Humming Consensus remains unstable.
+* `CONFIRM epoch 1154 <<154,231,224,149>> upi [f2,f1,f3] rep [] auth f2 by f1`
+  * File repair/re-sync has finished. All file data on all servers
+    are now in sync.
+  * The UPI/in-sync part of the chain is now `[f2,f1,f3]`, and there
+    are no servers under repair.
+
+## Let's create some failures
+
+Here are some suggestions for creating failures.
+
+* Use the `./dev/devN/bin/machi stop` and `./dev/devN/bin/machi start`
+  commands to stop & start VM #`N`.
+* Stop a VM abnormally by using `kill`. The OS process name to look
+  for is `beam.smp`.
+* Suspend and resume a VM, using the `SIGSTOP` and `SIGCONT` signals.
+  * E.g. `kill -STOP 9823` and `kill -CONT 9823`
+
+The network partition simulator is not (yet) available when running
+Machi in this mode. Please see the next section for instructions on
+how to use the partition simulator.