diff --git a/.gitignore b/.gitignore
index 8e46d5a..2693865 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,10 +1,9 @@
 .eunit
 deps
 *.o
-*.beam
+ebin/*.beam
 *.plt
 erl_crash.dump
-ebin
 rel/example_project
 .concrete/DEV_MODE
 .rebar
diff --git a/doc/chain-self-management-sketch.Diagram1.dia b/doc/chain-self-management-sketch.Diagram1.dia
new file mode 100644
index 0000000..6e42b7e
Binary files /dev/null and b/doc/chain-self-management-sketch.Diagram1.dia differ
diff --git a/doc/chain-self-management-sketch.Diagram1.pdf b/doc/chain-self-management-sketch.Diagram1.pdf
new file mode 100644
index 0000000..4271839
Binary files /dev/null and b/doc/chain-self-management-sketch.Diagram1.pdf differ
diff --git a/doc/chain-self-management-sketch.org b/doc/chain-self-management-sketch.org
new file mode 100644
index 0000000..07cfd41
--- /dev/null
+++ b/doc/chain-self-management-sketch.org
@@ -0,0 +1,672 @@
-*- mode: org; -*-
#+TITLE: Machi Chain Self-Management Sketch
#+AUTHOR: Scott
#+STARTUP: lognotedone hidestars indent showall inlineimages
#+SEQ_TODO: TODO WORKING WAITING DONE

* Abstract
Yo, this is the first draft of a document that attempts to describe a proposed self-management algorithm for Machi's chain replication. Welcome! Sit back and enjoy the disjointed prose.

We attempt to describe first the self-management and self-reliance goals of the algorithm. Then we make a side trip to talk about write-once registers and how they're used by Machi, but we don't really fully explain exactly why write-once is so critical (why not general purpose registers?) ... but they are indeed critical. Then we sketch the algorithm by providing detailed annotation of a flowchart, then let the flowchart speak for itself, because writing good prose is damn hard, but flowcharts are very specific and concise.

Finally, we try to discuss the network partition simulator that the algorithm runs in and how the algorithm behaves in both symmetric and asymmetric network partition scenarios. The symmetric partition cases are all working well (surprising in a good way), and the asymmetric partition cases are working well (in a damn mystifying kind of way). It'd be really, *really* great to get more review of the algorithm and the simulator.

* Copyright
%% Copyright (c) 2015 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%%   http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.

* TODO Naming: possible ideas
** Humming consensus?

See [[https://tools.ietf.org/html/rfc7282][On Consensus and Humming in the IETF]], RFC 7282.

** Tunesmith?

A mix of orchestral conducting, music composition, humming?

** Foggy consensus?

CORFU-like consensus between mist-shrouded islands of network partitions.

** Rough consensus

This is my favorite, but it might be too close to the hand-wavy vagueness of everyday English, even with a precise definition and proof sketching?

** Let the bikeshed continue!

I agree with Chris: there may already be a definition that's close enough to "rough consensus" that it's better to continue using that existing tag than to invent a new one. TODO: more research required

* What does "self-management" mean in this context?

For the purposes of this document, chain replication self-management is the ability for the N nodes in an N-length chain replication chain to manage the state of the chain without requiring an external party to participate. Chain state includes:

1. Preserve data integrity of all data stored within the chain. Data loss is not an option.
2. Stably preserve knowledge of chain membership (i.e. all nodes in the chain, regardless of operational status). A systems administrator is expected to make "permanent" decisions about chain membership.
3. Use passive and/or active techniques to track operational state/status, e.g., up, down, restarting, full data sync, partial data sync, etc.
4. Choose the run-time replica ordering/state of the chain, based on current member status and past operational history. All chain state transitions must be done safely and without data loss or corruption.
5. As a new node is added to the chain administratively or an old node is restarted, add the node to the chain safely and perform any data synchronization/"repair" required to bring the node's data into full synchronization with the other nodes.

* Goals
** Better than state-of-the-art: Chain Replication self-management

We hope/believe that this new self-management algorithm can improve the current state-of-the-art by eliminating all external management entities. Current state-of-the-art for management of chain replication chains is discussed below, to provide historical context.

*** "Leveraging Sharding in the Design of Scalable Replication Protocols" by Abu-Libdeh, van Renesse, and Vigfusson.

Multiple chains are arranged in a ring (called a "band" in the paper). The responsibility for managing the chain at position N is delegated to chain N-1. As long as at least one chain is running, that is sufficient to start/bootstrap the next chain, and so on until all chains are running. (The paper then estimates mean-time-to-failure (MTTF) and suggests a "band of bands" topology to handle very large clusters while maintaining an MTTF that is as good or better than other management techniques.)

If the chain self-management method proposed for Machi does not succeed, this paper's technique is our best fallback recommendation.

*** An external management oracle, implemented by ZooKeeper

This is not a recommendation for Machi: we wish to avoid using ZooKeeper. However, many other open and closed source software products use ZooKeeper for exactly this kind of data replica management problem.

*** An external management oracle, implemented by Riak Ensemble

This is a much more palatable choice than option #2 above. We also wish to avoid an external dependency on something as big as Riak Ensemble. However, if the choice comes down to Riak Ensemble or ZooKeeper, the choice feels quite clear: Riak Ensemble will win, unless there is some critical feature missing from Riak Ensemble. If such an unforeseen missing feature is discovered, it would probably be preferable to add the feature to Riak Ensemble rather than to use ZooKeeper (and document it and provide product support for it and so on...).

** Support both eventually consistent & strongly consistent modes of operation

Machi's first use case is for Riak CS, as an eventually consistent store for CS's "block" storage. Today, Riak KV is used for "block" storage. Riak KV is an AP-style key-value store; using Machi in an AP-style mode would match CS's current behavior from the points of view of both code/execution and human administrator expectations.

Later, we wish to have the option of using CP support to replace other data store services that Riak KV provides today. (Scope and timing of such replacement TBD.)

We believe this algorithm allows a Machi cluster to fragment into arbitrary islands of network partition, all the way down to 100% of members running in complete network isolation from each other. Furthermore, it provides enough agreement to allow formerly-partitioned members to coordinate the reintegration & reconciliation of their data when partitions are healed.

** Preserve data integrity of Chain Replicated data

While listed last in this section, preservation of data integrity is paramount to any chain state management technique for Machi.

** Anti-goal: minimize churn

This algorithm's focus is data safety and not availability. If participants have differing notions of time, e.g., running on extremely fast or extremely slow hardware, then this algorithm will "churn" through different states, and during such churn the chain's data would be effectively unavailable.

In practice, however, any series of network partition changes that causes this algorithm to churn will cause other management techniques (such as an external "oracle") similar problems. [Proof by handwaving assertion.] See also: "time model" assumptions (below).

* Assumptions
** Introduction to assumptions, why they differ from other consensus algorithms

Given a long history of consensus algorithms (viewstamped replication, Paxos, Raft, et al.), why bother with a slightly different set of assumptions and a slightly different protocol?

The answer lies in one of our explicit goals: to have an option of running in an "eventually consistent" manner. We wish to be able to make progress, i.e., remain available in the CAP sense, even if we are partitioned down to a single isolated node. VR, Paxos, and Raft alone are not sufficient to coordinate service availability at such small scale.

** The CORFU protocol is correct

This work relies tremendously on the correctness of the CORFU protocol, a cousin of the Paxos protocol. If the implementation of this self-management protocol breaks an assumption or prerequisite of CORFU, then we expect that the implementation will be flawed.

** Communication model: Asynchronous message passing
*** Unreliable network: messages may be arbitrarily dropped and/or reordered
**** Network partitions may occur at any time
**** Network partitions may be asymmetric: msg A->B is ok but B->A fails
*** Messages may be corrupted in-transit
**** Assume that message MAC/checksums are sufficient to detect corruption
**** Receiver informs sender of message corruption
**** Sender may resend, if/when desired
*** System participants may be buggy but not actively malicious/Byzantine
** Time model: per-node clocks, loosely synchronized (e.g. NTP)

The protocol & algorithm presented here do not specify or require any timestamps, physical or logical. Any mention of time inside of data structures is for human/historic/diagnostic purposes only.
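
To illustrate that point, here is a hypothetical sketch (not the Machi RFC's actual record definition) of what a projection data structure might carry. The creation timestamp exists purely for humans reading diagnostics and history; the algorithm never orders or compares projections by it.

#+BEGIN_SRC erlang
%% Hypothetical sketch only -- field names are illustrative, not Machi's.
-record(proj, {
          epoch     :: non_neg_integer(),    %% used for ordering & ranking
          upi       :: [atom()],             %% ordered list of in-sync members
          repairing :: [atom()],             %% ordered list of repairing members
          down      :: [atom()],             %% members believed to be down
          author    :: atom(),               %% node that proposed this projection
          hash      :: binary() | undefined, %% see the projection naming sections below
          created   :: calendar:datetime()   %% diagnostic only; never compared
         }).
#+END_SRC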

Having said that, some notion of physical time is suggested for purposes of efficiency. It's recommended that there be some "sleep time" between iterations of the algorithm: there is no need to "busy wait" by executing the algorithm as quickly as possible. See below, "sleep intervals between executions".

** Failure detector model: weak, fallible, boolean

We assume that the failure detector used by the algorithm is weak and fallible, and that it informs the algorithm via boolean status updates/toggles as a node becomes available or unavailable.

If the failure detector is fallible and reports a mistaken status change, then the algorithm will "churn" the operational state of the chain, e.g. by removing the failed node from the chain or adding a (re)started node (that may not be alive) to the end of the chain. Such extra churn is regrettable and will cause periods of delay as the "rough consensus" (described below) decision is made. However, the churn cannot (we assert/believe) cause data loss.

** The "wedge state", as described by the Machi RFC & CORFU

A chain member enters "wedge state" when it receives information that a newer projection (i.e., run-time chain state reconfiguration) is available. The new projection may be created by a system administrator or calculated by the self-management algorithm. Notification may arrive via the projection store API or via the file I/O API.

When in wedge state, the server/FLU will refuse all file write I/O API requests until the self-management algorithm has determined that "rough consensus" has been decided (see next bullet item). The server may also refuse file read I/O API requests, depending on its CP/AP operation mode.

See the Machi RFC for more detail on the wedge state, and also the CORFU papers.

** "Rough consensus": consensus built upon data that is *visible now*

CS literature uses the word "consensus" in the context of the problem description at [[http://en.wikipedia.org/wiki/Consensus_(computer_science)#Problem_description]]. This traditional definition differs from what is described in this document.

The phrase "rough consensus" will be used to describe consensus derived only from data that is visible/known at the current time. This implies that a network partition may be in effect and that not all chain members are reachable. The algorithm will calculate "rough consensus" despite not having input from all chain members, or even from a majority of them. "Rough consensus" may proceed to make a decision based on data from only a single participant, i.e., the local node alone.

When operating in AP mode, i.e., in eventual consistency mode, "rough consensus" could mean that a chain of length N could split into N independent chains of length 1. When a network partition heals, the rough consensus is sufficient to manage the chain so that each replica's data can be repaired/merged/reconciled safely. (Other features of the Machi system are designed to assist such repair safely.)

When operating in CP mode, i.e., in strong consistency mode, "rough consensus" would require additional restrictions. For example, any chain shorter than the quorum majority size of all members would be invalid and therefore would not move itself out of wedged state. In very general terms, this requirement for a quorum majority of surviving participants is also a requirement for Paxos, Raft, and ZAB.
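
For illustration, such a quorum restriction boils down to simple arithmetic against the full membership list. The sketch below is hypothetical (the function name is not part of Machi's API) and assumes the complete list of chain members is known locally.

#+BEGIN_SRC erlang
%% Hypothetical helper: true only if the UPI list covers a quorum
%% majority of all configured chain members.
has_quorum_majority(UPI, AllMembers) when is_list(UPI), is_list(AllMembers) ->
    length(UPI) * 2 > length(AllMembers).

%% For example, with members [a,b,c]:
%%   has_quorum_majority([a,b], [a,b,c]) -> true   %% 2 of 3 is a majority
%%   has_quorum_majority([a],   [a,b,c]) -> false  %% 1 of 3 is not
#+END_SRC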

(Aside: The Machi RFC also proposes using "witness" chain members to make service more available, e.g. a quorum majority of "real" plus "witness" nodes *and* at least one member must be a "real" node. See the Machi RFC for more details.)

** Heavy reliance on a key-value store that maps keys to write-once registers

The projection store is implemented using "write-once registers" inside a key-value store: for every key in the store, the value must be one of:

- The special 'unwritten' value
- An application-specific binary blob that is immutable thereafter

* The projection store, built with write-once registers

- NOTE to the reader: The notion of "public" vs. "private" projection stores does not appear in the Machi RFC.

Each participating chain node has its own "projection store", which is a specialized key-value store. As a whole, a node's projection store is implemented using two different key-value stores:

- A publicly-writable KV store of write-once registers
- A privately-writable KV store of write-once registers

Both stores may be read by any cluster member.

The store's key is a positive integer; the integer represents the epoch number of the projection. The store's value is an opaque binary blob whose contents are meaningful only to the store's clients.

See the Machi RFC for more detail on projections and epoch numbers.

** The publicly-writable half of the projection store

The publicly-writable projection store is used to share information during the first half of the self-management algorithm. Any chain member may write a projection to this store.

** The privately-writable half of the projection store

The privately-writable projection store is used to store the "rough consensus" result that has been calculated by the local node. Only the local server/FLU may write values into this store.

The private projection store serves multiple purposes, including:

- remove/clear the local server from "wedge state"
- act as the store of record for chain state transitions
- communicate to remote nodes the past states and current operational state of the local node

* Modification of CORFU-style epoch numbering and "wedge state" triggers

According to the CORFU research papers, if a server node N or client node C believes that epoch E is the latest epoch, then any information that N or C receives from any source that an epoch E+delta (where delta > 0) exists will push N into the "wedge" state and C into a mode of searching for the projection definition for the newest epoch.

In the algorithm sketch below, it should become clear that it's possible to have a race where two nodes may attempt to make proposals for a single epoch number. In the simplest case, assume a chain of nodes A & B. Assume that a symmetric network partition between A & B happens, and assume we're operating in AP/eventually consistent mode.

On A's network partitioned island, A can choose a UPI list of `[A]'. Similarly, B can choose a UPI list of `[B]'. Both might choose the epoch for their proposal to be #42. Because they are separated by the network partition, neither can detect the conflict. However, when the network partition heals, it becomes obvious that there are conflicting values for epoch #42 ... but if we use CORFU's protocol design, which represents the epoch identifier as an integer only, then the integer 42 alone is not sufficient to discern the differences between the two projections.

This proposal modifies all uses of CORFU's projection identifier to use the identifier below instead. (A later section of this document presents a detailed example.)

#+BEGIN_SRC
{epoch #, hash of the entire projection (minus hash field itself)}
#+END_SRC
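
One way such an identifier could be computed is sketched below. This is illustrative only: projection_id/1 is a hypothetical function (not Machi's actual code), reusing the hypothetical #proj record sketched earlier. The hash field is blanked before hashing so that the hash covers the entire projection minus the hash field itself.

#+BEGIN_SRC erlang
%% Hypothetical sketch: name a projection by {Epoch, Hash}.
projection_id(#proj{epoch=Epoch} = P) ->
    Hash = crypto:hash(sha, term_to_binary(P#proj{hash=undefined})),
    {Epoch, Hash}.
#+END_SRC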

* Sketch of the self-management algorithm
** Introduction
See also the diagram (((Diagram1.eps))), a flowchart of the algorithm. The code is structured as a state machine, where the function that executes each flowchart state is named by the approximate location of that state within the flowchart. The flowchart has three columns:

1. Column A: Any reason to change?
2. Column B: Do I act?
3. Column C: How do I act?

States in each column are numbered in increasing order, top-to-bottom.

** Flowchart notation
- Author: a function that returns the author of a projection, i.e., the node name of the server that proposed the projection.

- Rank: assigns a numeric score to a projection. Rank is based on the epoch number (higher wins), chain length (larger wins), number & state of any repairing members of the chain (larger wins), and node name of the author server (as a tie-breaking criterion).

- E: the epoch number of a projection.

- UPI: "Update Propagation Invariant". The UPI part of the projection is the ordered list of chain members where the UPI is preserved, i.e., all UPI list members have their data fully synchronized (except for updates in-process at the current instant in time).

- Repairing: the ordered list of nodes that are in "repair mode", i.e., synchronizing their data with the UPI members of the chain.

- Down: the list of chain members believed to be down, from the perspective of the author. This list may be constructed from information from the failure detector and/or by status of recent attempts to read/write to other nodes' public projection store(s).

- P_current: the local node's projection that is actively used. By definition, P_current is the latest projection (i.e. with largest epoch #) in the local node's private projection store.

- P_newprop: the new projection proposal that is calculated locally, based on local failure detector info & other data (e.g., success/failure status when reading from/writing to remote nodes' projection stores).

- P_latest: the highest-ranked projection with the largest single epoch # that has been read from all available public projection stores, including the local node's public store.

- Unanimous: The P_latest projections are unanimous if they are effectively identical. Minor differences such as creation time may be ignored, but elements such as the UPI list must not be ignored. NOTE: "unanimous" has nothing to do with the number of projections compared; "unanimous" is *not* the same as a "quorum majority".

- P_current -> P_latest transition safe?: a predicate function to check the sanity & safety of the transition from the local node's P_current to P_latest, which must be unanimous at state C100.

- Stop state: one iteration of the self-management algorithm has finished on the local node. The local node may execute a new iteration at any time.

** Column A: Any reason to change?
*** A10: Set retry counter to 0
*** A20: Create a new proposed projection based on the current projection
*** A30: Read copies of the latest/largest epoch # from all nodes
*** A40: Decide if the local proposal P_newprop is "better" than P_latest
** Column B: Do I act?
*** B10: 1. Is the latest proposal unanimous for the largest epoch #?
*** B10: 2. Is the retry counter too big?
*** B10: 3. Is another node's proposal "ranked" equal to or higher than mine?
** Column C: How do I act?
*** C1xx: Save latest proposal to local private store, unwedge, stop.
*** C2xx: Ping author of latest to try again, then wait, then repeat alg.
*** C3xx: My new proposal appears best: write @ all public stores, repeat alg
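
To make the three columns concrete, here is a deliberately schematic Erlang sketch of a single iteration. All function names, and the exact shape of the branch decisions, are illustrative assumptions rather than the real chain manager code; `Mod' stands in for a hypothetical module implementing the individual flowchart states.

#+BEGIN_SRC erlang
%% Schematic sketch only; names & branch conditions are illustrative.
iterate(Mod, S0) ->
    %% Column A: any reason to change?
    S1        = Mod:a10_reset_retry_counter(S0),
    P_newprop = Mod:a20_calc_new_proposal(S1),
    P_latest  = Mod:a30_read_latest_public_projections(S1),
    case Mod:a40_any_reason_to_change(P_newprop, P_latest, S1) of
        false -> stop;                                   %% bottom of Column A
        true  -> column_b(Mod, P_newprop, P_latest, S1)
    end.

%% Column B: do I act?  Column C: how do I act?
column_b(Mod, P_newprop, P_latest, S) ->
    case Mod:b10_do_i_act(P_newprop, P_latest, S) of
        adopt_latest -> Mod:c100_save_private_and_unwedge(P_latest, S);
        wait_retry   -> Mod:c200_ping_author_wait_and_repeat(P_latest, S);
        propose_mine -> Mod:c300_write_public_everywhere_and_repeat(P_newprop, S)
    end.
#+END_SRC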

** Flowchart notes
*** Algorithm execution rates / sleep intervals between executions

Due to the ranking algorithm's preference for author node names that are small (lexicographically), nodes with smaller node names should execute the algorithm more frequently than other nodes. The reason for this is to try to avoid churn: a "big" node may propose a UPI list L at epoch 10, and a few moments later a "small" node may propose the same UPI list L at epoch 11. In this case, there would be two chain state transitions: the epoch 11 projection would be ranked higher than epoch 10's projection. If the "small" node executed more frequently than the "big" node, then it's more likely that epoch 10 would be written by the "small" node, which would then cause the "big" node to stop at state A40 and avoid any externally-visible action.

*** Transition safety checking

In state C100, the transition from P_current -> P_latest is checked for safety and sanity. The conditions used for the check include:

1. The Erlang data types of all record members are correct.
2. The UPI, down, & repairing lists contain no duplicates and are in fact mutually disjoint.
3. The author node is not down (as far as we can tell).
4. Any additions to the UPI list in P_latest must appear in the tail of the UPI list and must have been in P_current's repairing list.
5. No re-ordering of the UPI list members: P_latest's UPI list prefix must be exactly equal to P_current's UPI prefix, and any members of P_latest's UPI list suffix must be in the same order as they appeared in P_current's repairing list.

The safety check may be performed pair-wise once or pair-wise across the entire history sequence of a server/FLU's private projection store.
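
The shape of a couple of these checks can be sketched in Erlang. This is illustrative only, under the assumption that the lists contain no duplicates: these are not Machi's functions, and the real check also covers record types, author liveness, and the other conditions above.

#+BEGIN_SRC erlang
%% Condition 2 (sketch): no duplicates within or across the three lists.
disjoint_and_unique(UPI, Repairing, Down) ->
    All = UPI ++ Repairing ++ Down,
    length(All) =:= length(lists:usort(All)).

%% Conditions 4 & 5 (sketch): P_latest's UPI must extend P_current's UPI
%% without reordering, and any appended members must appear in the same
%% relative order as in P_current's repairing list.
upi_extension_ok(UPI_current, Repairing_current, UPI_latest) ->
    lists:prefix(UPI_current, UPI_latest) andalso
        ordered_sublist(UPI_latest -- UPI_current, Repairing_current).

ordered_sublist([], _)         -> true;
ordered_sublist([H|T], [H|T2]) -> ordered_sublist(T, T2);
ordered_sublist(L, [_|T2])     -> ordered_sublist(L, T2);
ordered_sublist(_, [])         -> false.
#+END_SRC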

*** A simple example race between two participants noting a 3rd's failure

Assume a chain of three nodes, A, B, and C, in a projection at epoch E. For all nodes, the P_current projection at epoch E is:

#+BEGIN_QUOTE
UPI=[A,B,C], Repairing=[], Down=[]
#+END_QUOTE

Now assume that C crashes during epoch E. The failure detectors running locally at A & B eventually notice C's death. The new information triggers a new iteration of the self-management algorithm. A calculates its P_newprop (call it P_newprop_a) and writes it to its own public projection store. Meanwhile, B does the same and wins the race to write P_newprop_b to its own public projection store.

At this instant in time, the public projection stores of each node look something like this:

|-------+--------------+--------------+--------------|
| Epoch | Node A       | Node B       | Node C       |
|-------+--------------+--------------+--------------|
| E     | UPI=[A,B,C]  | UPI=[A,B,C]  | UPI=[A,B,C]  |
|       | Repairing=[] | Repairing=[] | Repairing=[] |
|       | Down=[]      | Down=[]      | Down=[]      |
|       | Author=A     | Author=A     | Author=A     |
|-------+--------------+--------------+--------------|
| E+1   | UPI=[A,B]    | UPI=[A,B]    | C is dead,   |
|       | Repairing=[] | Repairing=[] | unwritten    |
|       | Down=[C]     | Down=[C]     |              |
|       | Author=A     | Author=B     |              |
|-------+--------------+--------------+--------------|

If we use the CORFU-style projection naming convention, where a projection's name is exactly equal to the epoch number, then no participant can tell the difference between the projection at epoch E+1 authored by node A and the projection at epoch E+1 authored by node B: the names are the same, i.e., E+1.

Machi must extend the original CORFU protocols by changing the name of the projection. In Machi's case, the projection is named by this 2-tuple:

#+BEGIN_SRC
{epoch #, hash of the entire projection (minus hash field itself)}
#+END_SRC

This name is used in all relevant APIs where the name is required to make a wedge state transition. In the case of the example & table above, all of the UPI & Repairing & Down lists are equal. However, A & B's unanimity is due to the symmetric nature of C's partition: C is dead. In the case of an asymmetric partition of C, it is indeed possible for A's version of epoch E+1's UPI list to be different from B's UPI list in the same epoch E+1.
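
Under the hypothetical projection_id/1 sketch shown earlier, the two E+1 proposals above would carry different names even though their epoch numbers match, because the differing author field changes the hash. The snippet below is illustrative only and reuses the hypothetical #proj record; the concrete epoch value stands in for E+1.

#+BEGIN_SRC erlang
%% Illustrative only: same epoch, same UPI/Repairing/Down, different author,
%% therefore a different hash and a different projection name.
different_names_demo() ->
    EpochE1 = 2,    %% hypothetical concrete value standing in for E+1
    P_a = #proj{epoch=EpochE1, upi=[a,b], repairing=[], down=[c], author=a},
    P_b = #proj{epoch=EpochE1, upi=[a,b], repairing=[], down=[c], author=b},
    {EpochE1, HashA} = projection_id(P_a),
    {EpochE1, HashB} = projection_id(P_b),
    true = (HashA =/= HashB).
#+END_SRC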

*** A second example, building on the first example

Building on the first example, let's assume that A & B have reconciled their proposals for epoch E+2. Nodes A & B are running under a unanimous proposal at E+2.

|-------+--------------+--------------+--------------|
| E+2   | UPI=[A,B]    | UPI=[A,B]    | C is dead,   |
|       | Repairing=[] | Repairing=[] | unwritten    |
|       | Down=[C]     | Down=[C]     |              |
|       | Author=A     | Author=A     |              |
|-------+--------------+--------------+--------------|

Now assume that C restarts. It was dead for a little while, and its code is slightly buggy. Node C decides to make a proposal without first consulting its failure detector: let's assume that C believes that only C is alive. Also, C knows that epoch E was the last epoch valid before it crashed, so it decides that it will write its new proposal at E+2. The result is a set of public projection stores that look like this:

|-------+--------------+--------------+--------------|
| E+2   | UPI=[A,B]    | UPI=[A,B]    | UPI=[C]      |
|       | Repairing=[] | Repairing=[] | Repairing=[] |
|       | Down=[C]     | Down=[C]     | Down=[A,B]   |
|       | Author=A     | Author=A     |              |
|-------+--------------+--------------+--------------|

Now we're in a pickle where a client could read the latest projection from node C and get a different view of the world than if it had read the latest projection from nodes A or B.

If running in AP mode, this wouldn't be a big problem: a write to node C only (or a write to nodes A & B only) would be reconciled eventually. Also, eventually, one of the nodes would realize that C was no longer partitioned and would make a new proposal at epoch E+3.

If running in CP mode, then any client that attempted to use C's version of the E+2 projection would fail: the UPI list does not contain a quorum majority of nodes. (Other discussion of CP mode's use of quorum majority for UPI members is out of scope for this document. Also out of scope is the use of "witness servers" to augment the quorum majority UPI scheme.)

* The Simulator
** Overview
The function machi_chain_manager1_test:convergence_demo_test() executes the following in a simulated network environment within a single Erlang VM:

#+BEGIN_QUOTE
Test the convergence behavior of the chain self-management algorithm for Machi.

  1. Set up 4 FLUs and chain manager pairs.

  2. Create a number of different network partition scenarios, where (simulated) partitions may be symmetric or asymmetric. (At the Seattle 2015 meet-up, I called this the "shaking the snow globe" phase, where asymmetric network partitions are simulated and are calculated at random, differently for each simulated node. During this time, the simulated network is wildly unstable.)

  3. Then halt changing the partitions and keep the simulated network stable. The simulated network may remain broken (i.e. at least one asymmetric partition remains in effect), but at least it's stable.

  4. Run a number of iterations of the algorithm in parallel by poking each of the manager processes on a random'ish basis to simulate the passage of time.

  5. Afterward, fetch the chain transition histories made by each FLU and verify that no transition was ever unsafe.
#+END_QUOTE

** Behavior in symmetric network partitions

The simulator has yet to find an error. This is both really cool and really terrifying: is this *really* working? No, seriously, where are the bugs? Good question. Both the algorithm and the simulator need review and further study.

In fact, it'd be awesome if I could work with someone who has more TLA+ experience than I do to work on a formal specification of the self-management algorithm and verify its correctness.

** Behavior in asymmetric network partitions

The simulator's behavior during stable periods where at least one node is the victim of an asymmetric network partition is ... weird, wonderful, and something I don't completely understand yet. This is another place where we need more eyes reviewing and trying to poke holes in the algorithm.

In cases where any node is a victim of an asymmetric network partition, the algorithm oscillates in a very predictable way: each node X makes the same P_newprop projection at epoch E that X made during a previous recent epoch E-delta (where delta is small, usually much less than 10). However, at least one node makes a proposal that makes unanimous results impossible. When any epoch E is not unanimous, the result is one or more new rounds of proposals. However, because no node's proposal changes, the system spirals into an infinite loop of never-fully-unanimous proposals.

From the sole perspective of any single participant node, the pattern of this infinite loop is easy to detect. When detected, the local node moves to a slightly different mode of operation: it starts suspecting that a "proposal flapping" series of events is happening. (The name "flap" is taken from IP network routing, where a "flapping route" is an oscillating state of churn within the routing fabric where one or more routes change, usually in a rapid & very disruptive manner.)

If flapping is suspected, then the number of consecutive flap cycles is counted.
If the local node sees all participants (including itself) flapping with the same relative proposed projection 5 times in a row, then the local node has firm evidence that there is an asymmetric network partition somewhere in the system. The pattern of proposals is analyzed, and the local node makes a decision:

1. The local node is directly affected by the network partition. The result: stop making new projection proposals until the failure detector believes that a new status change has taken place.

2. The local node is not directly affected by the network partition. The result: continue participating in the system by continuing new self-management algorithm iterations.

After the asymmetric partition victims have "taken themselves out of the game" temporarily, the remaining participants rapidly converge to rough consensus and then a visibly unanimous proposal. For as long as the network remains partitioned but stable, any new iteration of the self-management algorithm stops without externally-visible effects. (I.e., it stops at the bottom of the flowchart's Column A.)
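
A minimal sketch of that flap-counting idea appears below. It is illustrative only: the real chain manager's notion of "the same relative proposed projection" is richer than the simple equality test used here, and the names are hypothetical.

#+BEGIN_SRC erlang
%% Illustrative only: count consecutive rounds in which every participant
%% (including ourselves) proposed the same relative projection; reaching
%% the limit is treated as firm evidence of an asymmetric partition.
-define(FLAP_LIMIT, 5).

update_flap_count(ProposalsThisRound, {PrevProposals, Count})
  when ProposalsThisRound =:= PrevProposals ->
    {ProposalsThisRound, Count + 1};
update_flap_count(ProposalsThisRound, {_PrevProposals, _Count}) ->
    {ProposalsThisRound, 0}.

flapping_suspected({_Proposals, Count}) ->
    Count >= ?FLAP_LIMIT.
#+END_SRC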