WIP: more restructuring (yay)
parent 776f5ee9b3
commit 7a89d8daeb
1 changed files with 65 additions and 33 deletions
@@ -268,6 +268,21 @@ Such extra churn is regrettable and will cause periods of delay as the
 humming consensus algorithm (described below) makes decisions. However, the
 churn cannot {\bf (we assert/believe)} cause data loss.
 
+\subsection{Data consistency: strong unless otherwise noted}
+
+Most discussion in this document assumes a desire to preserve strong
+consistency in all data managed by Machi's chain replication. We
+use the short-hand notation ``CP mode'' to describe this default mode
+of operation, where ``C'' and ``P'' refer to the CAP Theorem
+\cite{cap-theorem}.
+
+However, there are interesting use cases where Machi is useful in a
+more relaxed, eventual consistency environment. We may use the
+short-hand ``AP mode'' when describing features that preserve only
+eventual consistency. Discussion of AP mode features in this document
+will always be explicitly noted --- discussion of strongly consistent CP
+mode is always the default.
+
 \subsection{Use of the ``wedge state''}
 
 A participant in Chain Replication will enter "wedge state", as
@@ -621,7 +636,7 @@ will be resolved in the exact same manner that would be used as if we
 had found the disagreeing values at the earlier time $t$ (see previous
 paragraph).
 
-\section{Phases of projection change}
+\section{Phases of projection change, a prelude to Humming Consensus}
 \label{sec:phases-of-projection-change}
 
 Machi's use of projections is in four discrete phases and are
@@ -666,7 +681,7 @@ A new projection may be
 required whenever an administrative change is requested or in response
 to network conditions (e.g., network partitions, crashed server).
 
-Projection calculation is be a pure computation, based on input of:
+Projection calculation is a pure computation, based on input of:
 
 \begin{enumerate}
 \item The current projection epoch's data structure
@@ -683,45 +698,58 @@ changes may require retry logic and delay/sleep time intervals.
 \subsection{Writing a new projection}
 \label{sub:proj-storage-writing}
 
-This phase is very straightforward; see
+Let's ignore humming consensus for a moment and consider the general
+case for Chain Replication and strong consistency. Any manager of
+chain state metadata must maintain a history of the current chain
+state and some history of prior states. Strong consistency can be
+violated if this history is forgotten.
+
+In Machi's case, this phase is very straightforward; see
 Section~\ref{sub:proj-store-writing} for the technique for writing
-projections to all participating servers' projection stores. We don't
-really care if the writes succeed or not. The final phase, adopting a
+projections to all participating servers' projection stores.
+Humming Consensus does not care
+if the writes succeed or not: its final phase, adopting a
 new projection, will determine which write operations did/did not
 succeed.
 
 \subsection{Adopting a new projection}
 \label{sub:proj-adoption}
 
-The first step in this phase is to read latest projection from all
-available public projection stores. If the result is a {\em
-unanimous} projection $P_{new}$ in epoch $E_{new}$, then we may
-proceed forward. If the result is not a single unanmous projection,
-then we return to the step in Section~\ref{sub:projection-calculation}.
+It may be helpful to consider the projections written to the cluster's
+public projection stores as ``suggestions'' for what the cluster's new
+projection ought to be. (We avoid using the word ``proposal'' here,
+to avoid direct parallels with protocols such as Raft and Paxos.)
 
-A projection $P_{new}$ is used by a server only if:
+In general, a projection $P_{new}$ at epoch $E_{new}$ is adopted by a
+server only if
+the change in state from the local server's current projection to new
+projection, $P_{current} \rightarrow P_{new}$ will not cause data loss,
+e.g., the Update Propagation Invariant and all other safety checks
+required by chain repair in Section~\ref{sec:repair-entire-files}
+are correct. For example, any new epoch must be strictly larger than
+the current epoch, i.e., $E_{new} > E_{current}$.
 
-\begin{itemize}
-\item The server can determine that the projection has been replicated
-unanimously across all currently available servers.
-\item The change in state from the local server's current projection to new
-projection, $P_{current} \rightarrow P_{new}$ will not cause data loss,
-e.g., the Update Propagation Invariant and all other safety checks
-required by chain repair in Section~\ref{sec:repair-entire-files}
-are correct. For example, any new epoch must be strictly larger than
-the current epoch, i.e., $E_{new} > E_{current}$.
-\end{itemize}
+Returning to Machi's case, first, we read the latest projection from all
+available public projection stores. If the result is not a single
+unanimous projection, then we return to the step in
+Section~\ref{sub:projection-calculation}. If the result is a {\em
+unanimous} projection $P_{new}$ in epoch $E_{new}$, and if $P_{new}$
+does not violate chain safety checks, then the local node may adopt
+$P_{new}$ to replace its local $P_{current}$ projection.
 
-Both of these steps are performed as part of humming consensus's
-normal operation. It may be counter-intuitive that the minimum number of
-available servers is only one, but ``one'' is the correct minimum
-number for humming consensus.
+Not all safe projection transitions are useful, however. For example,
+it's trivially safe to suggest projection $P_{zero}$, where the chain
+length is zero. In an eventual consistency environment, projection
+$P_{one}$ where the chain length is exactly one is also trivially
+safe.\footnote{Although, if the total number of participants is more
+than one, eventual consistency would demand that $P_{self}$ cannot
+be used forever.}
 
 \section{Humming Consensus}
 \label{sec:humming-consensus}
 
 Humming consensus describes consensus that is derived only from data
-that is visible/known at the current time. It's OK if a network
+that is visible/available at the current time. It's OK if a network
 partition is in effect and that not all chain members are available;
 the algorithm will calculate a rough consensus despite not
 having input from all/majority of chain members. Humming consensus
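The adoption rule in the rewritten ``Adoption a new projection'' text of the hunk above (unanimous result, strictly larger epoch, safety checks pass) can be sketched roughly as follows. This is only an illustrative sketch in Python, not Machi's implementation: the names `Projection`, `unanimous`, `try_adopt`, and `is_safe` are hypothetical, and `is_safe` stands in for the Update Propagation Invariant and the other chain safety checks, which the text does not spell out here. Returning `None` when a unanimous projection fails a check is also an assumption; the text only says what happens when the result is not unanimous.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Projection:
    """Hypothetical stand-in for a Machi projection (not the real record)."""
    epoch: int
    chain: Tuple[str, ...]


def unanimous(latest: Sequence[Projection]) -> Optional[Projection]:
    """Return the one projection every available store reported, else None."""
    if not latest:
        return None
    first = latest[0]
    return first if all(p == first for p in latest) else None


def try_adopt(
    current: Projection,
    latest: Sequence[Projection],
    is_safe: Callable[[Projection, Projection], bool],
) -> Optional[Projection]:
    """Return the projection to adopt, or None if P_current must be kept."""
    p_new = unanimous(latest)
    if p_new is None:
        return None  # no unanimous suggestion: go back to projection calculation
    if p_new.epoch <= current.epoch:
        return None  # E_new must be strictly larger than E_current
    if not is_safe(current, p_new):
        return None  # transition would violate a chain safety check
    return p_new
```

A caller would pass the latest projection read from each available public projection store as `latest`; a `None` result means the local node keeps $P_{current}$.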
@@ -777,8 +805,6 @@ for an allegory in homage to the style of Leslie Lamport's original Paxos
 paper.
 \end{itemize}
 
-
-
 \paragraph{Aside: origin of the analogy to composing music}
 The ``humming'' part of humming consensus comes from the action taken
 when the environment changes. If we imagine an egalitarian group of
@@ -786,13 +812,13 @@ people, all in the same room humming some pitch together, then we take
 action to change our humming pitch if:
 
 \begin{itemize}
-\item Some member departs the room (because we can witness the person
-walking out the door) or if someone else in the room starts humming a
+\item Some member departs the room (we hear that the volume drops) or
+if someone else in the room starts humming a
 new pitch with a new epoch number.\footnote{It's very difficult for
 the human ear to hear the epoch number part of a hummed pitch, but
 for the sake of the analogy, let's assume that it can.}
-\item If a member enters the room and starts humming with the same
-epoch number but a different note.
+\item If a member enters the room (we hear that the volume rises) and
+perhaps hums a different pitch.
 \end{itemize}
 
 If someone were to transcribe onto a musical score the pitches that
@@ -1626,6 +1652,12 @@ Fritchie, Scott Lystig.
 On “Humming Consensus”, an allegory.
 {\tt http://www.snookles.com/slf-blog/2015/03/ 01/on-humming-consensus-an-allegory/}
 
+\bibitem{cap-theorem}
+Seth Gilbert and Nancy Lynch.
+Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services.
+SigAct News, June 2002.
+{\tt http://webpages.cs.luc.edu/~pld/353/ gilbert\_lynch\_brewer\_proof.pdf}
+
 \bibitem{rfc-7282}
 Internet Engineering Task Force.
 RFC 7282: On Consensus and Humming in the IETF.