Doc update, including mid-December 2015 status #54
1 changed files with 0 additions and 170 deletions
@ -1,170 +0,0 @@
@title Machi: a small village of replicated files

@doc

== About This EDoc Documentation ==

This EDoc-style documentation covers only Erlang
function APIs and function and data types. Higher-level design and
commentary will remain outside of the Erlang EDoc system; please see
the "Pointers to Other Machi Documentation" section below for more
details.

Readers should beware that this documentation may be out-of-sync with
the source code. When in doubt, use the `make edoc' command to
regenerate all HTML pages.

It is the developer's responsibility to regenerate the documentation
periodically and commit it to the Git repo.

== Machi Code Overview ==

=== Chain Manager ===

The Chain Manager is responsible for managing Machi's "Chain
Replication" state. This role is roughly analogous to the
"Riak Core" application inside of Riak, which takes care of
coordinating replica placement and replica repair.

For each primitive data server in the cluster, a Machi FLU, there is a
Chain Manager process that manages its FLU's role within the Machi
cluster's Chain Replication scheme. Each Chain Manager process
executes locally and independently to manage the distributed state of
a single Machi Chain Replication chain.

<ul>

<li> To contrast with Riak Core ... Riak Core's claimant process is
solely responsible for managing certain critical aspects of
Riak Core distributed state. Machi's Chain Manager process
performs tasks similar to those of Riak Core's claimant. However,
Machi has several active Chain Manager processes, one per FLU server,
instead of a single active process like Core's claimant. Each
Chain Manager process acts independently; each is constrained
so that it will reach consensus via independent computation
and action.

Full discussion of this distributed consensus is outside the
scope of this document; see the "Pointers to Other Machi
Documentation" section below for more information.
</li>
<li> Machi differs from a Riak Core application because Machi's
replica placement policy is simply, "All Machi servers store
replicas of all Machi files".
Machi is intended to be a primitive building block for creating a
larger cluster-of-clusters where files are
distributed/fragmented/sharded across a large pool of
independent Machi clusters.
</li>
<li> See
[https://www.usenix.org/legacy/events/osdi04/tech/renesse.html]
for a copy of the paper, "Chain Replication for Supporting High
Throughput and Availability" by Robbert van Renesse and Fred
B. Schneider.
</li>
</ul>

=== FLU ===

The FLU is the basic storage server for Machi.

<ul>
<li> The name FLU is taken from "flash storage unit" in the paper
"CORFU: A Shared Log Design for Flash Clusters" by
Balakrishnan, Malkhi, Prabhakaran, and Wobber. See
[https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/balakrishnan]
</li>
<li> In CORFU, the sequencer step is a prerequisite step that is
performed by a separate component, the Sequencer.
In Machi, the `append_chunk()' protocol message has
an implicit "sequencer" operation applied by the "head" of the
Machi Chain Replication chain. If a client wishes to write
data that has already been assigned a sequencer position, then
the `write_chunk()' API function is used.
</li>
</ul>
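
The difference between the two write paths can be sketched with a toy,
in-memory model. This is only an illustration under stated
assumptions: the module name `seq_sketch', the function arities, the
return values, and the map-based state are all hypothetical and are
not Machi's actual `append_chunk()' or `write_chunk()' protocol.

```erlang
-module(seq_sketch).
-export([new/0, append_chunk/2, write_chunk/3]).

%% Hypothetical in-memory "file": the next free offset plus a map of
%% Offset => Chunk. None of this is Machi's real state or API.
new() ->
    #{next => 0, chunks => #{}}.

%% append_chunk: the caller supplies only data. The head of the chain
%% chooses the offset, which is the implicit "sequencer" step.
append_chunk(Chunk, #{next := Next, chunks := Chunks} = S)
  when is_binary(Chunk) ->
    S2 = S#{next := Next + byte_size(Chunk),
            chunks := Chunks#{Next => Chunk}},
    {{ok, Next}, S2}.

%% write_chunk: the caller already holds an assigned offset.
%% Assumption of this sketch: an offset that is already occupied is
%% refused, and only exact offset matches are checked (no overlap test).
write_chunk(Offset, Chunk, #{next := Next, chunks := Chunks} = S)
  when is_integer(Offset), Offset >= 0, is_binary(Chunk) ->
    case maps:is_key(Offset, Chunks) of
        true ->
            {{error, written}, S};
        false ->
            End = Offset + byte_size(Chunk),
            {ok, S#{next := max(Next, End),
                    chunks := Chunks#{Offset => Chunk}}}
    end.
```

In this toy, only `append_chunk/2' ever chooses a position;
`write_chunk/3' merely stores data at a position that some earlier
sequencing step handed out.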

For each FLU, there are three independent tasks that are implemented
using three different Erlang processes:

<ul>
<li> A FLU server, implemented primarily by `machi_flu.erl'.
</li>
<li> A projection store server, implemented primarily by
`machi_projection_store.erl'.
</li>
<li> A chain state manager server, implemented primarily by
`machi_chain_manager1.erl'.
</li>
</ul>

From the perspective of failure detection, it is very convenient that
all three FLU-related services (file server, sequencer server, and
projection server) are accessed using the same single TCP port.

=== Projection (data structure) ===

The projection is a data structure that specifies the current state
of the Machi cluster: all FLUs, which FLUs are considered
up/running or down/crashed/stopped, which FLUs are active
participants in the Chain Replication protocol, and which FLUs are
under "repair" (i.e., having their data resynchronized when
newly added to a cluster or when restarting after a crash).
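
As a rough illustration of the kinds of facts listed above, the sketch
below models a projection as a small record. Everything in it is an
assumption made for illustration: the module `projection_sketch', the
record and field names, and the rule that every change produces a new
epoch. It is not Machi's actual `projection_v1()' definition.

```erlang
-module(projection_sketch).
-export([new/2, mark_down/2, start_repair/2, status/2]).

%% Hypothetical record; the field names are illustrative only.
-record(proj, {
    epoch = 0      :: non_neg_integer(),
    all = []       :: [atom()],  % every FLU in the cluster
    down = []      :: [atom()],  % FLUs considered down/crashed/stopped
    active = []    :: [atom()],  % FLUs participating in the chain
    repairing = [] :: [atom()]   % FLUs whose data is being resynchronized
}).

%% Start with every FLU up and active.
new(Epoch, All) ->
    #proj{epoch = Epoch, all = All, active = All}.

%% Assumption: each change yields a projection with a larger epoch.
mark_down(FLU, #proj{epoch = E, down = D, active = A, repairing = R} = P) ->
    P#proj{epoch = E + 1,
           down = lists:usort([FLU | D]),
           active = lists:delete(FLU, A),
           repairing = lists:delete(FLU, R)}.

%% A FLU that comes back is repaired before it becomes active again.
start_repair(FLU, #proj{epoch = E, down = D, repairing = R} = P) ->
    P#proj{epoch = E + 1,
           down = lists:delete(FLU, D),
           repairing = R ++ [FLU]}.

%% Answer "what does this projection say about FLU?"
status(FLU, P) ->
    IsIn = fun(L) -> lists:member(FLU, L) end,
    case IsIn(P#proj.all) of
        false ->
            not_a_member;
        true ->
            case {IsIn(P#proj.down), IsIn(P#proj.active),
                  IsIn(P#proj.repairing)} of
                {true, _, _}     -> down;
                {false, true, _} -> active;
                {false, _, true} -> repairing;
                _                -> unknown
            end
    end.
```

The point of the sketch is only that one immutable value answers all
four questions (member, up or down, active, under repair) for a given
epoch.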

=== Projection Store (server) ===

The projection store is a storage service that is implemented by an
Erlang/OTP `gen_server' process that is associated with each
FLU. Conceptually, the projection store is an array of
write-once registers. For each projection store register, the
key is a 2-tuple of an epoch number (`non_neg_integer()' type)
and a projection type (`public' or `private' type); the value is
a projection data structure (`projection_v1()' type).
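
The write-once register idea can be sketched without any server
process at all. The following is a minimal model, assuming a plain map
as the store; the module name `pstore_sketch', the function names, and
the `written' / `not_written' error atoms are hypothetical and are not
taken from `machi_projection_store.erl'.

```erlang
-module(pstore_sketch).
-export([new/0, write/3, read/2]).

%% Key shape as described in the text: {Epoch, public | private}.
-type key() :: {non_neg_integer(), public | private}.

new() ->
    #{}.

%% A register may be written exactly once; later writes are refused
%% and leave the store unchanged.
-spec write(key(), term(), map()) -> {ok | {error, written}, map()}.
write({Epoch, Type} = Key, Proj, Store)
  when is_integer(Epoch), Epoch >= 0,
       (Type =:= public orelse Type =:= private) ->
    case maps:is_key(Key, Store) of
        true  -> {{error, written}, Store};
        false -> {ok, Store#{Key => Proj}}
    end.

%% Reading an unwritten register is an error, not an empty value.
-spec read(key(), map()) -> {ok, term()} | {error, not_written}.
read(Key, Store) ->
    case maps:find(Key, Store) of
        {ok, Proj} -> {ok, Proj};
        error      -> {error, not_written}
    end.
```

Note that `{1, public}' and `{1, private}' are distinct registers in
this model, so the same epoch can hold one value of each type.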

=== Client and Proxy Client ===

Machi intentionally avoids using distributed Erlang for its
communication. This design decision makes Erlang-side code more
difficult and complex but allows us the freedom of implementing
parts of Machi in other languages without major
protocol, API, and glue code changes later in the product's
lifetime.

There are two layers of interface for Machi clients.

<ul>
<li> The `machi_flu1_client' module implements an API that uses a
TCP socket directly.
</li>
<li> The `machi_proxy_flu1_client' module implements an API that
uses a local, long-lived `gen_server' process as a proxy for
the remote, perhaps disconnected-or-crashed Machi FLU server.
</li>
</ul>
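
The proxy layer described above can be sketched as a `gen_server' that
owns the socket and hides connection failures from its callers. This
is a simplified illustration, not the `machi_proxy_flu1_client'
implementation: the module `proxy_sketch', the `request/2' function,
the `{packet, 4}' framing, the one-second timeouts, and the
`{error, disconnected}' reply are all assumptions of the sketch.

```erlang
-module(proxy_sketch).
-behaviour(gen_server).
-export([start_link/2, request/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical state: the remote address and the socket, which is
%% 'undefined' while disconnected.
-record(st, {host, port, sock = undefined}).

start_link(Host, Port) ->
    gen_server:start_link(?MODULE, {Host, Port}, []).

%% Send one request and wait for one reply. The caller gets an error
%% tuple, not a crash, when the remote server is unreachable.
request(Pid, Bin) when is_binary(Bin) ->
    gen_server:call(Pid, {request, Bin}).

init({Host, Port}) ->
    %% Connect lazily so the proxy can start while the server is down.
    {ok, #st{host = Host, port = Port}}.

handle_call({request, Bin}, _From, S0) ->
    case ensure_connected(S0) of
        {ok, #st{sock = Sock} = S1} ->
            case send_recv(Sock, Bin) of
                {ok, Reply} ->
                    {reply, {ok, Reply}, S1};
                {error, _} ->
                    %% Drop the socket; the next call will reconnect.
                    gen_tcp:close(Sock),
                    {reply, {error, disconnected}, S1#st{sock = undefined}}
            end;
        {error, _} ->
            {reply, {error, disconnected}, S0}
    end.

handle_cast(_Msg, S) -> {noreply, S}.
handle_info(_Msg, S) -> {noreply, S}.

ensure_connected(#st{sock = undefined, host = H, port = P} = S) ->
    Opts = [binary, {packet, 4}, {active, false}],
    case gen_tcp:connect(H, P, Opts, 1000) of
        {ok, Sock}       -> {ok, S#st{sock = Sock}};
        {error, _} = Err -> Err
    end;
ensure_connected(S) ->
    {ok, S}.

send_recv(Sock, Bin) ->
    case gen_tcp:send(Sock, Bin) of
        ok               -> gen_tcp:recv(Sock, 0, 1000);
        {error, _} = Err -> Err
    end.
```

Because the process is long-lived and reconnects on demand, callers
can hold on to its pid across server restarts, which is the main
convenience a proxy of this kind offers over using a socket directly.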

The types for both modules ought to be the same. However, due to
rapid code churn, some differences might exist. Any major difference
is (almost by definition) a bug: please open a GitHub issue to request
a correction.

== TODO notes ==

Any use of the string "TODO" in upper/lower/mixed case, anywhere in
the code, is a reminder signal of unfinished work.

== Pointers to Other Machi Documentation ==

<ul>
<li> If you are viewing this document locally, please look in the
`../doc/' directory.
</li>
<li> If you are viewing this document via the Web, please find the
documentation via this link:
[http://github.com/basho/machi/tree/master/doc/]
Please be aware that this link points to the `master' branch
of the Machi source repository and therefore may be
out-of-sync with non-`master' branch code.
</li>

</ul>