Doc update, including mid-December 2015 status #54
1 changed files with 0 additions and 170 deletions
|
@ -1,170 +0,0 @@
|
|||
|
||||
@title Machi: a small village of replicated files
|
||||
|
||||
@doc
|
||||
|
||||
== About This EDoc Documentation ==
|
||||
|
||||
This EDoc-style documentation will concern itself only with Erlang
|
||||
function APIs and function & data types. Higher-level design and
|
||||
commentary will remain outside of the Erlang EDoc system; please see
|
||||
the "Pointers to Other Machi Documentation" section below for more
|
||||
details.
|
||||
|
||||
Readers should beware that this documentation may be out-of-sync with
|
||||
the source code. When in doubt, use the `make edoc' command to
|
||||
regenerate all HTML pages.
|
||||
|
||||
It is the developer's responsibility to re-generate the documentation
|
||||
periodically and commit it to the Git repo.
|
||||
|
||||
== Machi Code Overview ==
|
||||
|
||||
=== Chain Manager ===
|
||||
|
||||
The Chain Manager is responsible for managing the state of Machi's
|
||||
"Chain Replication" state. This role is roughly analogous to the
|
||||
"Riak Core" application inside of Riak, which takes care of
|
||||
coordinating replica placement and replica repair.
|
||||
|
||||
For each primitive data server in the cluster, a Machi FLU, there is a
|
||||
Chain Manager process that manages its FLU's role within the Machi
|
||||
cluster's Chain Replication scheme. Each Chain Manager process
|
||||
executes locally and independently to manage the distributed state of
|
||||
a single Machi Chain Replication chain.
|
||||
|
||||
<ul>
|
||||
|
||||
<li> To contrast with Riak Core ... Riak Core's claimant process is
|
||||
solely responsible for managing certain critical aspects of
|
||||
Riak Core distributed state. Machi's Chain Manager process
|
||||
performs similar tasks as Riak Core's claimant. However, Machi
|
||||
has several active Chain Manager processes, one per FLU server,
|
||||
instead of a single active process like Core's claimant. Each
|
||||
Chain Manager process acts independently; each is constrained
|
||||
so that it will reach consensus via independent computation
|
||||
& action.
|
||||
|
||||
Full discussion of this distributed consensus is outside the
|
||||
scope of this document; see the "Pointers to Other Machi
|
||||
Documentation" section below for more information.
|
||||
</li>
|
||||
<li> Machi differs from a Riak Core application because Machi's
|
||||
replica placement policy is simply, "All Machi servers store
|
||||
replicas of all Machi files".
|
||||
Machi is intended to be a primitive building block for creating larger
|
||||
cluster-of-clusters where files are
|
||||
distributed/fragmented/sharded across a large pool of
|
||||
independent Machi clusters.
|
||||
</li>
|
||||
<li> See
|
||||
[https://www.usenix.org/legacy/events/osdi04/tech/renesse.html]
|
||||
for a copy of the paper, "Chain Replication for Supporting High
|
||||
Throughput and Availability" by Robbert van Renesse and Fred
|
||||
B. Schneider.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
=== FLU ===
|
||||
|
||||
The FLU is the basic storage server for Machi.
|
||||
|
||||
<ul>
|
||||
<li> The name FLU is taken from "flash storage unit" from the paper
|
||||
"CORFU: A Shared Log Design for Flash Clusters" by
|
||||
Balakrishnan, Malkhi, Prabhakaran, and Wobber. See
|
||||
[https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/balakrishnan]
|
||||
</li>
|
||||
<li> In CORFU, the sequencer step is a prerequisite step that is
|
||||
performed by a separate component, the Sequencer.
|
||||
In Machi, the `append_chunk()' protocol message has
|
||||
an implicit "sequencer" operation applied by the "head" of the
|
||||
Machi Chain Replication chain. If a client wishes to write
|
||||
data that has already been assigned a sequencer position, then
|
||||
the `write_chunk()' API function is used.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
For each FLU, there are three independent tasks that are implemented
|
||||
using three different Erlang processes:
|
||||
|
||||
<ul>
|
||||
<li> A FLU server, implemented primarily by `machi_flu.erl'.
|
||||
</li>
|
||||
<li> A projection store server, implemented primarily by
|
||||
`machi_projection_store.erl'.
|
||||
</li>
|
||||
<li> A chain state manager server, implemented primarily by
|
||||
`machi_chain_manager1.erl'.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
From the perspective of failure detection, it is very convenient that
|
||||
all three FLU-related services (file server, sequencer server, and
|
||||
projection server) are accessed using the same single TCP port.
|
||||
|
||||
=== Projection (data structure) ===
|
||||
|
||||
The projection is a data structure that specifies the current state
|
||||
of the Machi cluster: all FLUs, which FLUS are considered
|
||||
up/running or down/crashed/stopped, which FLUs are actively
|
||||
participants in the Chain Replication protocol, and which FLUs are
|
||||
under "repair" (i.e., having their data resyncronized when
|
||||
newly-added to a cluster or when restarting after a crash).
|
||||
|
||||
=== Projection Store (server) ===
|
||||
|
||||
The projection store is a storage service that is implemented by an
|
||||
Erlang/OTP `gen_server' process that is associated with each
|
||||
FLU. Conceptually, the projection store is an array of
|
||||
write-once registers. For each projection store register, the
|
||||
key is a 2-tuple of an epoch number (`non_neg_integer()' type)
|
||||
and a projection type (`public' or `private' type); the value is
|
||||
a projection data structure (`projection_v1()' type).
|
||||
|
||||
=== Client and Proxy Client ===
|
||||
|
||||
Machi is intentionally avoiding using distributed Erlang for Machi's
|
||||
communication. This design decision makes Erlang-side code more
|
||||
difficult & complex but allows us the freedom of implementing
|
||||
parts of Machi in other languages without major
|
||||
protocol&API&glue code changes later in the product's
|
||||
lifetime.
|
||||
|
||||
There are two layers of interface for Machi clients.
|
||||
|
||||
<ul>
|
||||
<li> The `machi_flu1_client' module implements an API that uses a
|
||||
TCP socket directly.
|
||||
</li>
|
||||
<li> The `machi_proxy_flu1_client' module implements an API that
|
||||
uses a local, long-lived `gen_server' process as a proxy for
|
||||
the remote, perhaps disconnected-or-crashed Machi FLU server.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
The types for both modules ought to be the same. However, due to
|
||||
rapid code churn, some differences might exist. Any major difference
|
||||
is (almost by definition) a bug: please open a GitHub issue to request
|
||||
a correction.
|
||||
|
||||
== TODO notes ==
|
||||
|
||||
Any use of the string "TODO" in upper/lower/mixed case, anywhere in
|
||||
the code, is a reminder signal of unfinished work.
|
||||
|
||||
== Pointers to Other Machi Documentation ==
|
||||
|
||||
<ul>
|
||||
<li> If you are viewing this document locally, please look in the
|
||||
`../doc/' directory,
|
||||
</li>
|
||||
<li> If you are viewing this document via the Web, please find the
|
||||
documentation via this link:
|
||||
[http://github.com/basho/machi/tree/master/doc/]
|
||||
Please be aware that this link points to the `master' branch
|
||||
of the Machi source repository and therefore may be
|
||||
out-of-sync with non-`master' branch code.
|
||||
</li>
|
||||
|
||||
</ul>
|
Loading…
Reference in a new issue