The FAQ grows
This commit is contained in:
parent
e0fd2e909c
commit
48e4bf2c1a
3 changed files with 368 additions and 32 deletions
|
@ -11,7 +11,14 @@ hands on in this process, reach out to
|
||||||
|
|
||||||
Thank you for being part of the community! We love you for it.
|
Thank you for being part of the community! We love you for it.
|
||||||
|
|
||||||
## General process
|
## If you have a question or wish to provide design feedback/criticism
|
||||||
|
|
||||||
|
Please
|
||||||
|
[open a support ticket at GitHub](https://github.com/basho/machi/issues/new)
|
||||||
|
to ask questions and to provide feedback about Machi's
|
||||||
|
design/documentation/source code.
|
||||||
|
|
||||||
|
## General development process
|
||||||
|
|
||||||
Machi is still a very young project within Basho, with a small team of
|
Machi is still a very young project within Basho, with a small team of
|
||||||
developers; please bear with us as we grow out of "toddler" stage into
|
developers; please bear with us as we grow out of "toddler" stage into
|
||||||
|
|
364
FAQ.md
364
FAQ.md
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
<!-- Formatting: -->
|
<!-- Formatting: -->
|
||||||
<!-- All headings omitted from outline are H1 -->
|
<!-- All headings omitted from outline are H1 -->
|
||||||
<!-- All other headings must be on a single line -->
|
<!-- All other headings must be on a single line! -->
|
||||||
<!-- Run: ./priv/make-faq.pl ./FAQ.md > ./tmpfoo; mv ./tmpfoo ./FAQ.md -->
|
<!-- Run: ./priv/make-faq.pl ./FAQ.md > ./tmpfoo; mv ./tmpfoo ./FAQ.md -->
|
||||||
|
|
||||||
# Outline
|
# Outline
|
||||||
|
@ -11,13 +11,27 @@
|
||||||
|
|
||||||
+ [1 Questions about Machi in general](#n1)
|
+ [1 Questions about Machi in general](#n1)
|
||||||
+ [1.1 What is Machi?](#n1.1)
|
+ [1.1 What is Machi?](#n1.1)
|
||||||
+ [1.2 What does Machi's API look like?](#n1.2)
|
+ [1.2 What is a Machi "cluster of clusters"?](#n1.2)
|
||||||
+ [2 Questions about Machi relative to something else](#n2)
|
+ [1.3 What is Machi like when operating in "eventually consistent"/"AP mode"?](#n1.3)
|
||||||
|
+ [1.4 What is Machi like when operating in "strongly consistent"/"CP mode"?](#n1.4)
|
||||||
|
+ [1.5 What does Machi's API look like?](#n1.5)
|
||||||
|
+ [1.6 What licensing terms are used by Machi?](#n1.6)
|
||||||
|
+ [1.7 What is Machi's expected release schedule, packaging, and operating system/OS distribution support?](#n1.7)
|
||||||
|
+ [2 Questions about Machi relative to {{something else}}](#n2)
|
||||||
+ [2.1 How is Machi better than Hadoop?](#n2.1)
|
+ [2.1 How is Machi better than Hadoop?](#n2.1)
|
||||||
+ [2.2 How does Machi differ from HadoopFS?](#n2.2)
|
+ [2.2 How does Machi differ from HadoopFS/HDFS?](#n2.2)
|
||||||
+ [2.3 How does Machi differ from Kafka?](#n2.3)
|
+ [2.3 How does Machi differ from Kafka?](#n2.3)
|
||||||
+ [2.4 How does Machi differ from Bookkeeper?](#n2.4)
|
+ [2.4 How does Machi differ from Bookkeeper?](#n2.4)
|
||||||
+ [2.5 How does Machi differ from CORFU and Tango?](#n2.5)
|
+ [2.5 How does Machi differ from CORFU and Tango?](#n2.5)
|
||||||
|
+ [3 Machi's specifics](#n3)
|
||||||
|
+ [3.1 What technique is used to replicate Machi's files? Can other techniques be used?](#n3.1)
|
||||||
|
+ [3.2 Does Machi have a reliance on a coordination service such as ZooKeeper or etcd?](#n3.2)
|
||||||
|
+ [3.3 How is Machi tested?](#n3.3)
|
||||||
|
+ [3.4 Does Machi require shared disk storage? e.g. iSCSI, NBD (Network Block Device), Fibre Channel disks](#n3.4)
|
||||||
|
+ [3.5 Does Machi require or assume that servers with large numbers of disks must use RAID-0/1/5/6/10/50/60 to create a single block device?](#n3.5)
|
||||||
|
+ [3.6 What language(s) is Machi written in?](#n3.6)
|
||||||
|
+ [3.7 Does Machi use the Erlang/OTP network distribution system (aka "disterl")?](#n3.7)
|
||||||
|
+ [3.8 Can I use HTTP to write/read stuff into/from Machi?](#n3.8)
|
||||||
|
|
||||||
<!-- ENDOUTLINE -->
|
<!-- ENDOUTLINE -->
|
||||||
|
|
||||||
|
@ -27,9 +41,9 @@
|
||||||
<a name="n1.1">
|
<a name="n1.1">
|
||||||
### 1.1. What is Machi?
|
### 1.1. What is Machi?
|
||||||
|
|
||||||
TODO: expand this topic.
|
Very briefly, Machi is a very simple append-only file store.
|
||||||
|
|
||||||
Very briefly, Machi is a very simple append-only file store; it is
|
Machi is
|
||||||
"dumber" than many other file stores (i.e., lacking many features
|
"dumber" than many other file stores (i.e., lacking many features
|
||||||
found in other file stores) such as HadoopFS or simple NFS or CIFS file
|
found in other file stores) such as HadoopFS or simple NFS or CIFS file
|
||||||
server.
|
server.
|
||||||
|
@ -37,15 +51,124 @@ However, Machi is a distributed file store, which makes it different
|
||||||
(and, in some ways, more complicated) than a simple NFS or CIFS file
|
(and, in some ways, more complicated) than a simple NFS or CIFS file
|
||||||
server.
|
server.
|
||||||
|
|
||||||
|
All Machi data is protected by SHA-1 checksums. By default, these
|
||||||
|
checksums are calculated by the client to provide strong end-to-end
|
||||||
|
protection against data corruption. (If the client does not provide a
|
||||||
|
checksum, one will be generated by the first Machi server to handle
|
||||||
|
the write request.) Internally, Machi uses these checksums for local
|
||||||
|
data integrity checks and for server-to-server file synchronization
|
||||||
|
and corrupt data repair.
|
||||||
|
|
||||||
As a distributed system, Machi can be configured to operate with
|
As a distributed system, Machi can be configured to operate with
|
||||||
either eventually consistent mode or strongly consistent mode. (See
|
either eventually consistent mode or strongly consistent mode. In
|
||||||
the high level design document for definitions and details.)
|
strongly consistent mode, Machi can provide write-once file store
|
||||||
|
service in the same style as CORFU. Machi can be an easy to use tool
|
||||||
|
for building fully ordered, log-based distributed systems and
|
||||||
|
distributed data structures.
|
||||||
|
|
||||||
|
In eventually consistent mode, Machi can remain available for writes
|
||||||
|
during arbitrary network partitions. When a network partition is
|
||||||
|
fixed, Machi can safely merge all file data together without data
|
||||||
|
loss. Similar to the operation of
|
||||||
|
Basho's
|
||||||
|
[Riak key-value store, Riak KV](http://basho.com/products/riak-kv/),
|
||||||
|
Machi can provide file writes during arbitrary network partitions and
|
||||||
|
later merge all results together safely when the cluster recovers.
|
||||||
|
|
||||||
For a much longer answer, please see the
|
For a much longer answer, please see the
|
||||||
[Machi high level design doc](./doc/high-level-machi.pdf).
|
[Machi high level design doc](https://github.com/basho/machi/doc/high-level-machi.pdf).
|
||||||
|
|
||||||
<a name="n1.2">
|
<a name="n1.2">
|
||||||
### 1.2. What does Machi's API look like?
|
### 1.2. What is a Machi "cluster of clusters"?
|
||||||
|
|
||||||
|
Machi's design is based on using small, well-understood and provable
|
||||||
|
(mathematically) techniques to maintain multiple file copies without
|
||||||
|
data loss or data corruption. At its lowest level, Machi contains no
|
||||||
|
support for distribution/partitioning/sharding of files across many
|
||||||
|
servers. A typical, fully-functional Machi cluster will likely be two
|
||||||
|
or three machines.
|
||||||
|
|
||||||
|
However, Machi is designed to be an excellent building block for
|
||||||
|
building larger systems. A deployment of Machi "cluster of clusters"
|
||||||
|
will use the "random slicing" technique for partitioning files across
|
||||||
|
multiple Machi clusters that, as individuals, are unaware of the
|
||||||
|
larger cluster-of-clusters scheme.
|
||||||
|
|
||||||
|
The cluster-of-clusters management service will be fully decentralized
|
||||||
|
and run as a separate software service installed on each Machi
|
||||||
|
cluster. This manager will appear to the local Machi server as simply
|
||||||
|
another Machi file client. The cluster-of-clusters managers will take
|
||||||
|
care of file migration as the cluster grows and shrinks in capacity
|
||||||
|
and in response to day-to-day changes in workload.
|
||||||
|
|
||||||
|
Though the cluster-of-clusters manager has not yet been implemented,
|
||||||
|
its design is fully decentralized and capable of operating despite
|
||||||
|
multiple partial failure of its member clusters. We expect this
|
||||||
|
design to scale easily to at least one thousand servers.
|
||||||
|
|
||||||
|
<a name="n1.3">
|
||||||
|
### 1.3. What is Machi like when operating in "eventually consistent"/"AP mode"?
|
||||||
|
|
||||||
|
Machi's operating mode dictates how a Machi cluster will react to
|
||||||
|
network partitions. A network partition may be caused by:
|
||||||
|
|
||||||
|
* A network failure
|
||||||
|
* A server failure
|
||||||
|
* An extreme server software "hang" or "pause", e.g. caused by OS
|
||||||
|
scheduling problems such as a failing/stuttering disk device.
|
||||||
|
|
||||||
|
"AP mode" refers to the "A" and "P" properties of the "CAP
|
||||||
|
conjecture", meaning that the cluster will be "Available" and
|
||||||
|
"Partition tolerant".
|
||||||
|
|
||||||
|
The consistency semantics of file operations while in "AP mode" are
|
||||||
|
eventually consistent during and after network partitions:
|
||||||
|
|
||||||
|
* File write operations are permitted by any client on the "same side"
|
||||||
|
of the network partition.
|
||||||
|
* File read operations are successful for any file contents where the
|
||||||
|
client & server are on the "same side" of the network partition.
|
||||||
|
* After the network partition(s) is resolved, files are merged
|
||||||
|
together from "all sides" of the partition(s).
|
||||||
|
* Unique files are copied in their entirety.
|
||||||
|
* Byte ranges within the same file are merged. This is possible
|
||||||
|
due to Machi's restrictions on file naming (files names are
|
||||||
|
alwoys assigned by Machi servers) and file offset assignments
|
||||||
|
(byte offsets are also always chosen by Machi servers according
|
||||||
|
to rules which guarantee safe mergeability.).
|
||||||
|
|
||||||
|
<a name="n1.4">
|
||||||
|
### 1.4. What is Machi like when operating in "strongly consistent"/"CP mode"?
|
||||||
|
|
||||||
|
Machi's operating mode dictates how a Machi cluster will react to
|
||||||
|
network partitions.
|
||||||
|
"CP mode" refers to the "C" and "P" properties of the "CAP
|
||||||
|
conjecture", meaning that the cluster will be "Consistent" and
|
||||||
|
"Partition tolerant".
|
||||||
|
|
||||||
|
The consistency semantics of file operations while in "CP mode" are
|
||||||
|
strongly consistent during and after network partitions:
|
||||||
|
|
||||||
|
* File write operations are permitted by any client on the "same side"
|
||||||
|
of the network partition if and only if a quorum majority of Machi servers
|
||||||
|
are also accessible within that partition.
|
||||||
|
* In other words, file write service is unavailable in any
|
||||||
|
partition where only a minority of Machi servers are accessible.
|
||||||
|
* File read operations are successful for any file contents where the
|
||||||
|
client & server are on the "same side" of the network partition.
|
||||||
|
* After the network partition(s) is resolved, files are repaired from
|
||||||
|
the surviving quorum majority members to out-of-sync minority
|
||||||
|
members.
|
||||||
|
|
||||||
|
Machi's design can provide the illusion of quorum minority write
|
||||||
|
availability if the cluster is configured to operate with "witness
|
||||||
|
servers". (This feaure is not implemented yet, as of June 2015.)
|
||||||
|
See Section 11 of
|
||||||
|
[Machi chain manager high level design doc](https://github.com/basho/machi/doc/high-level-chain-mgr.pdf)
|
||||||
|
for more details.
|
||||||
|
|
||||||
|
<a name="n1.5">
|
||||||
|
### 1.5. What does Machi's API look like?
|
||||||
|
|
||||||
The Machi API only contains a handful of API operations. The function
|
The Machi API only contains a handful of API operations. The function
|
||||||
arguments shown below use Erlang-style type annotations.
|
arguments shown below use Erlang-style type annotations.
|
||||||
|
@ -68,11 +191,32 @@ healed.
|
||||||
Internally, there is a more complex protocol used by individual
|
Internally, there is a more complex protocol used by individual
|
||||||
cluster members to manage file contents and to repair damaged/missing
|
cluster members to manage file contents and to repair damaged/missing
|
||||||
files. See Figure 3 in
|
files. See Figure 3 in
|
||||||
[Machi high level design doc](./doc/high-level-machi.pdf)
|
[Machi high level design doc](https://github.com/basho/machi/doc/high-level-machi.pdf)
|
||||||
for more details.
|
for more details.
|
||||||
|
|
||||||
|
<a name="n1.6">
|
||||||
|
### 1.6. What licensing terms are used by Machi?
|
||||||
|
|
||||||
|
All Machi source code and documentation is licensed by
|
||||||
|
[Basho Technologies, Inc.](http://www.basho.com/)
|
||||||
|
under the [Apache Public License version 2](https://github.com/basho/machi/LICENSE).
|
||||||
|
|
||||||
|
<a name="n1.7">
|
||||||
|
### 1.7. What is Machi's expected release schedule, packaging, and operating system/OS distribution support?
|
||||||
|
|
||||||
|
Basho expects that Machi's first release will take place near the end
|
||||||
|
of calendar year 2015.
|
||||||
|
|
||||||
|
Basho's official support for operating systems (e.g. Linux, FreeBSD),
|
||||||
|
operating system packaging (e.g. CentOS rpm/yum package management,
|
||||||
|
Ubuntu debian/apt-get package management), and
|
||||||
|
container/virtualization have not yet been chosen. If you wish to
|
||||||
|
provide your opinion, we'd love to hear it. Please
|
||||||
|
[open a support ticket at GitHub](https://github.com/basho/machi/issues/new)
|
||||||
|
and let us know.
|
||||||
|
|
||||||
<a name="n2">
|
<a name="n2">
|
||||||
## 2. Questions about Machi relative to something else
|
## 2. Questions about Machi relative to {{something else}}
|
||||||
|
|
||||||
<a name="better-than-hadoop">
|
<a name="better-than-hadoop">
|
||||||
<a name="n2.1">
|
<a name="n2.1">
|
||||||
|
@ -89,7 +233,7 @@ and focus on the description of Hadoop's MapReduce and YARN; Machi
|
||||||
contains neither.
|
contains neither.
|
||||||
|
|
||||||
<a name="n2.2">
|
<a name="n2.2">
|
||||||
### 2.2. How does Machi differ from HadoopFS?
|
### 2.2. How does Machi differ from HadoopFS/HDFS?
|
||||||
|
|
||||||
This is a much better question than the
|
This is a much better question than the
|
||||||
[How is Machi better than Hadoop?](#better-than-hadoop)
|
[How is Machi better than Hadoop?](#better-than-hadoop)
|
||||||
|
@ -105,7 +249,7 @@ contrast.
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
<td> <b>Machi</b>
|
<td> <b>Machi</b>
|
||||||
<td> <b>Hadoop</b>
|
<td> <b>HadoopFS (HDFS)</b>
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
<td> Not POSIX compliant
|
<td> Not POSIX compliant
|
||||||
|
@ -122,7 +266,9 @@ appended to.
|
||||||
<td> File must be closed before a client can read it.
|
<td> File must be closed before a client can read it.
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
<td> No concept (yet) of users, directories, or ACLs
|
<td> No concept (yet) of users or authentication (though the initial
|
||||||
|
supported release will support basic user + password authentication).
|
||||||
|
Machi will probably never natively support directories or ACLs.
|
||||||
<td> Has concepts of users, directories, and ACLs.
|
<td> Has concepts of users, directories, and ACLs.
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -161,8 +307,10 @@ Chain Replication chain. Typically, *N=2*, but this is configurable.
|
||||||
configurable.
|
configurable.
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
<td>
|
<td> All Machi file data is protected by SHA-1 checksums generated by
|
||||||
<td>
|
the client prior to writing by Machi servers.
|
||||||
|
<td> Optional file checksum protection may be implemented on the
|
||||||
|
server side.
|
||||||
|
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
|
@ -189,6 +337,11 @@ failover, etc.
|
||||||
environments.
|
environments.
|
||||||
<td> "Battle tested" in large production environments.
|
<td> "Battle tested" in large production environments.
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> All Machi file data is protected by SHA-1 checksums generated by
|
||||||
|
the client prior to writing by Machi servers.
|
||||||
|
<td> Each log entry is protected by a 32 bit CRC checksum.
|
||||||
|
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
In theory, it should be "quite straightforward" to remove these parts
|
In theory, it should be "quite straightforward" to remove these parts
|
||||||
|
@ -227,9 +380,48 @@ Machi's design borrows very heavily from CORFU. We acknowledge a deep
|
||||||
debt to the original Microsoft Research papers that describe CORFU's
|
debt to the original Microsoft Research papers that describe CORFU's
|
||||||
original design and implementation.
|
original design and implementation.
|
||||||
|
|
||||||
|
<table>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> <b>Machi</b>
|
||||||
|
<td> <b>CORFU</b>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> Writes & reads may be on byte boundaries
|
||||||
|
<td> Wries & reads must be on page boundaries, e.g. 4 or 8 KBytes, to
|
||||||
|
align with server storage based on flash NVRAM/solid state disk (SSD).
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> Provides multiple "logs", where each log has a name and is
|
||||||
|
appended to & read from like a file. A read operation requires a 3-tuple:
|
||||||
|
file name, starting byte offset, number of bytes.
|
||||||
|
<td> Provides a single "log". A read operation requires only a
|
||||||
|
1-tuple: the log page number. (A protocol option exists to
|
||||||
|
request multiple pages in a single read query?)
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> Offers service in either strongly consistent mode or eventually
|
||||||
|
consistent mode.
|
||||||
|
<td> Offers service in strongly consistent mode.
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> May be deployed on solid state disk (SSD) or Winchester hard disks.
|
||||||
|
<td> Designed for use with solid state disk (SSD) but can also be used
|
||||||
|
with Winchester hard disks (with a performance penalty if used as
|
||||||
|
suggested by use cases described by the CORFU papers).
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td> All Machi file data is protected by SHA-1 checksums generated by
|
||||||
|
the client prior to writing by Machi servers.
|
||||||
|
<td> Depending on server & flash device capabilities, each data page
|
||||||
|
may be protected by a checksum (calculated independently by each
|
||||||
|
server rather than the client).
|
||||||
|
|
||||||
|
</table>
|
||||||
|
|
||||||
See also: the "Recommended reading & related work" and "References"
|
See also: the "Recommended reading & related work" and "References"
|
||||||
sections of the
|
sections of the
|
||||||
[Machi high level design doc](./doc/high-level-machi.pdf)
|
[Machi high level design doc](https://github.com/basho/machi/doc/high-level-machi.pdf)
|
||||||
for pointers to the MSR papers related to CORFU.
|
for pointers to the MSR papers related to CORFU.
|
||||||
|
|
||||||
Machi does not implement Tango directly. (Not yet, at least.)
|
Machi does not implement Tango directly. (Not yet, at least.)
|
||||||
|
@ -237,3 +429,139 @@ However, there is a prototype implementation included in the Machi
|
||||||
source tree. See
|
source tree. See
|
||||||
[the prototype/tango source code directory](https://github.com/basho/machi/tree/master/prototype/tango)
|
[the prototype/tango source code directory](https://github.com/basho/machi/tree/master/prototype/tango)
|
||||||
for details.
|
for details.
|
||||||
|
|
||||||
|
<a name="n3">
|
||||||
|
## 3. Machi's specifics
|
||||||
|
|
||||||
|
<a name="n3.1">
|
||||||
|
### 3.1. What technique is used to replicate Machi's files? Can other techniques be used?
|
||||||
|
|
||||||
|
Machi uses Chain Replication to replicate all file data. Each byte of
|
||||||
|
a file is stored using a "write-once register", which is a mechanism
|
||||||
|
to enforce immutability after the byte has been written exactly once.
|
||||||
|
|
||||||
|
In order to ensure availability in the event of *F* failures, Chain
|
||||||
|
Replication requires a minimum of *F + 1* servers to be configured.
|
||||||
|
|
||||||
|
Alternative mechanisms could be used to manage file replicas, such as
|
||||||
|
Paxos or Raft. Both Paxos and Raft have some requirements that are
|
||||||
|
difficult to adapt to Machi's design goals:
|
||||||
|
|
||||||
|
* Both protocols use quorum majority consensus, which requires a
|
||||||
|
minimum of *2F + 1* working servers to tolerate *F* failures. For
|
||||||
|
example, to tolerate 2 server failures, quorum majority protocols
|
||||||
|
require a minium of 5 servers. To tolerate the same number of
|
||||||
|
failures, Chain replication requires only 3 servers.
|
||||||
|
* Machi's use of "humming consensus" to manage internal server
|
||||||
|
metadata state would also (probably) require conversion to Paxos or
|
||||||
|
Raft. (Or "outsourced" to a service such as ZooKeeper.)
|
||||||
|
|
||||||
|
<a name="n3.2">
|
||||||
|
### 3.2. Does Machi have a reliance on a coordination service such as ZooKeeper or etcd?
|
||||||
|
|
||||||
|
No. Machi maintains critical internal cluster information in an
|
||||||
|
internal, immutable data service called the "projection store". The
|
||||||
|
contents of the projection store are maintained by a new technique
|
||||||
|
called "humming consensus".
|
||||||
|
|
||||||
|
Humming consensus is described in the
|
||||||
|
[Machi chain manager high level design doc](https://github.com/basho/machi/doc/high-level-chain-mgr.pdf).
|
||||||
|
|
||||||
|
<a name="n3.3">
|
||||||
|
### 3.3. How is Machi tested?
|
||||||
|
|
||||||
|
While not formally proven yet, Machi's implementation of Chain
|
||||||
|
Replication and of humming consensus have been extensively tested with
|
||||||
|
several techniques:
|
||||||
|
|
||||||
|
* We use an executable model based on the QuickCheck framework for
|
||||||
|
property based testing.
|
||||||
|
* In addition to QuickCheck alone, we use the PULSE extension to
|
||||||
|
QuickCheck is used to test the implementation
|
||||||
|
under extremely skewed & unfair scheduling/timing conditions.
|
||||||
|
|
||||||
|
The model includes simulation of asymmetric network partitions. For
|
||||||
|
example, actor A can send messages to actor B, but B cannot send
|
||||||
|
messages to A. If such a partition happens somewhere in a traditional
|
||||||
|
network stack (e.g. a faulty Ethernet cable), any TCP connection
|
||||||
|
between A & B will quickly interrupt communication in _both_
|
||||||
|
directions. In the Machi network partition simulator, network
|
||||||
|
partitions can be truly one-way only.
|
||||||
|
|
||||||
|
After randomly generating a series of network partitions (which may
|
||||||
|
change several times during any single test case) and a random series
|
||||||
|
of cluster operations, an event trace of all cluster activity is used
|
||||||
|
to verify that no safety-critical rules have been violated.
|
||||||
|
|
||||||
|
<a name="n3.4">
|
||||||
|
### 3.4. Does Machi require shared disk storage? e.g. iSCSI, NBD (Network Block Device), Fibre Channel disks
|
||||||
|
|
||||||
|
No, Machi's design assumes that each Machi server is a fully
|
||||||
|
independent hardware and assumes only standard local disks (Winchester
|
||||||
|
and/or SSD style) with local-only interfaces (e.g. SATA, SCSI, PCI) in
|
||||||
|
each machine.
|
||||||
|
|
||||||
|
<a name="n3.5">
|
||||||
|
### 3.5. Does Machi require or assume that servers with large numbers of disks must use RAID-0/1/5/6/10/50/60 to create a single block device?
|
||||||
|
|
||||||
|
No. When used with servers with multiple disks, the intent is to
|
||||||
|
deploy multiple Machi servers per machine: one Machi server per disk.
|
||||||
|
|
||||||
|
* Pro: disk bandwidth and disk storage capacity can be managed at the
|
||||||
|
level of an individual disk.
|
||||||
|
* Pro: failure of an individual disk does not risk data loss on other
|
||||||
|
disks.
|
||||||
|
* Con (or pro, depending on the circumstances): in this configuration,
|
||||||
|
Machi would require additional network bandwidth to repair data on a lost
|
||||||
|
drive instead of intra-machine disk & bus & memory bandwidth that
|
||||||
|
would be required for RAID volume repair
|
||||||
|
* Con: replica placement policy, such as "rack awareness", becomes a
|
||||||
|
larger problem that must be automated. For example, a problem of
|
||||||
|
placement relative to 12 servers is smaller than a placement problem
|
||||||
|
of managing 264 seprate disks (if each of 12 servers has 22 disks).
|
||||||
|
|
||||||
|
<a name="n3.6">
|
||||||
|
### 3.6. What language(s) is Machi written in?
|
||||||
|
|
||||||
|
So far, Machi is written in 100% Erlang.
|
||||||
|
|
||||||
|
In the event that we encounter a performance problem that cannot be
|
||||||
|
solved within the Erlang/OTP runtime environment, all of Machi's
|
||||||
|
performance-critical components are small enough to be re-implemented
|
||||||
|
in C, Java, or other "gotta go fast fast FAST!!" programming
|
||||||
|
language. We expect that the Chain Replication manager and other
|
||||||
|
critical "control plane" software will remain in Erlang.
|
||||||
|
|
||||||
|
<a name="n3.7">
|
||||||
|
### 3.7. Does Machi use the Erlang/OTP network distribution system (aka "disterl")?
|
||||||
|
|
||||||
|
No, Machi doesn't use Erlang/OTP's built-in distributed message
|
||||||
|
passing system. The code would be *much* simpler if we did use
|
||||||
|
"disterl". However, due to (premature?) worries about performance, we
|
||||||
|
wanted to have the option of re-writing some Machi components in C or
|
||||||
|
Java or Go or OCaml or COBOL or in-kernel assembly hexadecimal
|
||||||
|
bit-twiddling magicSPEED ... without also having to find a replacement
|
||||||
|
for disterl. (Or without having to re-invent disterl's features in
|
||||||
|
another language.)
|
||||||
|
|
||||||
|
<a name="artisanal-protocol">
|
||||||
|
In the first drafts of the Machi code, the inter-node communication
|
||||||
|
uses a hand-crafted, artisanal, mostly ASCII protocol as part of a
|
||||||
|
"demo day" quick & dirty prototype. Work is underway (summer of 2015)
|
||||||
|
to replace that protocol gradually with a well-structured,
|
||||||
|
well-documented protocol based on Protocol Buffers data serialization.
|
||||||
|
|
||||||
|
<a name="n3.8">
|
||||||
|
### 3.8. Can I use HTTP to write/read stuff into/from Machi?
|
||||||
|
|
||||||
|
Yes, sort of. For as long as the legacy of
|
||||||
|
[Machi's first internal protocol](#artisanal-protocol) code still
|
||||||
|
survives, it's possible to use a
|
||||||
|
[primitive/hack'y HTTP interface that is described in this source code commit log](https://github.com/basho/machi/commit/6cebf397232cba8e63c5c9a0a8c02ba391b20fef).
|
||||||
|
|
||||||
|
In the long term, we'll probably want the option of an HTTP interface
|
||||||
|
that is as well designed and REST'ful as possible. It's on the
|
||||||
|
internal Basho roadmap. If you'd like to work on a real, not-kludgy
|
||||||
|
HTTP interface to Machi,
|
||||||
|
[please contact us!](https://github.com/basho/machi/blob/master/CONTRIBUTING.md)
|
||||||
|
|
||||||
|
|
27
README.md
27
README.md
|
@ -21,7 +21,7 @@ doc](./doc/high-level-machi.pdf) for further references.)
|
||||||
(*) Capable of operating in "AP mode" or "CP mode" relative to the
|
(*) Capable of operating in "AP mode" or "CP mode" relative to the
|
||||||
CAP Theorem.
|
CAP Theorem.
|
||||||
|
|
||||||
## Status: early May 2015: work is underway
|
## Status: mid-June 2015: work is underway
|
||||||
|
|
||||||
The two major design documents for Machi are now ready or nearly ready
|
The two major design documents for Machi are now ready or nearly ready
|
||||||
for internal Basho and external party review. Please see the
|
for internal Basho and external party review. Please see the
|
||||||
|
@ -34,25 +34,23 @@ The work of implementing first draft of Machi is now underway. The
|
||||||
code from the [prototype/demo-day-hack](prototype/demo-day-hack/) directory is
|
code from the [prototype/demo-day-hack](prototype/demo-day-hack/) directory is
|
||||||
being used as the initial scaffolding.
|
being used as the initial scaffolding.
|
||||||
|
|
||||||
* The chain manager is nearly ready for "AP mode" use in eventual
|
* The chain manager is ready for "AP mode" use in eventual
|
||||||
consistency use cases. The remaining major item is file repair,
|
consistency use cases.
|
||||||
i.e., (re)syncing file data to replicas that have been offline (due
|
|
||||||
to crashes or network partition) or added to the cluster for the
|
|
||||||
first time.
|
|
||||||
|
|
||||||
* The Machi client/server protocol is still the hand-crafted,
|
* The Machi client/server protocol is still the hand-crafted,
|
||||||
artisanal, bogus protocol that I hacked together for a "demo day"
|
artisanal, bogus protocol that I hacked together for a "demo day"
|
||||||
back in January and appears in the
|
back in January and appears in the
|
||||||
[prototype/demo-day-hack](prototype/demo-day-hack/) code.
|
[prototype/demo-day-hack](prototype/demo-day-hack/) code.
|
||||||
* Today: the only client language supported is Erlang.
|
* Today: the only client language supported is Erlang.
|
||||||
* Plan: replace the current protocol to something based on Protocol Buffers
|
* Today: an HTTP interface that, at the moment, is a big kludge.
|
||||||
* Plan: add a protocol handler that is HTTP-like but probably not
|
If someone who really cares about an
|
||||||
exactly 100% REST'ish. Unless someone who really cares about an
|
|
||||||
HTTP interface that is 100% REST-ful cares to contribute some
|
HTTP interface that is 100% REST-ful cares to contribute some
|
||||||
code. (Contributions are welcome!)
|
code ... contributions are welcome!
|
||||||
* Plan: if you'd like to work on a protocol such as Thrift, UBF,
|
* Work in progress now: replace the current protocol to something
|
||||||
msgpack over UDP, or some other protocol, let us know by
|
based on Protocol Buffers
|
||||||
[opening an issue](./issues/new) to discuss it.
|
* If you'd like to work on a protocol such as Thrift, UBF,
|
||||||
|
msgpack over UDP, or some other protocol, let us know by
|
||||||
|
[opening an issue](./issues/new) to discuss it.
|
||||||
|
|
||||||
## Contributing to Machi: source code, documentation, etc.
|
## Contributing to Machi: source code, documentation, etc.
|
||||||
|
|
||||||
|
@ -72,6 +70,9 @@ working with the Basho development team.
|
||||||
|
|
||||||
## A brief survey of this directories in this repository
|
## A brief survey of this directories in this repository
|
||||||
|
|
||||||
|
* A list of Frequently Asked Questions, a.k.a.
|
||||||
|
[the Machi FAQ](./FAQ.md).
|
||||||
|
|
||||||
* The [doc](./doc/) directory: home for major documents about Machi:
|
* The [doc](./doc/) directory: home for major documents about Machi:
|
||||||
high level design documents as well as exploration of features still
|
high level design documents as well as exploration of features still
|
||||||
under design & review within Basho.
|
under design & review within Basho.
|
||||||
|
|
Loading…
Reference in a new issue