240 lines
7.9 KiB
Markdown
240 lines
7.9 KiB
Markdown
|
# Frequently Asked Questions (FAQ)
|
||
|
|
||
|
<!-- Formatting: -->
|
||
|
<!-- All headings omitted from outline are H1 -->
|
||
|
<!-- All other headings must be on a single line -->
|
||
|
<!-- Run: ./priv/make-faq.pl ./FAQ.md > ./tmpfoo; mv ./tmpfoo ./FAQ.md -->
|
||
|
|
||
|
# Outline
|
||
|
|
||
|
<!-- OUTLINE -->
|
||
|
|
||
|
+ [1 Questions about Machi in general](#n1)
|
||
|
+ [1.1 What is Machi?](#n1.1)
|
||
|
+ [1.2 What does Machi's API look like?](#n1.2)
|
||
|
+ [2 Questions about Machi relative to something else](#n2)
|
||
|
+ [2.1 How is Machi better than Hadoop?](#n2.1)
|
||
|
+ [2.2 How does Machi differ from HadoopFS?](#n2.2)
|
||
|
+ [2.3 How does Machi differ from Kafka?](#n2.3)
|
||
|
+ [2.4 How does Machi differ from Bookkeeper?](#n2.4)
|
||
|
+ [2.5 How does Machi differ from CORFU and Tango?](#n2.5)
|
||
|
|
||
|
<!-- ENDOUTLINE -->
|
||
|
|
||
|
<a name="n1">
|
||
|
## 1. Questions about Machi in general
|
||
|
|
||
|
<a name="n1.1">
|
||
|
### 1.1. What is Machi?
|
||
|
|
||
|
TODO: expand this topic.
|
||
|
|
||
|
Very briefly, Machi is a very simple append-only file store; it is
|
||
|
"dumber" than many other file stores (i.e., lacking many features
|
||
|
found in other file stores) such as HadoopFS or simple NFS or CIFS file
|
||
|
server.
|
||
|
However, Machi is a distributed file store, which makes it different
|
||
|
(and, in some ways, more complicated) than a simple NFS or CIFS file
|
||
|
server.
|
||
|
|
||
|
As a distributed system, Machi can be configured to operate with
|
||
|
either eventually consistent mode or strongly consistent mode. (See
|
||
|
the high level design document for definitions and details.)
|
||
|
|
||
|
For a much longer answer, please see the
|
||
|
[Machi high level design doc](./doc/high-level-machi.pdf).
|
||
|
|
||
|
<a name="n1.2">
|
||
|
### 1.2. What does Machi's API look like?
|
||
|
|
||
|
The Machi API only contains a handful of API operations. The function
|
||
|
arguments shown below use Erlang-style type annotations.
|
||
|
|
||
|
append_chunk(Prefix:binary(), Chunk:binary()).
|
||
|
append_chunk_extra(Prefix:binary(), Chunk:binary(), ExtraSpace:non_neg_integer()).
|
||
|
read_chunk(File:binary(), Offset:non_neg_integer(), Size:non_neg_integer()).
|
||
|
|
||
|
checksum_list(File:binary()).
|
||
|
list_files().
|
||
|
|
||
|
Machi allows the client to choose the prefix of the file name to
|
||
|
append data to, but the Machi server will always choose the final file
|
||
|
name and byte offset for each `append_chunk()` operation. This
|
||
|
restriction on file naming makes it easy to operate in "eventually
|
||
|
consistent" mode: files may be written to any server during network
|
||
|
partitions and can be easily merged together after the partition is
|
||
|
healed.
|
||
|
|
||
|
Internally, there is a more complex protocol used by individual
|
||
|
cluster members to manage file contents and to repair damaged/missing
|
||
|
files. See Figure 3 in
|
||
|
[Machi high level design doc](./doc/high-level-machi.pdf)
|
||
|
for more details.
|
||
|
|
||
|
<a name="n2">
|
||
|
## 2. Questions about Machi relative to something else
|
||
|
|
||
|
<a name="better-than-hadoop">
|
||
|
<a name="n2.1">
|
||
|
### 2.1. How is Machi better than Hadoop?
|
||
|
|
||
|
This question is frequently asked by trolls. If this is a troll
|
||
|
question, the answer is either, "Nothing is better than Hadoop," or
|
||
|
else "Everything is better than Hadoop."
|
||
|
|
||
|
The real answer is that Machi is not a distributed data processing
|
||
|
framework like Hadoop is.
|
||
|
See [Hadoop's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop)
|
||
|
and focus on the description of Hadoop's MapReduce and YARN; Machi
|
||
|
contains neither.
|
||
|
|
||
|
<a name="n2.2">
|
||
|
### 2.2. How does Machi differ from HadoopFS?
|
||
|
|
||
|
This is a much better question than the
|
||
|
[How is Machi better than Hadoop?](#better-than-hadoop)
|
||
|
question.
|
||
|
|
||
|
[HadoopFS's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS)
|
||
|
|
||
|
One way to look at Machi is to consider Machi as a distributed file
|
||
|
store. HadoopFS is also a distributed file store. Let's compare and
|
||
|
contrast.
|
||
|
|
||
|
<table>
|
||
|
|
||
|
<tr>
|
||
|
<td> <b>Machi</b>
|
||
|
<td> <b>Hadoop</b>
|
||
|
|
||
|
<tr>
|
||
|
<td> Not POSIX compliant
|
||
|
<td> Not POSIX compliant
|
||
|
|
||
|
<tr>
|
||
|
<td> Immutable file store with append-only semantics (simplifying
|
||
|
things a little bit).
|
||
|
<td> Immutable file store with append-only semantics
|
||
|
|
||
|
<tr>
|
||
|
<td> File data may be read concurrently while file is being actively
|
||
|
appended to.
|
||
|
<td> File must be closed before a client can read it.
|
||
|
|
||
|
<tr>
|
||
|
<td> No concept (yet) of users, directories, or ACLs
|
||
|
<td> Has concepts of users, directories, and ACLs.
|
||
|
|
||
|
<tr>
|
||
|
<td> Machi oes not allow clients to name their own files or to specify data
|
||
|
placement/offset within a file.
|
||
|
<td> While not POSIX compliant, HDFS allows a fairly flexible API for
|
||
|
managing file names and file writing position within a file (during a
|
||
|
file's writable phase).
|
||
|
|
||
|
<tr>
|
||
|
<td> Does not have any file distribution/partitioning/sharding across
|
||
|
Machi clusters: in a single Machi cluster, all files are replicated by
|
||
|
all servers in the cluster. The "cluster of clusters" concept is used
|
||
|
to distribute/partition/shard files across multiple Machi clusters.
|
||
|
<td> File distribution/partitioning/sharding is performed
|
||
|
automatically by the HDFS "name node".
|
||
|
|
||
|
<tr>
|
||
|
<td> Machi requires no central "name node" for single cluster use.
|
||
|
Machi requires no central "name node" for "cluster of clusters" use
|
||
|
<td> Requires a single "namenode" server to maintain file system contents
|
||
|
and file content mapping. (May be deployed with a "secondary
|
||
|
namenode" to reduce unavailability when the primary namenode fails.)
|
||
|
|
||
|
<tr>
|
||
|
<td> Machi uses Chain Replication to manage all file replicas.
|
||
|
<td> The HDFS name node uses an ad hoc mechanism for replicating file
|
||
|
contents. The HDFS file system metadata (file names, file block(s)
|
||
|
locations, ACLs, etc.) is stored by the name node in the local file
|
||
|
system and is replicated to any secondary namenode using snapshots.
|
||
|
|
||
|
<tr>
|
||
|
<td> Machi replicates files *N* ways where *N* is the length of the
|
||
|
Chain Replication chain. Typically, *N=2*, but this is configurable.
|
||
|
<td> HDFS typical replicates file contents *N=3* ways, but this is
|
||
|
configurable.
|
||
|
|
||
|
<tr>
|
||
|
<td>
|
||
|
<td>
|
||
|
|
||
|
</table>
|
||
|
|
||
|
<a name="n2.3">
|
||
|
### 2.3. How does Machi differ from Kafka?
|
||
|
|
||
|
Machi is rather close to Kafka in spirit, though its implementation is
|
||
|
quite different.
|
||
|
|
||
|
<table>
|
||
|
|
||
|
<tr>
|
||
|
<td> <b>Machi</b>
|
||
|
<td> <b>Kafka</b>
|
||
|
|
||
|
<tr>
|
||
|
<td> Append-only, strongly consistent file store only
|
||
|
<td> Append-only, strongly consistent log file store + additional
|
||
|
services: for example, producer topics & sharding, consumer groups &
|
||
|
failover, etc.
|
||
|
|
||
|
<tr>
|
||
|
<td> Not yet code complete nor "battle tested" in large production
|
||
|
environments.
|
||
|
<td> "Battle tested" in large production environments.
|
||
|
|
||
|
</table>
|
||
|
|
||
|
In theory, it should be "quite straightforward" to remove these parts
|
||
|
of Kafka's code base:
|
||
|
|
||
|
* local file system I/O for all topic/partition/log files
|
||
|
* leader/follower file replication, ISR ("In Sync Replica") state
|
||
|
management, and related log file replication logic
|
||
|
|
||
|
... and replace those parts with Machi client API calls. Those parts
|
||
|
of Kafka are what Machi has been designed to do from the very
|
||
|
beginning.
|
||
|
|
||
|
See also:
|
||
|
<a href="#corfu-and-tango">How does Machi differ from CORFU and Tango?</a>
|
||
|
|
||
|
<a name="n2.4">
|
||
|
### 2.4. How does Machi differ from Bookkeeper?
|
||
|
|
||
|
Sorry, we haven't studied Bookkeeper very deeply or used Bookkeeper
|
||
|
for any non-trivial project.
|
||
|
|
||
|
One notable limitation of the Bookkeeper API is that a ledger cannot
|
||
|
be read by other clients until it has been closed. Any byte in a
|
||
|
Machi file that has been written successfully may
|
||
|
be read immedately by any other Machi client.
|
||
|
|
||
|
The name "Machi" does not have three consecutive pairs of repeating
|
||
|
letters. The name "Bookkeeper" does.
|
||
|
|
||
|
<a name="corfu-and-tango">
|
||
|
<a name="n2.5">
|
||
|
### 2.5. How does Machi differ from CORFU and Tango?
|
||
|
|
||
|
Machi's design borrows very heavily from CORFU. We acknowledge a deep
|
||
|
debt to the original Microsoft Research papers that describe CORFU's
|
||
|
original design and implementation.
|
||
|
|
||
|
See also: the "Recommended reading & related work" and "References"
|
||
|
sections of the
|
||
|
[Machi high level design doc](./doc/high-level-machi.pdf)
|
||
|
for pointers to the MSR papers related to CORFU.
|
||
|
|
||
|
Machi does not implement Tango directly. (Not yet, at least.)
|
||
|
However, there is a prototype implementation included in the Machi
|
||
|
source tree. See
|
||
|
[the prototype/tango source code directory](https://github.com/basho/machi/tree/master/prototype/tango)
|
||
|
for details.
|