# Frequently Asked Questions (FAQ)

# Outline

+ [1 Questions about Machi in general](#n1)
    + [1.1 What is Machi?](#n1.1)
    + [1.2 What does Machi's API look like?](#n1.2)
+ [2 Questions about Machi relative to something else](#n2)
    + [2.1 How is Machi better than Hadoop?](#n2.1)
    + [2.2 How does Machi differ from HadoopFS?](#n2.2)
    + [2.3 How does Machi differ from Kafka?](#n2.3)
    + [2.4 How does Machi differ from Bookkeeper?](#n2.4)
    + [2.5 How does Machi differ from CORFU and Tango?](#n2.5)

## 1. Questions about Machi in general

### 1.1. What is Machi?

TODO: expand this topic.

Very briefly, Machi is a very simple append-only file store. It is
"dumber" than many other file stores, i.e., it lacks many features found
in systems such as HadoopFS or a simple NFS or CIFS file server.

However, Machi is a distributed file store, which makes it different from
(and, in some ways, more complicated than) a simple NFS or CIFS file
server. As a distributed system, Machi can be configured to operate in
either eventually consistent mode or strongly consistent mode. (See the
high level design document for definitions and details.)

For a much longer answer, please see the
[Machi high level design doc](./doc/high-level-machi.pdf).

### 1.2. What does Machi's API look like?

The Machi API contains only a handful of operations. The function
arguments shown below use Erlang-style type annotations.

    append_chunk(Prefix:binary(), Chunk:binary()).
    append_chunk_extra(Prefix:binary(), Chunk:binary(),
                       ExtraSpace:non_neg_integer()).
    read_chunk(File:binary(), Offset:non_neg_integer(),
               Size:non_neg_integer()).
    checksum_list(File:binary()).
    list_files().

Machi allows the client to choose the prefix of the file name to append
data to, but the Machi server always chooses the final file name and byte
offset for each `append_chunk()` operation.
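The naming rule can be sketched with a small in-memory model. This is
purely illustrative, not the real Machi API: `SketchServer`, its fields,
and the `prefix.suffix` naming convention are all hypothetical.

```python
# Illustrative model only: the client supplies a file name *prefix*; the
# server picks the final file name and the byte offset where each chunk
# lands. "SketchServer" and its internals are hypothetical, not Machi code.
import uuid


class SketchServer:
    def __init__(self):
        self.files = {}                     # final file name -> bytearray
        self.suffix = uuid.uuid4().hex[:8]  # server-unique name component

    def append_chunk(self, prefix, chunk):
        # The server, not the client, chooses the file name and offset.
        name = f"{prefix}.{self.suffix}"
        data = self.files.setdefault(name, bytearray())
        offset = len(data)
        data.extend(chunk)
        return name, offset                 # tell the client where it landed

    def read_chunk(self, name, offset, size):
        return bytes(self.files[name][offset:offset + size])


server = SketchServer()
name1, off1 = server.append_chunk("mylog", b"hello ")
name2, off2 = server.append_chunk("mylog", b"world")
assert name1 == name2 and off1 == 0 and off2 == 6
assert server.read_chunk(name1, 0, 11) == b"hello world"
```

Because the server embeds a server-unique component in each final file
name, files written on different servers can never collide, which is what
makes merging after a network partition simple.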
This restriction on file naming makes it easy to operate in "eventually
consistent" mode: files may be written to any server during network
partitions and can be easily merged together after the partition is
healed.

Internally, a more complex protocol is used by the individual cluster
members to manage file contents and to repair damaged/missing files. See
Figure 3 in the
[Machi high level design doc](./doc/high-level-machi.pdf) for more
details.

## 2. Questions about Machi relative to something else

### 2.1. How is Machi better than Hadoop?

This question is frequently asked by trolls. If this is a troll question,
the answer is either "Nothing is better than Hadoop" or "Everything is
better than Hadoop."

The real answer is that Machi is not a distributed data processing
framework like Hadoop. See
[Hadoop's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop)
and focus on the descriptions of Hadoop's MapReduce and YARN; Machi
contains neither.

### 2.2. How does Machi differ from HadoopFS?

This is a much better question than the
[How is Machi better than Hadoop?](#n2.1) question. For background, see
[HadoopFS's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS).

One way to look at Machi is to consider it as a distributed file store.
HadoopFS is also a distributed file store. Let's compare and contrast.
Machi | HadoopFS
------|---------
Not POSIX compliant | Not POSIX compliant
Immutable file store with append-only semantics (simplifying things a little bit). | Immutable file store with append-only semantics.
File data may be read concurrently while the file is being actively appended to. | File must be closed before a client can read it.
No concept (yet) of users, directories, or ACLs. | Has concepts of users, directories, and ACLs.
Machi does not allow clients to name their own files or to specify data placement/offset within a file. | While not POSIX compliant, HDFS allows a fairly flexible API for managing file names and file writing position within a file (during a file's writable phase).
Does not have any file distribution/partitioning/sharding across Machi clusters: in a single Machi cluster, all files are replicated by all servers in the cluster. The "cluster of clusters" concept is used to distribute/partition/shard files across multiple Machi clusters. | File distribution/partitioning/sharding is performed automatically by the HDFS "name node".
Machi requires no central "name node" for either single-cluster or "cluster of clusters" use. | Requires a single "name node" server to maintain file system contents and file content mapping. (May be deployed with a "secondary name node" to reduce unavailability when the primary name node fails.)
Machi uses Chain Replication to manage all file replicas. | The HDFS name node uses an ad hoc mechanism for replicating file contents. The HDFS file system metadata (file names, file block locations, ACLs, etc.) is stored by the name node in the local file system and is replicated to any secondary name node using snapshots.
Machi replicates files *N* ways, where *N* is the length of the Chain Replication chain. Typically *N=2*, but this is configurable. | HDFS typically replicates file contents *N=3* ways, but this is configurable.
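The Chain Replication scheme mentioned in the table can be sketched as
follows. This is an illustrative model of the general technique, not
Machi code; the `Chain` class and its methods are hypothetical names.

```python
# Illustrative sketch of Chain Replication: writes enter at the head of
# the chain and propagate replica-by-replica to the tail; reads are served
# by the tail, so a read only ever sees fully replicated data.
# "Chain" and its internals are hypothetical, not Machi code.


class Chain:
    def __init__(self, length=2):  # Machi's typical chain length is N=2
        # Each replica is modeled as a simple key -> bytes store.
        self.replicas = [dict() for _ in range(length)]

    def write(self, key, value):
        # Head -> ... -> tail; the write completes once the tail applies it.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # The tail serves reads, guaranteeing the value is on every replica.
        return self.replicas[-1].get(key)


chain = Chain(length=2)
chain.write(("file1", 0), b"chunk-data")
assert chain.read(("file1", 0)) == b"chunk-data"
assert all(rep[("file1", 0)] == b"chunk-data" for rep in chain.replicas)
```

The design point is that strong consistency falls out of the topology: a
value is readable only after it has reached the end of the chain.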
### 2.3. How does Machi differ from Kafka?

Machi | Kafka
------|------
Append-only, strongly consistent file store only. | Append-only, strongly consistent log file store, plus additional services: for example, producer topics & sharding, consumer groups & failover, etc.
Not yet code complete nor "battle tested" in large production environments. | "Battle tested" in large production environments.