# Frequently Asked Questions (FAQ)

# Outline

+ [1 Questions about Machi in general](#n1)
    + [1.1 What is Machi?](#n1.1)
    + [1.2 What is a Machi "cluster of clusters"?](#n1.2)
        + [1.2.1 This "cluster of clusters" idea needs a better name, don't you agree?](#n1.2.1)
    + [1.3 What is Machi like when operating in "eventually consistent"/"AP mode"?](#n1.3)
    + [1.4 What is Machi like when operating in "strongly consistent"/"CP mode"?](#n1.4)
    + [1.5 What does Machi's API look like?](#n1.5)
    + [1.6 What licensing terms are used by Machi?](#n1.6)
    + [1.7 Where can I find the Machi source code and documentation? Can I contribute?](#n1.7)
    + [1.8 What is Machi's expected release schedule, packaging, and operating system/OS distribution support?](#n1.8)
+ [2 Questions about Machi relative to {{something else}}](#n2)
    + [2.1 How is Machi better than Hadoop?](#n2.1)
    + [2.2 How does Machi differ from HadoopFS/HDFS?](#n2.2)
    + [2.3 How does Machi differ from Kafka?](#n2.3)
    + [2.4 How does Machi differ from Bookkeeper?](#n2.4)
    + [2.5 How does Machi differ from CORFU and Tango?](#n2.5)
+ [3 Machi's specifics](#n3)
    + [3.1 What technique is used to replicate Machi's files? Can other techniques be used?](#n3.1)
    + [3.2 Does Machi have a reliance on a coordination service such as ZooKeeper or etcd?](#n3.2)
    + [3.3 Is it true that there's an allegory written to describe humming consensus?](#n3.3)
    + [3.4 How is Machi tested?](#n3.4)
    + [3.5 Does Machi require shared disk storage? e.g. iSCSI, NBD (Network Block Device), Fibre Channel disks](#n3.5)
    + [3.6 Does Machi require or assume that servers with large numbers of disks must use RAID-0/1/5/6/10/50/60 to create a single block device?](#n3.6)
    + [3.7 What language(s) is Machi written in?](#n3.7)
    + [3.8 Does Machi use the Erlang/OTP network distribution system (aka "disterl")?](#n3.8)
    + [3.9 Can I use HTTP to write/read stuff into/from Machi?](#n3.9)

## 1. Questions about Machi in general

### 1.1. What is Machi?

Very briefly, Machi is a very simple append-only file store. Machi is "dumber" than many other file stores (i.e., it lacks many features found in other file stores such as HadoopFS or a simple NFS or CIFS file server). However, Machi is a distributed file store, which makes it different from (and, in some ways, more complicated than) a simple NFS or CIFS file server.

All Machi data is protected by SHA-1 checksums. By default, these checksums are calculated by the client to provide strong end-to-end protection against data corruption. (If the client does not provide a checksum, one will be generated by the first Machi server to handle the write request.) Internally, Machi uses these checksums for local data integrity checks and for server-to-server file synchronization and corrupt data repair.

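As a rough illustration of that client-side checksum step, here is a minimal sketch; it is not Machi's actual client code, and the tuple shape is an assumption for illustration only, not Machi's wire format.

```erlang
%% Minimal sketch, not Machi's actual client code: compute a SHA-1 checksum on
%% the client so that every server in the write path can verify the bytes end
%% to end. Uses Erlang/OTP's standard `crypto' module.
-module(checksum_sketch).
-export([checksummed_chunk/1]).

checksummed_chunk(Chunk) when is_binary(Chunk) ->
    Csum = crypto:hash(sha, Chunk),   % 20-byte (160-bit) SHA-1 digest
    {Chunk, Csum}.                    % illustrative pairing of data + checksum
```
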
As a distributed system, Machi can be configured to operate in either eventually consistent mode or strongly consistent mode.

In strongly consistent mode, Machi can provide write-once file store service in the same style as CORFU. Machi can be an easy-to-use tool for building fully ordered, log-based distributed systems and distributed data structures.

In eventually consistent mode, Machi can remain available for writes during arbitrary network partitions. When a network partition is fixed, Machi can safely merge all file data together without data loss. Similar to the operation of Basho's [Riak key-value store, Riak KV](http://basho.com/products/riak-kv/), Machi can provide file writes during arbitrary network partitions and later merge all results together safely when the cluster recovers.

For a much longer answer, please see the [Machi high level design doc](https://github.com/basho/machi/tree/master/doc/high-level-machi.pdf).

### 1.2. What is a Machi "cluster of clusters"?

Machi's design is based on using small, well-understood, and mathematically provable techniques to maintain multiple file copies without data loss or data corruption. At its lowest level, Machi contains no support for distribution/partitioning/sharding of files across many servers. A typical, fully functional Machi cluster will likely be two or three machines.

However, Machi is designed to be an excellent building block for building larger systems. A "cluster of clusters" deployment of Machi will use the "random slicing" technique for partitioning files across multiple Machi clusters that, as individuals, are unaware of the larger cluster-of-clusters scheme.

The cluster-of-clusters management service will be fully decentralized and run as a separate software service installed on each Machi cluster. This manager will appear to the local Machi server as simply another Machi file client. The cluster-of-clusters managers will take care of file migration as the cluster grows and shrinks in capacity and in response to day-to-day changes in workload.

Though the cluster-of-clusters manager has not yet been implemented, its design is fully decentralized and capable of operating despite multiple partial failures of its member clusters. We expect this design to scale easily to at least one thousand servers.

Please see the [Machi source repository's 'doc' directory for more details](https://github.com/basho/machi/tree/master/doc/).

#### 1.2.1. This "cluster of clusters" idea needs a better name, don't you agree?

Yes. Please help us: we are bad at naming things. For proof that naming things is hard, see [http://martinfowler.com/bliki/TwoHardThings.html](http://martinfowler.com/bliki/TwoHardThings.html).

### 1.3. What is Machi like when operating in "eventually consistent"/"AP mode"?

Machi's operating mode dictates how a Machi cluster will react to network partitions. A network partition may be caused by:

* A network failure
* A server failure
* An extreme server software "hang" or "pause", e.g. caused by OS scheduling problems such as a failing/stuttering disk device.

"AP mode" refers to the "A" and "P" properties of the "CAP conjecture", meaning that the cluster will be "Available" and "Partition tolerant".

The consistency semantics of file operations while in "AP mode" are eventually consistent during and after network partitions:

* File write operations are permitted by any client on the "same side" of the network partition.
* File read operations are successful for any file contents where the client & server are on the "same side" of the network partition.
* After the network partition(s) is resolved, files are merged together from "all sides" of the partition(s).
    * Unique files are copied in their entirety.
    * Byte ranges within the same file are merged (sketched below).

This merging is possible due to Machi's restrictions on file naming (file names are always assigned by Machi servers) and file offset assignments (byte offsets are also always chosen by Machi servers, according to rules which guarantee safe mergeability).

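As a minimal sketch of why such a merge is safe (illustration only, not Machi's repair code): model each side's view of one file as a map from server-assigned byte offset to chunk bytes. Because offsets are assigned write-once by the servers, the two views can never disagree about an offset, so merging is simply taking the union.

```erlang
%% Illustration only, not Machi's repair code: merge one file's chunks as seen
%% by two servers after a partition heals. Each view is a map of
%% Offset => ChunkBytes.
-module(merge_sketch).
-export([merge_file_views/2, demo/0]).

merge_file_views(SideA, SideB) ->
    %% Sanity check: any offset known to both sides must hold identical bytes,
    %% which server-assigned, write-once offsets guarantee.
    Common = maps:with(maps:keys(SideB), SideA),
    true = lists:all(fun({Off, Bytes}) -> maps:get(Off, SideB) =:= Bytes end,
                     maps:to_list(Common)),
    maps:merge(SideA, SideB).

%% Appends made on opposite sides of a partition land at different offsets,
%% so the union contains both without conflict.
demo() ->
    SideA = #{0    => <<"appended on side A">>},
    SideB = #{4096 => <<"appended on side B">>},
    2 = maps:size(merge_file_views(SideA, SideB)),
    ok.
```
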
### 1.4. What is Machi like when operating in "strongly consistent"/"CP mode"?

Machi's operating mode dictates how a Machi cluster will react to network partitions. "CP mode" refers to the "C" and "P" properties of the "CAP conjecture", meaning that the cluster will be "Consistent" and "Partition tolerant".

The consistency semantics of file operations while in "CP mode" are strongly consistent during and after network partitions:

* File write operations are permitted by any client on the "same side" of the network partition if and only if a quorum majority of Machi servers are also accessible within that partition.
    * In other words, file write service is unavailable in any partition where only a minority of Machi servers are accessible.
* File read operations are successful for any file contents where the client & server are on the "same side" of the network partition.
* After the network partition(s) is resolved, files are repaired from the surviving quorum majority members to out-of-sync minority members.

Machi's design can provide the illusion of quorum minority write availability if the cluster is configured to operate with "witness servers". (This feature is not implemented yet, as of June 2015.) See Section 11 of the [Machi chain manager high level design doc](https://github.com/basho/machi/tree/master/doc/high-level-chain-mgr.pdf) for more details.

### 1.5. What does Machi's API look like?

The Machi API contains only a handful of operations. The function arguments shown below use Erlang-style type annotations.

    append_chunk(Prefix:binary(), Chunk:binary()).
    append_chunk_extra(Prefix:binary(), Chunk:binary(), ExtraSpace:non_neg_integer()).
    read_chunk(File:binary(), Offset:non_neg_integer(), Size:non_neg_integer()).
    checksum_list(File:binary()).
    list_files().

Machi allows the client to choose the prefix of the file name to append data to, but the Machi server will always choose the final file name and byte offset for each `append_chunk()` operation. This restriction on file naming makes it easy to operate in "eventually consistent" mode: files may be written to any server during network partitions and can be easily merged together after the partition is healed.

Internally, there is a more complex protocol used by individual cluster members to manage file contents and to repair damaged/missing files. See Figure 3 in the [Machi high level design doc](https://github.com/basho/machi/tree/master/doc/high-level-machi.pdf) for more details.

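The sketch below shows how those operations fit together in a client round trip. The module name `machi_client` and the `{ok, {File, Offset}}` result shape are assumptions for illustration only; the real client modules and result types are defined in the source repository. The point is that the server, not the client, picks the final file name and offset, which the client then uses to read the data back.

```erlang
%% Hypothetical round trip against the API above. `machi_client' and the
%% {ok, {File, Offset}} result shape are illustrative assumptions only.
-module(api_sketch).
-export([demo/1]).

demo(Conn) ->
    Chunk = <<"hello, world">>,
    %% The client chooses only the file name *prefix*; the server chooses the
    %% final file name and the byte offset where the chunk lands.
    {ok, {File, Offset}} = machi_client:append_chunk(Conn, <<"my_prefix">>, Chunk),
    %% Read back exactly the bytes that were appended, using the
    %% server-assigned file name and offset.
    {ok, Chunk} = machi_client:read_chunk(Conn, File, Offset, byte_size(Chunk)),
    ok.
```
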
### 1.6. What licensing terms are used by Machi?

All Machi source code and documentation is licensed by [Basho Technologies, Inc.](http://www.basho.com/) under the [Apache License, version 2.0](https://github.com/basho/machi/tree/master/LICENSE).

### 1.7. Where can I find the Machi source code and documentation? Can I contribute?

All Machi source code and documentation can be found at GitHub: [https://github.com/basho/machi](https://github.com/basho/machi). The full URL for this FAQ is [https://github.com/basho/machi/blob/master/FAQ.md](https://github.com/basho/machi/blob/master/FAQ.md).

There are several "README" files in the source repository. We hope they provide useful guidance for first-time readers.

If you're interested in contributing code, documentation, or ideas for improvement, please see our contributing & collaboration guidelines at [https://github.com/basho/machi/blob/master/CONTRIBUTING.md](https://github.com/basho/machi/blob/master/CONTRIBUTING.md).

### 1.8. What is Machi's expected release schedule, packaging, and operating system/OS distribution support?

Basho expects that Machi's first release will take place near the end of calendar year 2015.

Basho's official support for operating systems (e.g. Linux, FreeBSD), operating system packaging (e.g. CentOS rpm/yum package management, Ubuntu debian/apt-get package management), and container/virtualization platforms has not yet been decided. If you wish to provide your opinion, we'd love to hear it. Please [open an issue at GitHub](https://github.com/basho/machi/issues/new) and let us know.

## 2. Questions about Machi relative to {{something else}}

### 2.1. How is Machi better than Hadoop?

This question is frequently asked by trolls. If this is a troll question, the answer is either, "Nothing is better than Hadoop," or else, "Everything is better than Hadoop."

The real answer is that Machi is not a distributed data processing framework like Hadoop. See [Hadoop's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop) and focus on the description of Hadoop's MapReduce and YARN; Machi contains neither.

### 2.2. How does Machi differ from HadoopFS/HDFS?

This is a much better question than the [How is Machi better than Hadoop?](#n2.1) question. See also [HadoopFS's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS).

One way to look at Machi is to consider Machi as a distributed file store. HadoopFS is also a distributed file store. Let's compare and contrast.

Machi | HadoopFS (HDFS)
------|----------------
Not POSIX compliant | Not POSIX compliant
Immutable file store with append-only semantics (simplifying things a little bit) | Immutable file store with append-only semantics
File data may be read concurrently while the file is being actively appended to | A file must be closed before a client can read it
No concept (yet) of users or authentication (though the initial supported release will support basic user + password authentication). Machi will probably never natively support directories or ACLs. | Has concepts of users, directories, and ACLs
Machi does not allow clients to name their own files or to specify data placement/offset within a file | While not POSIX compliant, HDFS has a fairly flexible API for managing file names and the writing position within a file (during a file's writable phase)
Does not have any file distribution/partitioning/sharding across Machi clusters: in a single Machi cluster, all files are replicated by all servers in the cluster. The "cluster of clusters" concept is used to distribute/partition/shard files across multiple Machi clusters. | File distribution/partitioning/sharding is performed automatically by the HDFS "name node"
Machi requires no central "name node", either for single-cluster use or for "cluster of clusters" use | Requires a single "name node" server to maintain file system contents and file content mapping. (May be deployed with a "secondary name node" to reduce unavailability when the primary name node fails.)
Machi uses Chain Replication to manage all file replicas (see the sketch after this table) | The HDFS name node uses an ad hoc mechanism for replicating file contents. The HDFS file system metadata (file names, file block(s) locations, ACLs, etc.) is stored by the name node in the local file system and is replicated to any secondary name node using snapshots.
Machi replicates files *N* ways, where *N* is the length of the Chain Replication chain. Typically *N=2*, but this is configurable. | HDFS typically replicates file contents *N=3* ways, but this is configurable.
All Machi file data is protected by SHA-1 checksums generated by the client prior to writing by Machi servers | Optional file checksum protection may be implemented on the server side

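The table above notes that Machi uses Chain Replication to manage replicas. As a conceptual sketch of that technique (not Machi's actual implementation): a write is applied by every replica in chain order, head first, and is acknowledged only once the tail has it; reads are served by the tail, which by construction holds only fully replicated data.

```erlang
%% Conceptual Chain Replication sketch, not Machi's implementation. Each
%% replica is modeled as a map of Offset => ChunkBytes; the chain is a list
%% of replicas ordered head..tail.
-module(chain_sketch).
-export([write/3, read/2, demo/0]).

%% In this toy model a write is simply applied to every replica in the chain;
%% in real Chain Replication it flows head -> middle -> tail and is only
%% acknowledged to the client after the tail has applied it.
write(Offset, Chunk, Chain) ->
    [maps:put(Offset, Chunk, Replica) || Replica <- Chain].

%% Reads go to the tail, which only holds writes the whole chain has seen.
read(Offset, Chain) ->
    maps:get(Offset, lists:last(Chain), not_written).

demo() ->
    Chain0 = [#{}, #{}, #{}],                      % N=3: head, middle, tail
    Chain1 = write(0, <<"replicated 3 ways">>, Chain0),
    <<"replicated 3 ways">> = read(0, Chain1),
    ok.
```
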
### 2.3. How does Machi differ from Kafka?

Machi | Kafka
------|------
Append-only, strongly consistent file store only | Append-only, strongly consistent log file store + additional services: for example, producer topics & sharding, consumer groups & failover, etc.
Not yet code complete nor "battle tested" in large production environments | "Battle tested" in large production environments
All Machi file data is protected by SHA-1 checksums generated by the client prior to writing by Machi servers | Each log entry is protected by a 32-bit CRC checksum

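To make the last row concrete, both checksum kinds are easy to compute in Erlang (illustration only; Kafka of course computes its CRCs in its own client and broker code, not in Erlang).

```erlang
%% Illustration of the two checksum strengths compared in the table above.
-module(csum_compare).
-export([checksums/1]).

checksums(Bytes) when is_binary(Bytes) ->
    {erlang:crc32(Bytes),        % 32-bit CRC, the kind Kafka stores per log entry
     crypto:hash(sha, Bytes)}.   % 160-bit SHA-1, the kind Machi stores per chunk
```
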
### 2.5. How does Machi differ from CORFU and Tango?

Machi | CORFU
------|------
Writes & reads may be on byte boundaries | Writes & reads must be on page boundaries, e.g. 4 or 8 KBytes, to align with server storage based on flash NVRAM/solid state disk (SSD)
Provides multiple "logs", where each log has a name and is appended to & read from like a file. A read operation requires a 3-tuple: file name, starting byte offset, number of bytes. | Provides a single "log". A read operation requires only a 1-tuple: the log page number. (A protocol option may exist to request multiple pages in a single read query.)
Offers service in either strongly consistent mode or eventually consistent mode | Offers service in strongly consistent mode
May be deployed on solid state disks (SSDs) or Winchester hard disks | Designed for use with solid state disks (SSDs) but can also be used with Winchester hard disks (with a performance penalty if used as suggested by the use cases described in the CORFU papers)
All Machi file data is protected by SHA-1 checksums generated by the client prior to writing by Machi servers | Depending on server & flash device capabilities, each data page may be protected by a checksum (calculated independently by each server rather than by the client)