From 73b6a90e78f06adf16df537211353f10089ce811 Mon Sep 17 00:00:00 2001
From: Scott Lystig Fritchie
Date: Sun, 21 Jun 2015 17:43:43 +0900
Subject: [PATCH] Create FAQ.md

---
 FAQ.md           | 239 +++++++++++++++++++++++++++++++++++++++++++++++
 priv/make-faq.pl |  81 ++++++++++++++++
 2 files changed, 320 insertions(+)
 create mode 100644 FAQ.md
 create mode 100755 priv/make-faq.pl

diff --git a/FAQ.md b/FAQ.md
new file mode 100644
index 0000000..6a04c8c
--- /dev/null
+++ b/FAQ.md
@@ -0,0 +1,239 @@
+# Frequently Asked Questions (FAQ)
+
+
+
+
+
+
+# Outline
+
+<!-- OUTLINE -->
+
++ [1 Questions about Machi in general](#n1)
+    + [1.1 What is Machi?](#n1.1)
+    + [1.2 What does Machi's API look like?](#n1.2)
++ [2 Questions about Machi relative to something else](#n2)
+    + [2.1 How is Machi better than Hadoop?](#n2.1)
+    + [2.2 How does Machi differ from HadoopFS?](#n2.2)
+    + [2.3 How does Machi differ from Kafka?](#n2.3)
+    + [2.4 How does Machi differ from Bookkeeper?](#n2.4)
+    + [2.5 How does Machi differ from CORFU and Tango?](#n2.5)
+
+<!-- ENDOUTLINE -->
+
+<a name="n1">
+## 1. Questions about Machi in general
+
+<a name="n1.1">
+### 1.1. What is Machi?
+
+TODO: expand this topic.
+
+Very briefly, Machi is a simple append-only file store. It is
+"dumber" than many other file stores (i.e., it lacks many features
+found in other file stores) such as HadoopFS or a simple NFS or CIFS
+file server.
+However, Machi is a distributed file store, which makes it different
+from (and, in some ways, more complicated than) a simple NFS or CIFS
+file server.
+
+As a distributed system, Machi can be configured to operate in
+either eventually consistent mode or strongly consistent mode. (See
+the high-level design document for definitions and details.)
+
+For a much longer answer, please see the
+[Machi high level design doc](./doc/high-level-machi.pdf).
+
+<a name="n1.2">
+### 1.2. What does Machi's API look like?
+
+The Machi API contains only a handful of operations. The function
+arguments shown below use Erlang-style type annotations.
+
+    append_chunk(Prefix:binary(), Chunk:binary()).
+    append_chunk_extra(Prefix:binary(), Chunk:binary(), ExtraSpace:non_neg_integer()).
+    read_chunk(File:binary(), Offset:non_neg_integer(), Size:non_neg_integer()).
+
+    checksum_list(File:binary()).
+    list_files().
+
+Machi allows the client to choose the prefix of the file name to
+append data to, but the Machi server always chooses the final file
+name and byte offset for each `append_chunk()` operation. This
+restriction on file naming makes it easy to operate in "eventually
+consistent" mode: files may be written to any server during a network
+partition and can easily be merged together after the partition has
+healed.
+
+Internally, individual cluster members use a more complex protocol
+to manage file contents and to repair damaged or missing files. See
+Figure 3 in the
+[Machi high level design doc](./doc/high-level-machi.pdf)
+for more details.
+
+<a name="n2">
+## 2. Questions about Machi relative to something else
+
+<a name="n2.1">
+### 2.1. How is Machi better than Hadoop?
+
+This question is frequently asked by trolls. If this is a troll
+question, the answer is either, "Nothing is better than Hadoop," or
+else "Everything is better than Hadoop."
+
+The real answer is that Machi is not a distributed data processing
+framework like Hadoop is.
+See [Hadoop's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop)
+and focus on the description of Hadoop's MapReduce and YARN; Machi
+contains neither.
+
+<a name="n2.2">
+### 2.2. How does Machi differ from HadoopFS?
+
+This is a much better question than the
+[How is Machi better than Hadoop?](#n2.1)
+question.
+
+For background, see
+[HadoopFS's entry in Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS).
+
+One way to look at Machi is as a distributed file store. HadoopFS is
+also a distributed file store, so let's compare and contrast the two.
+
+<table>
+<tr>
+<th> Machi </th>
+<th> HadoopFS </th>
+</tr>
+<tr>
+<td> Not POSIX compliant </td>
+<td> Not POSIX compliant </td>
+</tr>
+<tr>
+<td> Immutable file store with append-only semantics (simplifying
+things a little bit). </td>
+<td> Immutable file store with append-only semantics </td>
+</tr>
+<tr>
+<td> File data may be read concurrently while the file is being
+actively appended to. </td>
+<td> File must be closed before a client can read it. </td>
+</tr>
+<tr>
+<td> No concept (yet) of users, directories, or ACLs </td>
+<td> Has concepts of users, directories, and ACLs. </td>
+</tr>
+<tr>
+<td> Machi does not allow clients to name their own files or to specify
+data placement/offset within a file. </td>
+<td> While not POSIX compliant, HDFS allows a fairly flexible API for
+managing file names and file writing position within a file (during a
+file's writable phase). </td>
+</tr>
+<tr>
+<td> Does not have any file distribution/partitioning/sharding across
+Machi clusters: in a single Machi cluster, all files are replicated by
+all servers in the cluster. The "cluster of clusters" concept is used
+to distribute/partition/shard files across multiple Machi clusters. </td>
+<td> File distribution/partitioning/sharding is performed
+automatically by the HDFS "name node". </td>
+</tr>
+<tr>
+<td> Machi requires no central "name node", either for single-cluster
+use or for "cluster of clusters" use. </td>
+<td> Requires a single "namenode" server to maintain file system contents
+and file content mapping. (May be deployed with a "secondary
+namenode" to reduce unavailability when the primary namenode fails.) </td>
+</tr>
+<tr>
+<td> Machi uses Chain Replication to manage all file replicas. </td>
+<td> The HDFS name node uses an ad hoc mechanism for replicating file
+contents. The HDFS file system metadata (file names, file block(s)
+locations, ACLs, etc.) is stored by the name node in the local file
+system and is replicated to any secondary namenode using snapshots. </td>
+</tr>
+<tr>
+<td> Machi replicates files <i>N</i> ways, where <i>N</i> is the length
+of the Chain Replication chain. Typically <i>N=2</i>, but this is
+configurable. </td>
+<td> HDFS typically replicates file contents <i>N=3</i> ways, but this
+is configurable. </td>
+</tr>
+</table>
+
+<a name="n2.3">
+### 2.3. How does Machi differ from Kafka?
+
+Machi is rather close to Kafka in spirit, though its implementation is
+quite different.
+
+<table>
+<tr>
+<th> Machi </th>
+<th> Kafka </th>
+</tr>
+<tr>
+<td> Append-only, strongly consistent file store only </td>
+<td> Append-only, strongly consistent log file store, plus additional
+services: for example, producer topics &amp; sharding, consumer groups &amp;
+failover, etc. </td>
+</tr>
+<tr>
+<td> Not yet code complete or "battle tested" in large production
+environments. </td>
+<td> "Battle tested" in large production environments. </td>
+</tr>
+</table>
+
+In theory, it should be "quite straightforward" to remove these parts
+of Kafka's code base:
+
+* local file system I/O for all topic/partition/log files
+* leader/follower file replication, ISR ("In Sync Replica") state
+  management, and related log file replication logic
+
+... and replace those parts with Machi client API calls. Those parts
+of Kafka are what Machi has been designed to do from the very
+beginning.
+
+See also:
+[How does Machi differ from CORFU and Tango?](#n2.5)
+
+<a name="n2.4">
+### 2.4. How does Machi differ from Bookkeeper?
+
+Sorry, we haven't studied Bookkeeper very deeply or used Bookkeeper
+for any non-trivial project.
+
+One notable limitation of the Bookkeeper API is that a ledger cannot
+be read by other clients until it has been closed. In contrast, any
+byte in a Machi file that has been written successfully may be read
+immediately by any other Machi client.
+
+The name "Machi" does not have three consecutive pairs of repeating
+letters. The name "Bookkeeper" does.
+
+<a name="n2.5">
+### 2.5. How does Machi differ from CORFU and Tango?
+
+Machi's design borrows very heavily from CORFU. We acknowledge a deep
+debt to the original Microsoft Research papers that describe CORFU's
+original design and implementation.
+
+See also: the "Recommended reading & related work" and "References"
+sections of the
+[Machi high level design doc](./doc/high-level-machi.pdf)
+for pointers to the MSR papers related to CORFU.
+
+Machi does not implement Tango directly. (Not yet, at least.)
+However, a prototype implementation is included in the Machi
+source tree. See
+[the prototype/tango source code directory](https://github.com/basho/machi/tree/master/prototype/tango)
+for details.
diff --git a/priv/make-faq.pl b/priv/make-faq.pl
new file mode 100755
index 0000000..7edee07
--- /dev/null
+++ b/priv/make-faq.pl
@@ -0,0 +1,81 @@
+#!/usr/bin/perl
+
+$input = shift;
+$tmp1 = "/tmp/my-tmp.1.$$";
+$tmp2 = "/tmp/my-tmp.2.$$";
+$l1 = 0;
+$l2 = 0;
+$l3 = 0;
+
+open(I, $input);
+open(T1, "> $tmp1");
+open(T2, "> $tmp2");
+
+while (<I>) {
+    if (/^##*/) {
+        $line = $_;
+        chomp;
+        @a = split;
+        $count = length($a[0]) - 2;
+        if ($count >= 0) {
+            if ($count == 0) {
+                $l1++;
+                $l2 = 0;
+                $l3 = 0;
+                $label = "$l1"
+            }
+            if ($count == 1) {
+                $l2++;
+                $l3 = 0;
+                $label = "$l1.$l2"
+            }
+            if ($count == 2) {
+                $l3++;
+                $label = "$l1.$l2.$l3"
+            }
+            $indent = " " x ($count * 4);
+            s/^#*\s*[0-9. ]*//;
+            $anchor = "n$label";
+            printf T1 "%s+ [%s %s](#%s)\n", $indent, $label, $_, $anchor;
+            printf T2 "<a name=\"%s\">\n", $anchor;
+            $line =~ s/(#+)\s*[0-9. ]*/$1 $label. /;
+            print T2 $line;
+        } else {
+            print T2 $_, "\n";
+        }
+    } else {
+        next if /^<a name=/;
+        print T2 $_;
+    }
+}
+
+close(I);
+close(T1);
+close(T2);
+open(T2, $tmp2);
+
+while (<T2>) {
+    if (/<!-- OUTLINE -->/) {
+        print;
+        print "\n";
+        open(T1, $tmp1);
+        while (<T1>) {
+            print;
+        }
+        close(T1);
+        while (<T2>) {
+            if (/<!-- ENDOUTLINE -->/) {
+                print "\n";
+                print;
+                last;
+            }
+        }
+    } else {
+        print;
+    }
+}
+close(T2);
+
+unlink($tmp1);
+unlink($tmp2);
+exit(0);
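The heading-numbering rule that priv/make-faq.pl applies in its first pass can be isolated into a small, testable function. The sketch below is hypothetical: the name `outline_labels` is not part of this patch, and it reproduces only the counter logic. `##` headings become `1`, `2`, and so on. `###` headings become `1.1`, `1.2`. `####` headings become `1.1.1`. A shallower heading resets all deeper counters. Unlike the script, the sketch treats headings deeper than `####` as unnumbered, because make-faq.pl does not define a label for them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of make-faq.pl: returns one outline label
# per heading line, using the same counter rules as the script's first pass.
# "##" is level 1, "###" level 2, "####" level 3. "#" headings (e.g.
# "# Outline") and headings deeper than "####" get undef (unnumbered).
sub outline_labels {
    my @lines = @_;
    my ($l1, $l2, $l3) = (0, 0, 0);
    my @labels;
    for my $line (@lines) {
        next unless $line =~ /^(#+)/;     # only heading lines get an entry
        my $count = length($1) - 2;       # same formula as make-faq.pl
        if ($count == 0) {
            $l1++;
            ($l2, $l3) = (0, 0);          # a new "##" resets deeper counters
            push @labels, "$l1";
        } elsif ($count == 1) {
            $l2++;
            $l3 = 0;                      # a new "###" resets the "####" counter
            push @labels, "$l1.$l2";
        } elsif ($count == 2) {
            $l3++;
            push @labels, "$l1.$l2.$l3";
        } else {
            push @labels, undef;          # "#" or deeper than "####"
        }
    }
    return @labels;
}

# Usage: the first few headings of FAQ.md, without their numbers.
my @faq = (
    "# Outline",
    "## Questions about Machi in general",
    "### What is Machi?",
    "### What does Machi's API look like?",
    "## Questions about Machi relative to something else",
    "### How is Machi better than Hadoop?",
);
my @labels = outline_labels(@faq);
print join(", ", map { defined($_) ? $_ : "-" } @labels), "\n";
# prints: -, 1, 1.1, 1.2, 2, 2.1
```

The anchor the script emits for each heading is this label prefixed with `n` (for example `n1.2`). This is why the outline links in FAQ.md point at `#n1.2`.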