From fa71a918b882a2f29acbef9663e70b88cfde2e6f Mon Sep 17 00:00:00 2001
From: Scott Lystig Fritchie
Date: Wed, 9 Mar 2016 12:12:34 -0800
Subject: [PATCH] README and FAQ updates for mid-March 2016

---
 FAQ.md                        |  13 ++--
 README.md                     | 110 ++++++++++++++++++++--------------
 doc/humming-consensus-demo.md |   2 +-
 3 files changed, 74 insertions(+), 51 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index 6d43e8f..ee563c9 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -46,13 +46,13 @@
 ### 1.1. What is Machi?
-Very briefly, Machi is a very simple append-only file store.
+Very briefly, Machi is a very simple append-only blob/file store.
 Machi is "dumber" than many other file stores (i.e., lacking many features found in other file stores) such as HadoopFS or a simple NFS or CIFS file server.
-However, Machi is a distributed file store, which makes it different
+However, Machi is a distributed blob/file store, which makes it different
 (and, in some ways, more complicated) than a simple NFS or CIFS file server.
@@ -142,7 +142,8 @@ consistency mode during and after network partitions are:
 due to Machi's restrictions on file naming and file offset assignment. Both file names and file offsets are always chosen by Machi servers according to rules which guarantee safe
- mergeability.
+ mergeability. Server-assigned names are a characteristic of a
+ "blob store".
 ### 1.5. What is Machi like when operating in "strongly consistent" mode?
@@ -172,10 +173,10 @@ for more details.
 ### 1.6. What does Machi's API look like?
 The Machi API only contains a handful of API operations. The function
-arguments shown below use Erlang-style type annotations.
+arguments shown below (in simplified form) use Erlang-style type annotations.
- append_chunk(Prefix:binary(), Chunk:binary()).
- append_chunk_extra(Prefix:binary(), Chunk:binary(), ExtraSpace:non_neg_integer()).
+ append_chunk(Prefix:binary(), Chunk:binary(), CheckSum:binary()).
+ append_chunk_extra(Prefix:binary(), Chunk:binary(), CheckSum:binary(), ExtraSpace:non_neg_integer()).
 read_chunk(File:binary(), Offset:non_neg_integer(), Size:non_neg_integer()).
 checksum_list(File:binary()).
diff --git a/README.md b/README.md
index 6e86eb9..25b9fff 100644
--- a/README.md
+++ b/README.md
@@ -4,16 +4,16 @@
 Outline
-1. [Why another file store?](#sec1)
+1. [Why another blob/file store?](#sec1)
 2. [Where to learn more about Machi](#sec2)
 3. [Development status summary](#sec3)
 4. [Contributing to Machi's development](#sec4)
-## 1. Why another file store?
+## 1. Why another blob/file store?
 Our goal is a robust & reliable, distributed, highly available, large
-file store. Such stores already exist, both in the open source world
+file and blob store. Such stores already exist, both in the open source world
 and in the commercial world. Why reinvent the wheel? We believe there are three reasons, ordered by decreasing rarity.
@@ -25,9 +25,8 @@ there are three reasons, ordered by decreasing rarity.
 3. We want to manage file replicas in a way that's provably correct and also easy to test.
-Of all the file stores in the open source & commercial worlds, only
-criteria #3 is a viable option. Or so we hope. Or we just don't
-care, and if data gets lost or corrupted, then ... so be it.
+Criteria #3 is difficult to find in the open source world but perhaps
+not impossible.
 If we have app use cases where availability is more important than consistency, then systems that meet criteria #2 are also rare.
@@ -39,12 +38,13 @@ file data and attempts best-effort file reads?
 If we really do care about data loss and/or data corruption, then we really want both #3 and #1. Unfortunately, systems that meet
-criteria #1 are _very rare_.
+criteria #1 are _very rare_. (Nonexistent?)
 Why? This is 2015. We have decades of research that shows that computer hardware can (and indeed does) corrupt data at nearly every level of the modern client/server application stack.
 Systems with end-to-end data corruption detection should be ubiquitous today. Alas, they are not.
+
 Machi is an effort to change the deplorable state of the world, one Erlang function at a time.
@@ -70,46 +70,62 @@ including the network partition simulator.
 ## 3. Development status summary
-Mid-December 2015: work is underway.
+Mid-March 2016: The Machi development team has been downsized in
+recent months, and the pace of development has slowed. Here is a
+summary of the status of Machi's major components.
-* In progress:
- * Code refactoring: metadata management using
- [ELevelDB](https://github.com/basho/eleveldb)
- * File repair using file-centric, Merkle-style hash tree.
- * Server-side socket handling is now performed by
- [ranch](https://github.com/ninenines/ranch)
- * QuickCheck tests for file repair correctness
- * 2015-12-15: The EUnit test `machi_ap_repair_eqc` is
- currently failing occasionally because it (correctly) detects
- double-write errors. Double-write errors will be eliminated
- when the ELevelDB integration work is complete.
- * The `make stage` and `make release` commands can be used to
- create a primitive "package". Use `./rel/machi/bin/machi console`
- to start the Machi app in interactive mode. Substitute the word
- `start` instead of console to start Machi in background/daemon
- mode. The `./rel/machi/bin/machi` command without any arguments
- will give a short usage summary.
- * Chain Replication management using the Humming Consensus
- algorithm to manage chain state is stable.
- * ... with the caveat that it runs very well in a very harsh
- and unforgiving network partition simulator but has not run
- much yet in the real world.
- * All Machi client/server protocols are based on
- [Protocol Buffers](https://developers.google.com/protocol-buffers/docs/overview).
- * The current specification for Machi's protocols can be found at
- [https://github.com/basho/machi/blob/master/src/machi.proto](https://github.com/basho/machi/blob/master/src/machi.proto).
- * The Machi PB protocol is not yet stable. Expect change!
- * The Erlang language client implementation of the high-level
- protocol flavor is brittle (e.g., little error handling yet).
+* Humming Consensus and the chain manager
+ * No new safety bugs have been found by model-checking tests.
+ * A new document,
+ [Hands-on experiments with Machi and Humming Consensus](doc/humming-consensus-demo.md)
+ is now available. It is a tutorial for setting up a 3 virtual
+ machine Machi cluster and demonstrating the chain manager's
+ reactions to server stops & starts, crashes & restarts, and pauses
+ (simulated by `SIGSTOP` and `SIGCONT`).
+ * The chain manager can still make suboptimal-but-safe choices for
+ chain transitions when a server hangs/pauses temporarily.
+ * Recent chain manager changes have made the instability window
+ much shorter when the slow/paused server resumes execution.
+ * Scott believes that a modest change to the chain manager's
+ calculation of a new projection can make flapping in this (and
+ many other) cases less likely. Currently, the new local
+ projection is calculated using only local state (i.e., the chain
+ manager's internal state + the fitness server's state).
+ However, if the "latest" projection read from the public
+ projection stores were also input to the new projection
+ calculation function, then many obviously bad projections can be
+ avoided without needing rounds of Humming Consensus to
+ demonstrate that a bad projection is bad.
-If you would like to run the Humming Consensus code (with or without
-the network partition simulator) as described in the RICON 2015
-presentation, please see the
-[Humming Consensus demo doc.](./doc/humming_consensus_demo.md).
+* FLU/data server process
+ * All known correctness bugs have been fixed.
+ * Performance has not yet been measured. Performance measurement
+ and enhancements are scheduled to start in the middle of March 2016.
+ (This will include a much-needed update to the `basho_bench` driver.)
-If you'd like to work on a protocol such as Thrift, UBF,
-msgpack over UDP, or some other protocol, let us know by
-[opening an issue to discuss it](./issues/new).
+* Access protocols and client libraries
+ * The protocol used by both external clients and internally (instead
+ of using Erlang's native message passing mechanisms) is based on
+ Protocol Buffers.
+ * [Machi PB protocol specification: ./src/machi.proto](./src/machi.proto)
+ * At the moment, the PB specification contains two protocols.
+ Sometime in the near future, the spec will be split to separate
+ the external client API (the "high" protocol) from the internal
+ communication API (the "low" protocol).
+
+* Recent conference talks about Machi
+ * Erlang Factory San Francisco 2016
+ [the slides and video recording](http://www.erlang-factory.com/sfbay2016/scott-lystig-fritchie)
+ will be available a few weeks after the conference ends on March
+ 11, 2016.
+ * Ricon 2015
+ * [The slides](http://ricon.io/archive/2015/slides/Scott_Fritchie_Ricon_2015.pdf)
+ * and the [video recording](https://www.youtube.com/watch?v=yR5kHL1bu1Q&index=13&list=PL9Jh2HsAWHxIc7Tt2M6xez_TOP21GBH6M)
+ are now available.
+ * If you would like to run the Humming Consensus code (with or without
+ the network partition simulator) as described in the RICON 2015
+ presentation, please see the
+ [Humming Consensus demo doc](./doc/humming-consensus-demo.md).
 ## 4. Contributing to Machi's development
@@ -150,3 +166,9 @@ Machi has a dependency on the
 supports UNIX/Linux OSes and 64-bit versions of Erlang/OTP only; we apologize to Windows-based and 32-bit-based Erlang developers for this restriction.
+
+### 4.3 New protocols and features
+
+If you'd like to work on a protocol such as Thrift, UBF,
+msgpack over UDP, or some other protocol, let us know by
+[opening an issue to discuss it](./issues/new).
diff --git a/doc/humming-consensus-demo.md b/doc/humming-consensus-demo.md
index ffed8bb..f92858f 100644
--- a/doc/humming-consensus-demo.md
+++ b/doc/humming-consensus-demo.md
@@ -220,7 +220,7 @@ To help interpret the output of the test, please skip ahead to the
 If you don't have `git` and/or the Erlang 17 runtime system available on your OS X, FreeBSD, Linux, or Solaris machine, please take a look
-at the [Prerequistes section](#prerequisites) first. When you have
+at the [Prerequisites section](#prerequisites) first. When you have
 installed the prerequisite software, please return back here.
 ## Clone and compile the code