From e4a784d3dd5e1c9120ec6ea009d8bce64ea1a52d Mon Sep 17 00:00:00 2001 From: Scott Lystig Fritchie Date: Mon, 14 Dec 2015 19:14:56 +0900 Subject: [PATCH 1/5] Part 1 of X --- README.md | 69 ++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 46 insertions(+), 23 deletions(-) diff --git a/README.md b/README.md index d802080..80850ba 100644 --- a/README.md +++ b/README.md @@ -1,32 +1,46 @@ -# Machi +# Machi: a robust & reliable, distributed, highly available, large file store [Travis-CI](http://travis-ci.org/basho/machi) :: ![Travis-CI](https://secure.travis-ci.org/basho/machi.png) -Our goal is a robust & reliable, distributed, highly available(*), -large file store based upon write-once registers, append-only files, -Chain Replication, and client-server style architecture. All members -of the cluster store all of the files. Distributed load -balancing/sharding of files is __outside__ of the scope of this -system. However, it is a high priority that this system be able to -integrate easily into systems that do provide distributed load -balancing, e.g., Riak Core. Although strong consistency is a major -feature of Chain Replication, first use cases will focus mainly on -eventual consistency features --- strong consistency design will be -discussed in a separate design document (read more below). +Our goal is a robust & reliable, distributed, highly available, large +file store. Such stores already exist, both in the open source world +and in the commercial world. Why reinvent the wheel? We believe +there are three reasons, ordered by decreasing rarity. -The ability for Machi to maintain strong consistency will make it -attractive as a toolkit for building things like CORFU and Tango as -well as better-known open source software such as Kafka's file -replication. (See the bibliography of the [Machi high level design -doc](./doc/high-level-machi.pdf) for further references.) +1. We want end-to-end checksums for all file data, from the initial + file writer to every file reader, anywhere, all the time. +2. We need flexibility to trade consistency for availability: + e.g. weak consistency in exchange for being available in cases + of partial system failure. +3. We want to manage file replicas in a way that's provably correct + and also easy to test. - (*) When operating in strong consistency mode (supporting - sequential or linearizable semantics), the availability of the - system is restricted to quorum majority availability. When in - eventual consistency mode, service can be provided by any - available server. +Of all the file stores in the open source & commercial worlds, only +criteria #3 is a viable option. Or so we hope. Or we just don't +care, and if data gets lost or corrupted, then ... so be it. -## Status: mid-October 2015: work is underway +If we have app use cases where availability is more important than +consistency, then systems that meet criteria #2 are also rare. +Most file stores provide only strong consistency and therefore +have unavoidable, unavailable behavior when parts of the system +fail. +What if we want a file store that is always available to write new +file data and attempts best-effort file reads? + +If we really do care about data loss and/or data corruption, then we +really want both #3 and #1. Unfortunately, systems that meet criteria +#1 are *very* +rare. Why? This is 2015. We have decades of research that shows +that computer hardware can (and +indeed does) corrupt data at nearly every level of the modern +client/server application stack. Systems with end-to-end data +corruption detection should be ubiquitous today. Alas, they are not. +Machi is an effort to change the deplorable state of the world, one +Erlang function at a time. + +## Status: mid-December 2015: work is underway + +TODO: status update here. * The chain manager is ready for both eventual consistency use ("available mode") and strong consistency use ("consistent mode"). Both modes use a new @@ -53,9 +67,18 @@ If you'd like to work on a protocol such as Thrift, UBF, msgpack over UDP, or some other protocol, let us know by [opening an issue to discuss it](./issues/new). +## Where to learn more about Machi + The two major design documents for Machi are now mostly stable. Please see the [doc](./doc) directory's [README](./doc) for details. +Scott recently (November 2015) gave a presentation at the +[RICON 2015 conference](http://ricon.io) about one of the techniques +used by Machi; "Managing Chain Replication Metadata with +Humming Consensus" is available online now. +* [slides (PDF format)](http://ricon.io/speakers/slides/Scott_Fritchie_Ricon_2015.pdf) +* [video](https://www.youtube.com/watch?v=yR5kHL1bu1Q) + ## Contributing to Machi: source code, documentation, etc. Basho Technologies, Inc. as committed to licensing all work for Machi -- 2.45.2 From d196dbcee5c1e24c1929add648814dd4c76990a3 Mon Sep 17 00:00:00 2001 From: Scott Lystig Fritchie Date: Mon, 14 Dec 2015 19:21:20 +0900 Subject: [PATCH 2/5] MD fixup --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 80850ba..6a8a64d 100644 --- a/README.md +++ b/README.md @@ -28,8 +28,8 @@ What if we want a file store that is always available to write new file data and attempts best-effort file reads? If we really do care about data loss and/or data corruption, then we -really want both #3 and #1. Unfortunately, systems that meet criteria -#1 are *very* +really want both #3 and #1. Unfortunately, systems that meet +criteria #1 are *very* rare. Why? This is 2015. We have decades of research that shows that computer hardware can (and indeed does) corrupt data at nearly every level of the modern -- 2.45.2 From e5f7f3ba9ad62d36f3fa566122b1b0930b7c9b88 Mon Sep 17 00:00:00 2001 From: Scott Lystig Fritchie Date: Tue, 15 Dec 2015 15:07:18 +0900 Subject: [PATCH 3/5] Remove doc/overview.edoc --- doc/overview.edoc | 170 ---------------------------------------------- 1 file changed, 170 deletions(-) delete mode 100644 doc/overview.edoc diff --git a/doc/overview.edoc b/doc/overview.edoc deleted file mode 100644 index 6182f6b..0000000 --- a/doc/overview.edoc +++ /dev/null @@ -1,170 +0,0 @@ - -@title Machi: a small village of replicated files - -@doc - -== About This EDoc Documentation == - -This EDoc-style documentation will concern itself only with Erlang -function APIs and function & data types. Higher-level design and -commentary will remain outside of the Erlang EDoc system; please see -the "Pointers to Other Machi Documentation" section below for more -details. - -Readers should beware that this documentation may be out-of-sync with -the source code. When in doubt, use the `make edoc' command to -regenerate all HTML pages. - -It is the developer's responsibility to re-generate the documentation -periodically and commit it to the Git repo. - -== Machi Code Overview == - -=== Chain Manager === - -The Chain Manager is responsible for managing the state of Machi's -"Chain Replication" state. This role is roughly analogous to the -"Riak Core" application inside of Riak, which takes care of -coordinating replica placement and replica repair. - -For each primitive data server in the cluster, a Machi FLU, there is a -Chain Manager process that manages its FLU's role within the Machi -cluster's Chain Replication scheme. Each Chain Manager process -executes locally and independently to manage the distributed state of -a single Machi Chain Replication chain. - -
    - -
  • To contrast with Riak Core ... Riak Core's claimant process is - solely responsible for managing certain critical aspects of - Riak Core distributed state. Machi's Chain Manager process - performs similar tasks as Riak Core's claimant. However, Machi - has several active Chain Manager processes, one per FLU server, - instead of a single active process like Core's claimant. Each - Chain Manager process acts independently; each is constrained - so that it will reach consensus via independent computation - & action. - - Full discussion of this distributed consensus is outside the - scope of this document; see the "Pointers to Other Machi - Documentation" section below for more information. -
  • -
  • Machi differs from a Riak Core application because Machi's - replica placement policy is simply, "All Machi servers store - replicas of all Machi files". - Machi is intended to be a primitive building block for creating larger - cluster-of-clusters where files are - distributed/fragmented/sharded across a large pool of - independent Machi clusters. -
  • -
  • See - [https://www.usenix.org/legacy/events/osdi04/tech/renesse.html] - for a copy of the paper, "Chain Replication for Supporting High - Throughput and Availability" by Robbert van Renesse and Fred - B. Schneider. -
  • -
- -=== FLU === - -The FLU is the basic storage server for Machi. - -
    -
  • The name FLU is taken from "flash storage unit" from the paper - "CORFU: A Shared Log Design for Flash Clusters" by - Balakrishnan, Malkhi, Prabhakaran, and Wobber. See - [https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/balakrishnan] -
  • -
  • In CORFU, the sequencer step is a prerequisite step that is - performed by a separate component, the Sequencer. - In Machi, the `append_chunk()' protocol message has - an implicit "sequencer" operation applied by the "head" of the - Machi Chain Replication chain. If a client wishes to write - data that has already been assigned a sequencer position, then - the `write_chunk()' API function is used. -
  • -
- -For each FLU, there are three independent tasks that are implemented -using three different Erlang processes: - -
    -
  • A FLU server, implemented primarily by `machi_flu.erl'. -
  • -
  • A projection store server, implemented primarily by - `machi_projection_store.erl'. -
  • -
  • A chain state manager server, implemented primarily by - `machi_chain_manager1.erl'. -
  • -
- -From the perspective of failure detection, it is very convenient that -all three FLU-related services (file server, sequencer server, and -projection server) are accessed using the same single TCP port. - -=== Projection (data structure) === - -The projection is a data structure that specifies the current state -of the Machi cluster: all FLUs, which FLUS are considered -up/running or down/crashed/stopped, which FLUs are actively -participants in the Chain Replication protocol, and which FLUs are -under "repair" (i.e., having their data resyncronized when -newly-added to a cluster or when restarting after a crash). - -=== Projection Store (server) === - -The projection store is a storage service that is implemented by an -Erlang/OTP `gen_server' process that is associated with each -FLU. Conceptually, the projection store is an array of -write-once registers. For each projection store register, the -key is a 2-tuple of an epoch number (`non_neg_integer()' type) -and a projection type (`public' or `private' type); the value is -a projection data structure (`projection_v1()' type). - -=== Client and Proxy Client === - -Machi is intentionally avoiding using distributed Erlang for Machi's -communication. This design decision makes Erlang-side code more -difficult & complex but allows us the freedom of implementing -parts of Machi in other languages without major -protocol&API&glue code changes later in the product's -lifetime. - -There are two layers of interface for Machi clients. - -
    -
  • The `machi_flu1_client' module implements an API that uses a - TCP socket directly. -
  • -
  • The `machi_proxy_flu1_client' module implements an API that - uses a local, long-lived `gen_server' process as a proxy for - the remote, perhaps disconnected-or-crashed Machi FLU server. -
  • -
- -The types for both modules ought to be the same. However, due to -rapid code churn, some differences might exist. Any major difference -is (almost by definition) a bug: please open a GitHub issue to request -a correction. - -== TODO notes == - -Any use of the string "TODO" in upper/lower/mixed case, anywhere in -the code, is a reminder signal of unfinished work. - -== Pointers to Other Machi Documentation == - -
    -
  • If you are viewing this document locally, please look in the - `../doc/' directory, -
  • -
  • If you are viewing this document via the Web, please find the - documentation via this link: - [http://github.com/basho/machi/tree/master/doc/] - Please be aware that this link points to the `master' branch - of the Machi source repository and therefore may be - out-of-sync with non-`master' branch code. -
  • - -
-- 2.45.2 From ec56164bd167084ad2eaa784c27ae4c78cbe1c07 Mon Sep 17 00:00:00 2001 From: Scott Lystig Fritchie Date: Tue, 15 Dec 2015 15:52:29 +0900 Subject: [PATCH 4/5] Part 2 of X --- README.md | 119 ++++++++++++++++++++++---------------------------- doc/README.md | 37 ++++++++-------- 2 files changed, 72 insertions(+), 84 deletions(-) diff --git a/README.md b/README.md index 6a8a64d..f589f32 100644 --- a/README.md +++ b/README.md @@ -2,6 +2,16 @@ [Travis-CI](http://travis-ci.org/basho/machi) :: ![Travis-CI](https://secure.travis-ci.org/basho/machi.png) +Outline + +1. [Why another file store?](#sec1) +2. [Development status summary](#sec2) +3. [Where to learn more about Machi](#sec3) +4. [Contributing to Machi's development](#sec4) + + +## 1. Why another file store? + Our goal is a robust & reliable, distributed, highly available, large file store. Such stores already exist, both in the open source world and in the commercial world. Why reinvent the wheel? We believe @@ -38,40 +48,51 @@ corruption detection should be ubiquitous today. Alas, they are not. Machi is an effort to change the deplorable state of the world, one Erlang function at a time. -## Status: mid-December 2015: work is underway + +## 2. Development status summary -TODO: status update here. +Mid-December 2015: work is underway. -* The chain manager is ready for both eventual consistency use ("available - mode") and strong consistency use ("consistent mode"). Both modes use a new - consensus technique, Humming Consensus. - * Scott will be - [speaking about Humming Consensus](http://ricon.io/agenda/#managing-chain-replication-metadata-with-humming-consensus) - at the [Ricon 2015 conference] (http://ricon.io) in San Francisco, - CA, USA on Thursday, November 5th, 2015. - * If you would like to run the network partition simulator - mentioned in that Ricon presentation, please see the - [partition simulator convergence test doc.](./doc/machi_chain_manager1_converge_demo.md) - * Implementation of the file repair process for strong consistency - is still in progress. +* In progress: + * Code refactoring: metadata management using + [ELevelDB](https://github.com/basho/eleveldb) + * File repair using file-centric, Merkle-style hash tree. + * QuickCheck tests for file repair correctness + * 2015-12-15: The EUnit test `machi_ap_repair_eqc` is + currently failing occasionally because it (correctly) detects + double-write errors. Double-write errors will be eliminated + when the ELevelDB integration work is complete. + * Chain Replication management using the Humming Consensus + algorithm to manage chain state is stable. + * ... with the caveat that it runs very well in a very harsh + and unforgiving network partition simulator but has not run + much yet in the real world. + * All Machi client/server protocols are based on + [Protocol Buffers](https://developers.google.com/protocol-buffers/docs/overview). + * The current specification for Machi's protocols can be found at + [https://github.com/basho/machi/blob/master/src/machi.proto](https://github.com/basho/machi/blob/master/src/machi.proto). + * The Machi PB protocol is not yet stable. Expect change! + * The Erlang language client implementation of the high-level + protocol flavor is brittle (e.g., little error handling yet). -* All Machi client/server protocols are based on - [Protocol Buffers](https://developers.google.com/protocol-buffers/docs/overview). - * The current specification for Machi's protocols can be found at - [https://github.com/basho/machi/blob/master/src/machi.proto](https://github.com/basho/machi/blob/master/src/machi.proto). - * The Machi PB protocol is not yet stable. Expect change! - * The Erlang language client implementation of the high-level - protocol flavor is brittle (e.g., little error handling yet). +If you would like to run the network partition simulator +mentioned in the Ricon 2015 presentation about Humming Consensus, +please see the +[partition simulator convergence test doc.](./doc/machi_chain_manager1_converge_demo.md) If you'd like to work on a protocol such as Thrift, UBF, msgpack over UDP, or some other protocol, let us know by [opening an issue to discuss it](./issues/new). -## Where to learn more about Machi + +## 3. Where to learn more about Machi The two major design documents for Machi are now mostly stable. Please see the [doc](./doc) directory's [README](./doc) for details. +We also have a +[Frequently Asked Questions (FAQ) list](./FAQ.md). + Scott recently (November 2015) gave a presentation at the [RICON 2015 conference](http://ricon.io) about one of the techniques used by Machi; "Managing Chain Replication Metadata with @@ -79,7 +100,10 @@ Humming Consensus" is available online now. * [slides (PDF format)](http://ricon.io/speakers/slides/Scott_Fritchie_Ricon_2015.pdf) * [video](https://www.youtube.com/watch?v=yR5kHL1bu1Q) -## Contributing to Machi: source code, documentation, etc. + +## 4. Contributing to Machi's development + +### 4.1 License Basho Technologies, Inc. as committed to licensing all work for Machi under the @@ -95,26 +119,7 @@ We invite all contributors to review the [CONTRIBUTING.md](./CONTRIBUTING.md) document for guidelines for working with the Basho development team. -## A brief survey of this directories in this repository - -* A list of Frequently Asked Questions, a.k.a. - [the Machi FAQ](./FAQ.md). - -* The [doc](./doc/) directory: home for major documents about Machi: - high level design documents as well as exploration of features still - under design & review within Basho. - -* The `ebin` directory: used for compiled application code - -* The `include`, `src`, and `test` directories: contain the header - files, source files, and test code for Machi, respectively. - -* The [prototype](./prototype/) directory: contains proof of concept - code, scaffolding libraries, and other exploratory code. Curious - readers should see the [prototype/README.md](./prototype/README.md) - file for more explanation of the small sub-projects found here. - -## Development environment requirements +### 4.2 Development environment requirements All development to date has been done with Erlang/OTP version 17 on OS X. The only known limitations for using R16 are minor type @@ -126,26 +131,8 @@ tool chain for C and C++ applications. Specifically, we assume `make` is available. The utility used to compile the Machi source code, `rebar`, is pre-compiled and included in the repo. -There are no known OS limits at this time: any platform that supports -Erlang/OTP should be sufficient for Machi. This may change over time -(e.g., adding NIFs which can make full portability to Windows OTP -environments difficult), but it hasn't happened yet. - -## Contributions - -Basho encourages contributions to Riak from the community. Here’s how -to get started. - -* Fork the appropriate sub-projects that are affected by your change. -* Create a topic branch for your change and checkout that branch. - git checkout -b some-topic-branch -* Make your changes and run the test suite if one is provided. (see below) -* Commit your changes and push them to your fork. -* Open pull-requests for the appropriate projects. -* Contributors will review your pull request, suggest changes, and merge it when it’s ready and/or offer feedback. -* To report a bug or issue, please open a new issue against this repository. - --The Machi team at Basho, -[Scott Lystig Fritchie](mailto:scott@basho.com), technical lead, and -[Matt Brender](mailto:mbrender@basho.com), your developer advocate. - +Machi has a dependency on the +[ELevelDB](https://github.com/basho/eleveldb) library. ELevelDB only +supports UNIX/Linux OSes and 64-bit versions of Erlang/OTP only; we +apologize to Windows-based and 32-bit-based Erlang developers for this +restriction. diff --git a/doc/README.md b/doc/README.md index 181278b..3ad424c 100644 --- a/doc/README.md +++ b/doc/README.md @@ -6,20 +6,6 @@ Erlang documentation, please use this link: ## Documents in this directory -### chain-self-management-sketch.org - -[chain-self-management-sketch.org](chain-self-management-sketch.org) -is a mostly-deprecated draft of -an introduction to the -self-management algorithm proposed for Machi. Most material has been -moved to the [high-level-chain-mgr.pdf](high-level-chain-mgr.pdf) document. - -### cluster-of-clusters (directory) - -This directory contains the sketch of the "cluster of clusters" design -strawman for partitioning/distributing/sharding files across a large -number of independent Machi clusters. - ### high-level-machi.pdf [high-level-machi.pdf](high-level-machi.pdf) @@ -50,9 +36,9 @@ introduction to the Humming Consensus algorithm. Its abstract: > of file updates to all replica servers in a Machi cluster. Chain > Replication is a variation of primary/backup replication where the > order of updates between the primary server and each of the backup -> servers is strictly ordered into a single ``chain''. Management of -> Chain Replication's metadata, e.g., ``What is the current order of -> servers in the chain?'', remains an open research problem. The +> servers is strictly ordered into a single "chain". Management of +> Chain Replication's metadata, e.g., "What is the current order of +> servers in the chain?", remains an open research problem. The > current state of the art for Chain Replication metadata management > relies on an external oracle (e.g., ZooKeeper) or the Elastic > Replication algorithm. @@ -60,7 +46,7 @@ introduction to the Humming Consensus algorithm. Its abstract: > This document describes the Machi chain manager, the component > responsible for managing Chain Replication metadata state. The chain > manager uses a new technique, based on a variation of CORFU, called -> ``humming consensus''. +> "humming consensus". > Humming consensus does not require active participation by all or even > a majority of participants to make decisions. Machi's chain manager > bases its logic on humming consensus to make decisions about how to @@ -71,3 +57,18 @@ introduction to the Humming Consensus algorithm. Its abstract: > decision during that epoch. When a differing decision is discovered, > new time epochs are proposed in which a new consensus is reached and > disseminated to all available participants. + +### chain-self-management-sketch.org + +[chain-self-management-sketch.org](chain-self-management-sketch.org) +is a mostly-deprecated draft of +an introduction to the +self-management algorithm proposed for Machi. Most material has been +moved to the [high-level-chain-mgr.pdf](high-level-chain-mgr.pdf) document. + +### cluster-of-clusters (directory) + +This directory contains the sketch of the "cluster of clusters" design +strawman for partitioning/distributing/sharding files across a large +number of independent Machi clusters. + -- 2.45.2 From 9bd885ccb42a6af58186a05e982a2ab45b3755f8 Mon Sep 17 00:00:00 2001 From: Scott Lystig Fritchie Date: Tue, 15 Dec 2015 16:20:57 +0900 Subject: [PATCH 5/5] Part 3 of X --- FAQ.md | 63 ++++++++++++++++++++++++++++--------------------------- README.md | 50 ++++++++++++++++++++++++------------------- 2 files changed, 61 insertions(+), 52 deletions(-) diff --git a/FAQ.md b/FAQ.md index 5885bc5..f2e37c1 100644 --- a/FAQ.md +++ b/FAQ.md @@ -13,8 +13,8 @@ + [1.1 What is Machi?](#n1.1) + [1.2 What is a Machi "cluster of clusters"?](#n1.2) + [1.2.1 This "cluster of clusters" idea needs a better name, don't you agree?](#n1.2.1) - + [1.3 What is Machi like when operating in "eventually consistent"/"AP mode"?](#n1.3) - + [1.4 What is Machi like when operating in "strongly consistent"/"CP mode"?](#n1.4) + + [1.3 What is Machi like when operating in "eventually consistent" mode?](#n1.3) + + [1.4 What is Machi like when operating in "strongly consistent" mode?](#n1.4) + [1.5 What does Machi's API look like?](#n1.5) + [1.6 What licensing terms are used by Machi?](#n1.6) + [1.7 Where can I find the Machi source code and documentation? Can I contribute?](#n1.7) @@ -120,7 +120,7 @@ For proof that naming things is hard, see [http://martinfowler.com/bliki/TwoHardThings.html](http://martinfowler.com/bliki/TwoHardThings.html) -### 1.3. What is Machi like when operating in "eventually consistent"/"AP mode"? +### 1.3. What is Machi like when operating in "eventually consistent" mode? Machi's operating mode dictates how a Machi cluster will react to network partitions. A network partition may be caused by: @@ -130,17 +130,15 @@ network partitions. A network partition may be caused by: * An extreme server software "hang" or "pause", e.g. caused by OS scheduling problems such as a failing/stuttering disk device. -"AP mode" refers to the "A" and "P" properties of the "CAP -conjecture", meaning that the cluster will be "Available" and -"Partition tolerant". - -The consistency semantics of file operations while in "AP mode" are -eventually consistent during and after network partitions: +The consistency semantics of file operations while in eventual +consistency mode during and after network partitions are: * File write operations are permitted by any client on the "same side" of the network partition. * File read operations are successful for any file contents where the client & server are on the "same side" of the network partition. + * File read operations will probably fail for any file contents where the + client & server are on "different sides" of the network partition. * After the network partition(s) is resolved, files are merged together from "all sides" of the partition(s). * Unique files are copied in their entirety. @@ -151,16 +149,10 @@ eventually consistent during and after network partitions: to rules which guarantee safe mergeability.). -### 1.4. What is Machi like when operating in "strongly consistent"/"CP mode"? +### 1.4. What is Machi like when operating in "strongly consistent" mode? -Machi's operating mode dictates how a Machi cluster will react to -network partitions. -"CP mode" refers to the "C" and "P" properties of the "CAP -conjecture", meaning that the cluster will be "Consistent" and -"Partition tolerant". - -The consistency semantics of file operations while in "CP mode" are -strongly consistent during and after network partitions: +The consistency semantics of file operations while in strongly +consistency mode during and after network partitions are: * File write operations are permitted by any client on the "same side" of the network partition if and only if a quorum majority of Machi servers @@ -205,7 +197,12 @@ Internally, there is a more complex protocol used by individual cluster members to manage file contents and to repair damaged/missing files. See Figure 3 in [Machi high level design doc](https://github.com/basho/machi/tree/master/doc/high-level-machi.pdf) -for more details. +for more description. + +The definitions of both the "high level" external protocol and "low +level" internal protocol are in a +[Protocol Buffers](https://developers.google.com/protocol-buffers/docs/overview) +definition at [./src/machi.proto](./src/machi.proto). ### 1.6. What licensing terms are used by Machi? @@ -232,8 +229,8 @@ guidelines at ### 1.8. What is Machi's expected release schedule, packaging, and operating system/OS distribution support? -Basho expects that Machi's first release will take place near the end -of calendar year 2015. +Basho expects that Machi's first major product release will take place +during the 2nd quarter of 2016. Basho's official support for operating systems (e.g. Linux, FreeBSD), operating system packaging (e.g. CentOS rpm/yum package management, @@ -537,6 +534,10 @@ change several times during any single test case) and a random series of cluster operations, an event trace of all cluster activity is used to verify that no safety-critical rules have been violated. +All test code is available in the [./test](./test) subdirectory. +Modules that use QuickCheck will use a file suffix of `_eqc`, for +example, [./test/machi_ap_repair_eqc.erl](./test/machi_ap_repair_eqc.erl). + ### 3.5. Does Machi require shared disk storage? e.g. iSCSI, NBD (Network Block Device), Fibre Channel disks @@ -567,7 +568,10 @@ deploy multiple Machi servers per machine: one Machi server per disk. ### 3.7. What language(s) is Machi written in? -So far, Machi is written in 100% Erlang. +So far, Machi is written in 100% Erlang. Machi uses at least one +library, [ELevelDB](https://github.com/basho/eleveldb), that is +implemented both in C++ and in Erlang, using Erlang NIFs (Native +Interface Functions) to allow Erlang code to call C++ functions. In the event that we encounter a performance problem that cannot be solved within the Erlang/OTP runtime environment, all of Machi's @@ -588,19 +592,16 @@ bit-twiddling magicSPEED ... without also having to find a replacement for disterl. (Or without having to re-invent disterl's features in another language.) - -In the first drafts of the Machi code, the inter-node communication -uses a hand-crafted, artisanal, mostly ASCII protocol as part of a -"demo day" quick & dirty prototype. Work is underway (summer of 2015) -to replace that protocol gradually with a well-structured, -well-documented protocol based on Protocol Buffers data serialization. +All wire protocols used by Machi are defined & implemented using +[Protocol Buffers](https://developers.google.com/protocol-buffers/docs/overview). +The definition file can be found at [./src/machi.proto](./src/machi.proto). ### 3.9. Can I use HTTP to write/read stuff into/from Machi? -Yes, sort of. For as long as the legacy of -Machi's first internal protocol & code still -survives, it's possible to use a +Short answer: No, not yet. + +Longer answer: No, but it was possible as a hack, many months ago, see [primitive/hack'y HTTP interface that is described in this source code commit log](https://github.com/basho/machi/commit/6cebf397232cba8e63c5c9a0a8c02ba391b20fef). Please note that commit `6cebf397232cba8e63c5c9a0a8c02ba391b20fef` is required to try using this feature: the code has since bit-rotted and diff --git a/README.md b/README.md index f589f32..28f77d2 100644 --- a/README.md +++ b/README.md @@ -5,8 +5,8 @@ Outline 1. [Why another file store?](#sec1) -2. [Development status summary](#sec2) -3. [Where to learn more about Machi](#sec3) +2. [Where to learn more about Machi](#sec2) +3. [Development status summary](#sec3) 4. [Contributing to Machi's development](#sec4) @@ -39,8 +39,8 @@ file data and attempts best-effort file reads? If we really do care about data loss and/or data corruption, then we really want both #3 and #1. Unfortunately, systems that meet -criteria #1 are *very* -rare. Why? This is 2015. We have decades of research that shows +criteria #1 are _very rare_. +Why? This is 2015. We have decades of research that shows that computer hardware can (and indeed does) corrupt data at nearly every level of the modern client/server application stack. Systems with end-to-end data @@ -49,7 +49,23 @@ Machi is an effort to change the deplorable state of the world, one Erlang function at a time. -## 2. Development status summary +## 2. Where to learn more about Machi + +The two major design documents for Machi are now mostly stable. +Please see the [doc](./doc) directory's [README](./doc) for details. + +We also have a +[Frequently Asked Questions (FAQ) list](./FAQ.md). + +Scott recently (November 2015) gave a presentation at the +[RICON 2015 conference](http://ricon.io) about one of the techniques +used by Machi; "Managing Chain Replication Metadata with +Humming Consensus" is available online now. +* [slides (PDF format)](http://ricon.io/speakers/slides/Scott_Fritchie_Ricon_2015.pdf) +* [video](https://www.youtube.com/watch?v=yR5kHL1bu1Q) + + +## 3. Development status summary Mid-December 2015: work is underway. @@ -57,11 +73,19 @@ Mid-December 2015: work is underway. * Code refactoring: metadata management using [ELevelDB](https://github.com/basho/eleveldb) * File repair using file-centric, Merkle-style hash tree. + * Server-side socket handling is now performed by + [ranch](https://github.com/ninenines/ranch) * QuickCheck tests for file repair correctness * 2015-12-15: The EUnit test `machi_ap_repair_eqc` is currently failing occasionally because it (correctly) detects double-write errors. Double-write errors will be eliminated when the ELevelDB integration work is complete. + * The `make stage` and `make release` commands can be used to + create a primitive "package". Use `./rel/machi/bin/machi console` + to start the Machi app in interactive mode. Substitute the word + `start` instead of console to start Machi in background/daemon + mode. The `./rel/machi/bin/machi` command without any arguments + will give a short usage summary. * Chain Replication management using the Humming Consensus algorithm to manage chain state is stable. * ... with the caveat that it runs very well in a very harsh @@ -84,22 +108,6 @@ If you'd like to work on a protocol such as Thrift, UBF, msgpack over UDP, or some other protocol, let us know by [opening an issue to discuss it](./issues/new). - -## 3. Where to learn more about Machi - -The two major design documents for Machi are now mostly stable. -Please see the [doc](./doc) directory's [README](./doc) for details. - -We also have a -[Frequently Asked Questions (FAQ) list](./FAQ.md). - -Scott recently (November 2015) gave a presentation at the -[RICON 2015 conference](http://ricon.io) about one of the techniques -used by Machi; "Managing Chain Replication Metadata with -Humming Consensus" is available online now. -* [slides (PDF format)](http://ricon.io/speakers/slides/Scott_Fritchie_Ricon_2015.pdf) -* [video](https://www.youtube.com/watch?v=yR5kHL1bu1Q) - ## 4. Contributing to Machi's development -- 2.45.2