Machi: a distributed, decentralized blob/large file store using chain replication and "Humming Consensus".

# Machi: a robust & reliable, distributed, highly available, large file store


Outline

  1. Why another file store?
  2. Where to learn more about Machi
  3. Development status summary
  4. Contributing to Machi's development
## 1. Why another file store?

Our goal is a robust & reliable, distributed, highly available, large file store. Such stores already exist, both in the open source world and in the commercial world. Why reinvent the wheel? We believe there are three reasons, ordered by decreasing rarity.

  1. We want end-to-end checksums for all file data, from the initial file writer to every file reader, anywhere, all the time.
  2. We need flexibility to trade consistency for availability: e.g. weak consistency in exchange for being available in cases of partial system failure.
  3. We want to manage file replicas in a way that's provably correct and also easy to test.

Of all the file stores in the open source & commercial worlds, only criterion #3 is a viable option. Or so we hope. Or we just don't care, and if data gets lost or corrupted, then ... so be it.

If we have application use cases where availability is more important than consistency, then systems that meet criterion #2 are also rare. Most file stores provide only strong consistency and are therefore unavoidably unavailable when parts of the system fail. What if we want a file store that is always available to write new file data and makes a best-effort attempt at file reads?

If we really do care about data loss and/or data corruption, then we really want both #3 and #1. Unfortunately, systems that meet criterion #1 are very rare. Why? This is 2015. We have decades of research showing that computer hardware can (and indeed does) corrupt data at nearly every level of the modern client/server application stack. Systems with end-to-end data corruption detection should be ubiquitous today. Alas, they are not. Machi is an effort to change the deplorable state of the world, one Erlang function at a time.
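The idea behind criterion #1 can be illustrated with a minimal sketch. It assumes a hypothetical chunk API (`make_chunk` and `read_chunk` are illustrative names, not Machi functions) and uses SHA-1 only as an arbitrary example digest. The writer computes the checksum before the data leaves its hands, the checksum travels with the data, and the final reader recomputes it. Corruption introduced anywhere in between (network, server memory, disk) is then detected at the edges instead of being silently passed along.

```python
import hashlib


def make_chunk(data: bytes) -> tuple[bytes, bytes]:
    """Writer side (hypothetical API): compute the checksum before the data leaves the client."""
    return data, hashlib.sha1(data).digest()


def read_chunk(data: bytes, checksum: bytes) -> bytes:
    """Reader side (hypothetical API): recompute and compare before handing data to the application."""
    if hashlib.sha1(data).digest() != checksum:
        raise ValueError("checksum mismatch: data was corrupted somewhere along the path")
    return data


payload, csum = make_chunk(b"hello, machi")
# ... payload and csum travel through networks, servers, and disks ...
assert read_chunk(payload, csum) == b"hello, machi"

flipped = b"hellp, machi"  # simulate a single corrupted byte
try:
    read_chunk(flipped, csum)
except ValueError as err:
    print(err)
```

The important design point is where verification happens: intermediate layers may also check, but only a check by the final reader against a checksum produced by the original writer covers the whole path.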

## 2. Where to learn more about Machi

The two major design documents for Machi are now mostly stable. Please see the doc directory's README for details.

We also have a Frequently Asked Questions (FAQ) list.

In November 2015, Scott gave a presentation at the RICON 2015 conference about one of the techniques used by Machi. The talk, "Managing Chain Replication Metadata with Humming Consensus", is available online.

## 3. Development status summary

Mid-December 2015: work is underway.

  • In progress:
    • Code refactoring: metadata management using ELevelDB
    • File repair using file-centric, Merkle-style hash tree.
    • Server-side socket handling is now performed by ranch
    • QuickCheck tests for file repair correctness
      • 2015-12-15: The EUnit test machi_ap_repair_eqc is currently failing occasionally because it (correctly) detects double-write errors. Double-write errors will be eliminated when the ELevelDB integration work is complete.
    • The `make stage` and `make release` commands can be used to create a primitive "package". Use `./rel/machi/bin/machi console` to start the Machi app in interactive mode; use `start` instead of `console` to run Machi in background/daemon mode. Running `./rel/machi/bin/machi` without any arguments prints a short usage summary.
    • Chain Replication management using the Humming Consensus algorithm to manage chain state is stable.
      • ... with the caveat that, although it runs very well in a harsh and unforgiving network partition simulator, it has not yet run much in the real world.
    • All Machi client/server protocols are based on Protocol Buffers.
      • The current specification for Machi's protocols can be found at https://github.com/basho/machi/blob/master/src/machi.proto.
      • The Machi PB protocol is not yet stable. Expect change!
      • The Erlang language client implementation of the high-level protocol flavor is brittle (e.g., little error handling yet).
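The file repair item in the list above mentions a file-centric, Merkle-style hash tree. The sketch below shows the general technique only; it is not Machi's repair code. The chunk size, the use of SHA-1, the pairing rule for odd nodes, and all function names are assumptions made for illustration. Each replica hashes its file in fixed-size chunks, builds a tree of hashes over those chunks, and two replicas compare trees from the root down, descending only into subtrees whose hashes differ.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example stays readable


def leaf_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash each fixed-size chunk of a file (the last chunk may be shorter)."""
    leaves = [hashlib.sha1(data[i:i + chunk_size]).digest()
              for i in range(0, len(data), chunk_size)]
    return leaves or [hashlib.sha1(b"").digest()]  # an empty file still gets a root


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree: levels[0] is the leaves, levels[-1] is [root]."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Hash adjacent pairs; an odd node out is hashed on its own.
        levels.append([hashlib.sha1(b"".join(prev[i:i + 2])).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def differing_chunks(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Descend from the root, following only subtrees whose hashes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("this sketch only compares replicas with the same chunk count")
    suspects = [0]  # node indexes to compare at the current level
    for level in range(len(a) - 1, 0, -1):
        mismatched = [i for i in suspects if a[level][i] != b[level][i]]
        width = len(a[level - 1])
        suspects = [c for i in mismatched for c in (2 * i, 2 * i + 1) if c < width]
    return [i for i in suspects if a[0][i] != b[0][i]]


replica_a = b"AAAABBBBCCCCDDDDEEEE"
replica_b = b"AAAABBBBCCCXDDDDEEEE"  # one corrupted byte in chunk 2
tree_a = build_tree(leaf_hashes(replica_a))
tree_b = build_tree(leaf_hashes(replica_b))
print(differing_chunks(tree_a, tree_b))  # → [2]
```

The payoff of the tree shape is that identical replicas cost a single root comparison, and a few damaged chunks are located by comparing a number of hashes roughly proportional to the tree depth, after which only those chunks need to be copied.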

If you would like to run the network partition simulator mentioned in the RICON 2015 presentation about Humming Consensus, please see the partition simulator convergence test doc.
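For readers new to the Chain Replication that Humming Consensus manages, here is a minimal in-memory sketch of the textbook rule, not of Machi's implementation. Servers are ordered from head to tail. A write is applied at the head and forwarded to each successor in turn, the client is acknowledged only once the tail has applied it, and reads are served by the tail. The `Server` and `Chain` classes and the `crash_after` fault-injection parameter are hypothetical names for illustration.

```python
from typing import Optional


class Server:
    """One replica in the chain; its storage is just an in-memory dict here."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}


class Chain:
    """Textbook chain replication: servers[0] is the head, servers[-1] the tail."""

    def __init__(self, servers: list[Server]) -> None:
        if not servers:
            raise ValueError("a chain needs at least one server")
        self.servers = servers

    def write(self, key: str, value: bytes, crash_after: Optional[int] = None) -> bool:
        """Apply the write at the head, then forward it to each successor in order.

        crash_after is a hypothetical fault-injection knob: forwarding stops
        after that many servers have applied the write. Returns True only if
        the tail applied the write, i.e. only then would the client get an ack.
        """
        for i, server in enumerate(self.servers):
            if crash_after is not None and i == crash_after:
                return False
            server.store[key] = value
        return True

    def read(self, key: str) -> Optional[bytes]:
        """Reads are served by the tail, so only fully replicated writes are visible."""
        return self.servers[-1].store.get(key)


head, middle, tail = Server("a"), Server("b"), Server("c")
chain = Chain([head, middle, tail])
assert chain.write("k1", b"v1")
assert chain.read("k1") == b"v1"
# Forwarding stops after the head: the head holds k2, but the tail does not.
assert not chain.write("k2", b"v2", crash_after=1)
assert head.store["k2"] == b"v2" and chain.read("k2") is None
```

The partially forwarded write in the last lines is exactly the kind of state that chain management must repair when servers fail or the chain is reconfigured; deciding on the chain's membership and order is the job the RICON talk attributes to Humming Consensus.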

If you'd like to work on a protocol such as Thrift, UBF, msgpack over UDP, or some other protocol, let us know by opening an issue to discuss it.

## 4. Contributing to Machi's development

### 4.1 License

Basho Technologies, Inc. has committed to licensing all work for Machi under the Apache License, Version 2.0. All authors of source code and documentation who agree with these licensing terms are welcome to contribute their ideas in any form: suggested designs or features, documentation, and source code.

Machi is still a very young project within Basho, with a small team of developers; please bear with us as we grow out of the "toddler" stage into a more mature open source software project. We invite all contributors to review the CONTRIBUTING.md document for guidelines for working with the Basho development team.

### 4.2 Development environment requirements

All development to date has been done with Erlang/OTP version 17 on OS X. The only known limitations of using R16 are minor type specification differences between R16 and 17, but we strongly suggest continuing development using version 17.

We also assume that you have the standard UNIX/Linux developer toolchain for C and C++ applications. Specifically, we assume that `make` is available. The utility used to compile the Machi source code, `rebar`, is pre-compiled and included in the repo.

Machi depends on the ELevelDB library. ELevelDB supports only UNIX/Linux OSes and 64-bit versions of Erlang/OTP; we apologize to Windows-based and 32-bit-based Erlang developers for this restriction.
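The requirements above can be checked before building with a small script. This is a hypothetical helper, not part of the Machi repo. It assumes `make` and `erl` are found on `PATH`, and it asks Erlang itself for the OTP release (`erlang:system_info(otp_release)`) and the machine word size in bytes (`erlang:system_info({wordsize, external})`, which is 8 on a 64-bit build).

```python
import shutil
import subprocess
import sys

# Erlang one-liner: print the OTP release and the machine word size in bytes.
ERL_PROBE = ('io:format("~s ~p~n", [erlang:system_info(otp_release), '
             'erlang:system_info({wordsize, external})]), halt().')


def check_environment() -> list[str]:
    """Hypothetical pre-build check; returns problems, an empty list means none found."""
    problems = []
    if sys.platform.startswith("win"):
        problems.append("ELevelDB supports only UNIX/Linux, not Windows")
    if shutil.which("make") is None:
        problems.append("`make` was not found on PATH")
    erl = shutil.which("erl")
    if erl is None:
        problems.append("`erl` was not found on PATH")
        return problems
    result = subprocess.run([erl, "-noshell", "-eval", ERL_PROBE],
                            capture_output=True, text=True, timeout=60)
    fields = result.stdout.split()
    if len(fields) != 2 or not fields[1].isdigit():
        problems.append("could not query the Erlang/OTP release and word size")
        return problems
    release, wordsize = fields[0], int(fields[1])
    if wordsize != 8:
        problems.append(f"Erlang/OTP is {wordsize * 8}-bit; ELevelDB needs a 64-bit build")
    if release != "17":
        problems.append(f"found Erlang/OTP {release}; development to date has used 17")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

The OTP version message is deliberately a warning rather than a hard failure, since R16 is described above as usable with minor limitations.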