diff --git a/edoc/machi_admin_util.html b/edoc/machi_admin_util.html index dca2800..d26d2cf 100644 --- a/edoc/machi_admin_util.html +++ b/edoc/machi_admin_util.html @@ -55,6 +55,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_app.html b/edoc/machi_app.html index 6d4c01b..b141911 100644 --- a/edoc/machi_app.html +++ b/edoc/machi_app.html @@ -34,6 +34,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_chain_manager1.html b/edoc/machi_chain_manager1.html index 3fb60dd..911dfc7 100644 --- a/edoc/machi_chain_manager1.html +++ b/edoc/machi_chain_manager1.html @@ -174,6 +174,6 @@ distributed state of a single Machi Chain Replication chain.


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_chain_repair.html b/edoc/machi_chain_repair.html index 61d23fb..d9e48e9 100644 --- a/edoc/machi_chain_repair.html +++ b/edoc/machi_chain_repair.html @@ -10,27 +10,69 @@

Module machi_chain_repair

-Erlang API for the Machi FLU TCP protocol version 1, with a -proxy-process style API for hiding messy details such as TCP -connection/disconnection with the remote Machi server. +Perform "chain repair", i.e., resynchronization of Machi file +contents and metadata as servers are (re-)added to the chain. -

Description

Erlang API for the Machi FLU TCP protocol version 1, with a -proxy-process style API for hiding messy details such as TCP -connection/disconnection with the remote Machi server.

+

Description

Perform "chain repair", i.e., resynchronization of Machi file +contents and metadata as servers are (re-)added to the chain.

-

Machi is intentionally avoiding using distributed Erlang for - Machi's communication. This design decision makes Erlang-side code - more difficult & complex, but it's the price to pay for some -language independence. Later in Machi's life cycle, we need to -(re-)implement some components in a non-Erlang/BEAM-based language.

+

The implementation here is a very basic one, and is probably a bit + slower than the original "demo day" implementation at + https://github.com/basho/machi/blob/master/prototype/demo-day-hack/file0_repair_server.escript

- This module implements a "man in the middle" proxy between the - Erlang client and Machi server (which is on the "far side" of a TCP - connection to somewhere). This proxy process will always execute - on the same Erlang node as the Erlang client that uses it. The - proxy is intended to be a stable, long-lived process that survives - TCP communication problems with the remote server. +

It's so easy to bikeshed this into a 1 year programming exercise.

+ +

General TODO note: There are a lot of areas for exploiting parallelism here. + I've set the bikeshed aside for now, but "make repair faster" has a + lot of room for exploiting concurrency, overlapping reads & writes, +etc etc. There are also lots of different trade-offs to make with +regard to RAM use vs. disk use.

+ +

There's no reason why repair can't be done:

+ +
    +
  1. Repair in parallel across multiple repairees ... Optimization. +
  2. Repair multiple byte ranges concurrently ... Optimization. +
  3. Use bigger chunks than the client originally used to write the file +... Optimization ... but it would be the easiest to implement, e.g. use +constant-sized 4MB chunks. Unfortunately, it would also destroy +the ability to verify here that the chunk checksums are correct +*and* also propagate the correct checksum metadata to the +destination FLU.

    + + As an additional optimization, add a bit of #2 to start the next + read while the current write is still in progress. +
  4. The current method centralizes the "smarts" required to compare + checksum differences ... move some computation to each FLU, then use + a Merkle- or other-compression-style scheme to reduce the amount of + data sent across a network. +
+ +

Most/all of this could be executed in parallel on each FLU relative to +its own files. Then, in another TODO option, perhaps build a Merkle tree +or other summary of the local files and send that data structure to the +repair coordinator.

+ +

Also, as another TODO note, repair_both_present() in the +prototype/demo-day code uses an optimization of calculating the MD5 +checksum of the chunk checksum data as it arrives, and if the two MD5s +match, then we consider the two files in sync. If there isn't a match, +then we sort the lines and try another MD5, and if they match, then we're +in sync. In theory, that's lower overhead than the procedure used here.
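The two-step MD5 comparison described in this TODO note could look roughly like the sketch below. It is not the code of repair_both_present(): the module name, the function names, and the input shape (one binary of newline-separated checksum lines per FLU) are assumptions made for illustration, and it hashes complete binaries instead of hashing the data as it arrives.

```erlang
-module(csum_sync_sketch).
-export([in_sync/2]).

%% Hypothetical sketch: CsumsA and CsumsB are binaries holding
%% newline-separated checksum lines for the same file on two FLUs.
in_sync(CsumsA, CsumsB) ->
    case erlang:md5(CsumsA) =:= erlang:md5(CsumsB) of
        true ->
            %% Cheap path: the raw checksum data is byte-identical.
            true;
        false ->
            %% Fallback: the same lines in a different order still
            %% count as in sync, so sort the lines and hash again.
            erlang:md5(sorted(CsumsA)) =:= erlang:md5(sorted(CsumsB))
    end.

%% Split into non-empty lines, sort them, and rebuild iodata with a
%% newline after every line so that line boundaries stay unambiguous.
sorted(Bin) ->
    Lines = [L || L <- binary:split(Bin, <<"\n">>, [global]), L =/= <<>>],
    [[L, $\n] || L <- lists:sort(Lines)].
```

Only when both comparisons fail would a caller need to fall back to a line-by-line comparison of the checksum data.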

+ + NOTE that one reason I chose the "directives list" method is to have an + option, later, of choosing to repair a subset of repairee FLUs if there + is a big discrepancy between out of sync files: e.g., if FLU x has N + bytes out of sync but FLU y has 50N bytes out of sync, then it's likely + better to repair x only so that x can return to the UPI list quickly. + Also, in the event that all repairees are roughly comparably out of sync, + then the repair network traffic can be minimized by reading each chunk + only once.
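One way to picture the subset choice in the NOTE above is the sketch below. It is purely illustrative: the module name, the `{FluName, BytesOutOfSync}` input shape, and the `Ratio` threshold are assumptions, and nothing in the source says how (or whether) the directives list computes such a cut-off.

```erlang
-module(repairee_pick_sketch).
-export([pick/2]).

%% Hypothetical sketch.
%% Backlogs: non-empty list of {FluName, BytesOutOfSync}.
%% Ratio: how many times the smallest backlog a FLU may have and still
%% be repaired in this pass; FLUs above that are left for a later pass.
pick(Backlogs, Ratio) ->
    Min = lists:min([Bytes || {_Flu, Bytes} <- Backlogs]),
    [Flu || {Flu, Bytes} <- Backlogs, Bytes =< Min * Ratio].
```

With backlogs of N and 50N bytes and a ratio of 10, only the FLU with N bytes is selected; when all backlogs are comparable, every repairee is selected and each chunk can be read once for all of them.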

Function Index

repair/7
@@ -44,6 +86,6 @@ language independence. Later in Machi's life cycle, we need to
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_chash.html b/edoc/machi_chash.html index 58f523b..1fec12b 100644 --- a/edoc/machi_chash.html +++ b/edoc/machi_chash.html @@ -166,6 +166,6 @@ list.


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_cr_client.html b/edoc/machi_cr_client.html index 32e7384..0b7f72f 100644 --- a/edoc/machi_cr_client.html +++ b/edoc/machi_cr_client.html @@ -18,6 +18,65 @@

Description

Erlang API for the Machi client-implemented Chain Replication (CORFU-style) protocol.

+

See also the docs for machi_flu1_client for additional +details on data types and operation descriptions.

+ +

The API here is much simpler than the machi_flu1_client or + machi_proxy_flu1_client APIs. This module's API is a + proposed simple-but-complete form for clients who are not + interested in being an active participant in a Machi cluster or in + taking responsibility for Machi internals, i.e., client-side + Chain Replication, client-side read repair, client-side tracking of + internal Machi epoch & projection changes, etc.

+ +

This client is implemented as a long-lived Erlang process using + gen_server-style OTP code practice. A naive client can expect + that this process will manage all transient TCP session + disconnections and Machi chain reconfigurations. This client + works on a best-effort basis and can require some time to retry + operations in certain failure cases, e.g., up to several seconds + during a Machi projection & epoch change when a new server is +added to the chain.

+ +

Doc TODO: Once this API stabilizes, add all relevant data type details +to the EDoc here.

+ + +

Missing API features

+ +

So far, there is one missing client API feature that ought to be +added to Machi in the near future: more flexible checksum +management.

+ +

Add a source annotation to all checksums to indicate where the +checksum was calculated. For example,

+ + + +

Client-side checksums would be the "strongest" type of +checksum, meaning that any data corruption (of the original +data and/or of the checksum itself) can be detected after the +client-side calculation. There are too many horror stories on +The Net about IP PDUs that are corrupted but unnoticed due to +weak TCP checksums, buggy hardware, buggy OS drivers, etc. +Checksum versioning is also desirable if/when the current checksum +implementation changes from SHA-1 to something else.

+ + +

Implementation notes

+

The major operation processing is implemented in a state machine-like manner. Before attempting an operation X, there's an initial operation pre-X that takes care of updating the epoch id, @@ -56,7 +115,7 @@ function name.)

list_files/2Fetch the list of all files on the remote FLU. quit/1Quit & close the connection to remote FLU and stop our proxy process. -read_chunk/4 +read_chunk/4Read a chunk of data of size Size from File at Offset. read_chunk/5Read a chunk of data of size Size from File at Offset. start_link/1Start a local, long-lived process that will be our steady & reliable communication proxy with the fickle & flaky @@ -144,7 +203,7 @@ function name.)

read_chunk/4

read_chunk(PidSpec, File, Offset, Size) -> any()

-
+

Read a chunk of data of size Size from File at Offset.

read_chunk/5

@@ -165,6 +224,6 @@ function name.)


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_flu1.html b/edoc/machi_flu1.html index 2c003c4..145d298 100644 --- a/edoc/machi_flu1.html +++ b/edoc/machi_flu1.html @@ -21,7 +21,18 @@ Please see the EDoc "Overview" for details about the FLU as a primitive file server process vs. the larger Machi design of a FLU as a sequencer + file server + chain manager group of processes.

-

For the moment, this module also implements a rudimentary TCP-based +

The FLU is named after the CORFU server "FLU" or "FLash Unit" server.

+ +

Protocol origins

+ +

The protocol implemented here is an artisanal, hand-crafted, silly +thing that was very quick to put together for a "demo day" proof of +concept. It will almost certainly be replaced with something else, +both in terms of wire format and better code separation of +serialization/deserialization vs. network transport management, +etc.

+ +

For the moment, this module implements a rudimentary TCP-based protocol as the sole supported access method to the server, sequencer, and projection store. Conceptually, those three services are independent and ought to have their own protocols. As @@ -30,7 +41,7 @@ compatibility. Furthermore, from the perspective of failure detection, it is very convenient that all three FLU-related services are accessed using the same single TCP port.

-

The FLU is named after the CORFU server "FLU" or "FLash Unit" server.

+

TODO items

TODO There is a major missing feature in this FLU implementation: there is no "write-once" enforcement for any position in a Machi @@ -90,6 +101,6 @@ replication/chain repair.


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_flu1_client.html b/edoc/machi_flu1_client.html index 8bc99af..e2520d9 100644 --- a/edoc/machi_flu1_client.html +++ b/edoc/machi_flu1_client.html @@ -13,7 +13,34 @@ Erlang API for the Machi FLU TCP protocol version 1. -

Description

Erlang API for the Machi FLU TCP protocol version 1. +

Description

Erlang API for the Machi FLU TCP protocol version 1.

+ +

This client API handles low-level PDU serialization/deserialization + and low-level TCP session management, e.g. open, receive, write, + close. The API for higher-level session management and Machi state + management can be found in machi_proxy_flu1_client and + machi_cr_client.

+ +

TODO This EDoc was written first, and the EDoc and also -type and + -spec definitions for machi_proxy_flu1_client and machi_cr_client must be improved.

+ +

Protocol origins

+ +

The protocol implemented here is an artisanal, hand-crafted, silly +thing that was very quick to put together for a "demo day" proof of +concept. It will almost certainly be replaced with something else, +both in terms of wire format and better code separation of +serialization/deserialization vs. network transport management, +etc.

+ + For the moment, this module implements a rudimentary TCP-based + protocol as the sole supported access method to the server, + sequencer, and projection store. Conceptually, those three + services are independent and ought to have their own protocols. As + a practical matter, there is no need for wire protocol + compatibility. Furthermore, from the perspective of failure + detection, it is very convenient that all three FLU-related + services are accessed using the same single TCP port.

Data Types

chunk()

@@ -337,6 +364,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_flu_psup.html b/edoc/machi_flu_psup.html index 502d194..137abfc 100644 --- a/edoc/machi_flu_psup.html +++ b/edoc/machi_flu_psup.html @@ -10,13 +10,23 @@

Module machi_flu_psup

-Supervisor for Machi FLU servers and their related support - servers. +Supervisor for Machi FLU servers and their related support +servers.

Behaviours: supervisor.

-

Description

Supervisor for Machi FLU servers and their related support - servers. +

Description

Supervisor for Machi FLU servers and their related support +servers.

+ +

Our parent supervisor, machi_flu_sup, is responsible for + managing FLUs as a single entity. However, the actual + implementation of a FLU includes three major Erlang processes (not + including support/worker procs): the FLU itself, the FLU's + projection store, and the FLU's local chain manager. This + supervisor is responsible for managing those three major services + as a single "package", to be started & stopped together.
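A per-FLU "package" supervisor of the kind described above might be shaped like the following sketch. It is not the actual machi_flu_psup code: the child module names are hypothetical placeholders, and the `one_for_all` strategy, restart limits, and start order are assumptions chosen only to illustrate "started & stopped together".

```erlang
-module(flu_psup_sketch).
-behaviour(supervisor).
-export([start_link/1, init/1]).

start_link(FluName) ->
    supervisor:start_link(?MODULE, [FluName]).

init([FluName]) ->
    %% Assumption: one_for_all, so that a crash of any of the three
    %% services restarts the whole package (max 5 restarts in 10 s).
    SupFlags = {one_for_all, 5, 10},
    %% The hypothetical_* modules are placeholders, not Machi modules.
    Child = fun(Id, Mod) ->
                    {Id, {Mod, start_link, [FluName]},
                     permanent, 5000, worker, [Mod]}
            end,
    {ok, {SupFlags,
          [Child(proj_store, hypothetical_proj_store),
           Child(chain_mgr, hypothetical_chain_mgr),
           Child(flu, hypothetical_flu)]}}.
```

A parent supervisor could then treat each such package as a single child of type `supervisor`, which matches the idea of managing a FLU as one entity.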

+ +

Function Index

@@ -72,6 +82,6 @@
init/1
make_mgr_supname/1
Overviewerlang logo
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_flu_sup.html b/edoc/machi_flu_sup.html index e90de72..8628dca 100644 --- a/edoc/machi_flu_sup.html +++ b/edoc/machi_flu_sup.html @@ -36,6 +36,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_projection.html b/edoc/machi_projection.html index 21ca159..b7622ca 100644 --- a/edoc/machi_projection.html +++ b/edoc/machi_projection.html @@ -89,6 +89,6 @@ correctly.


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_projection_store.html b/edoc/machi_projection_store.html index feb45d6..91ac187 100644 --- a/edoc/machi_projection_store.html +++ b/edoc/machi_projection_store.html @@ -164,6 +164,6 @@ module's API.


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_proxy_flu1_client.html b/edoc/machi_proxy_flu1_client.html index b65a72a..ee2d2fb 100644 --- a/edoc/machi_proxy_flu1_client.html +++ b/edoc/machi_proxy_flu1_client.html @@ -23,15 +23,18 @@ connection/disconnection with the remote Machi server.

Machi is intentionally avoiding using distributed Erlang for Machi's communication. This design decision makes Erlang-side code more difficult & complex, but it's the price to pay for some -language independence. Later in Machi's life cycle, we need to +language independence. Later in Machi's life cycle, we may (?) need to (re-)implement some components in a non-Erlang/BEAM-based language.

- This module implements a "man in the middle" proxy between the - Erlang client and Machi server (which is on the "far side" of a TCP - connection to somewhere). This proxy process will always execute - on the same Erlang node as the Erlang client that uses it. The - proxy is intended to be a stable, long-lived process that survives - TCP communication problems with the remote server. +

This module implements a "man in the middle" proxy between the +Erlang client and Machi server (which is on the "far side" of a TCP +connection to somewhere). This proxy process will always execute +on the same Erlang node as the Erlang client that uses it. The +proxy is intended to be a stable, long-lived process that survives +TCP communication problems with the remote server.

+ + For a higher level interface, see machi_cr_client. + For a lower level interface, see machi_flu1_client.

Function Index

@@ -285,6 +288,6 @@ language independence. Later in Machi's life cycle, we need to
append_chunk/4Append a chunk (binary- or iolist-style) of data to a file with Prefix.
Overviewerlang logo
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_sequencer.html b/edoc/machi_sequencer.html index 3e8a04c..3f5cc45 100644 --- a/edoc/machi_sequencer.html +++ b/edoc/machi_sequencer.html @@ -18,6 +18,6 @@ the future?
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_sup.html b/edoc/machi_sup.html index ff618ef..bf2e47e 100644 --- a/edoc/machi_sup.html +++ b/edoc/machi_sup.html @@ -34,6 +34,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_util.html b/edoc/machi_util.html index 713c56a..69f7bf6 100644 --- a/edoc/machi_util.html +++ b/edoc/machi_util.html @@ -163,6 +163,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/machi_yessir_client.html b/edoc/machi_yessir_client.html index b690143..091b17d 100644 --- a/edoc/machi_yessir_client.html +++ b/edoc/machi_yessir_client.html @@ -337,6 +337,6 @@
-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/edoc/overview-summary.html b/edoc/overview-summary.html index 43c0343..6928b8e 100644 --- a/edoc/overview-summary.html +++ b/edoc/overview-summary.html @@ -180,6 +180,6 @@ the code, is a reminder signal of unfinished work.


-

Generated by EDoc, May 20 2015, 11:11:34.

+

Generated by EDoc, May 20 2015, 12:36:35.

diff --git a/images/supervisor-2flus.png b/images/supervisor-2flus.png new file mode 100644 index 0000000..1302360 Binary files /dev/null and b/images/supervisor-2flus.png differ