This commit is contained in:
Scott Lystig Fritchie 2015-12-31 15:44:51 +09:00
parent a3fc1c3d68
commit 3b594504fe
4 changed files with 76 additions and 60 deletions

View file

@ -21,8 +21,9 @@
%% @doc Erlang API for the Machi client-implemented Chain Replication %% @doc Erlang API for the Machi client-implemented Chain Replication
%% (CORFU-style) protocol. %% (CORFU-style) protocol.
%% %%
%% See also the docs for {@link machi_flu1_client} for additional %% Please see {@link machi_flu1_client} the "Client API implemntation notes"
%% details on data types and operation descriptions. %% section for how this module relates to the rest of the client API
%% implementation.
%% %%
%% The API here is much simpler than the {@link machi_flu1_client} or %% The API here is much simpler than the {@link machi_flu1_client} or
%% {@link machi_proxy_flu1_client} APIs. This module's API is a %% {@link machi_proxy_flu1_client} APIs. This module's API is a
@ -43,64 +44,6 @@
%% %%
%% Doc TODO: Once this API stabilizes, add all relevant data type details %% Doc TODO: Once this API stabilizes, add all relevant data type details
%% to the EDoc here. %% to the EDoc here.
%%
%%
%% === Missing API features ===
%%
%% So far, there is one missing client API feature that ought to be
%% added to Machi in the near future: more flexible checksum
%% management.
%%
%% Add a `source' annotation to all checksums to indicate where the
%% checksum was calculated. For example,
%%
%% <ul>
%%
%% <li> Calculated by client that performed the original chunk append,
%% </li>
%%
%% <li> Calculated by the 1st Machi server to receive an
%% un-checksummed append request
%% </li>
%%
%% <li> Re-calculated by Machi to manage fewer checksums of blocks of
%% data larger than the original client-specified chunks.
%% </li>
%% </ul>
%%
%% Client-side checksums would be the "strongest" type of
%% checksum, meaning that any data corruption (of the original
%% data and/or of the checksum itself) can be detected after the
%% client-side calculation. There are too many horror stories on
%% The Net about IP PDUs that are corrupted but unnoticed due to
%% weak TCP checksums, buggy hardware, buggy OS drivers, etc.
%% Checksum versioning is also desirable if/when the current checksum
%% implementation changes from SHA-1 to something else.
%%
%%
%% === Implementation notes ===
%%
%% The major operation processing is implemented in a state machine-like
%% manner. Before attempting an operation `X', there's an initial
%% operation `pre-X' that takes care of updating the epoch id,
%% restarting client protocol proxies, and if there's any server
%% instability (e.g. some server is wedged), then insert some sleep
%% time. When the chain appears to have stabilized, then we try the `X'
%% operation again.
%%
%% Function name for the `pre-X' stuff is usually `X()', and the
%% function name for the `X' stuff is usually `X2()'. (I.e., the `X'
%% stuff follows after `pre-X' and therefore has a `2' suffix on the
%% function name.)
%%
%% In the case of read repair, there are two stages: find the value to
%% perform the repair, then perform the repair writes. In the case of
%% the repair writes, the `pre-X' function is named `read_repair3()',
%% and the `X' function is named `read_repair4()'.
%%
%% TODO: It would be nifty to lift the very-nearly-but-not-quite-boilerplate
%% of the `pre-X' functions into a single common function ... but I'm not
%% sure yet on how to do it without making the code uglier.
-module(machi_cr_client). -module(machi_cr_client).

View file

@ -38,6 +38,71 @@
%% TODO This EDoc was written first, and the EDoc and also `-type' and %% TODO This EDoc was written first, and the EDoc and also `-type' and
%% `-spec' definitions for {@link machi_proxy_flu1_client} and {@link %% `-spec' definitions for {@link machi_proxy_flu1_client} and {@link
%% machi_cr_client} must be improved. %% machi_cr_client} must be improved.
%%
%% == Client API implementation notes ==
%%
%% At the moment, there are several modules that implement various
%% subsets of the Machi API. The table below attempts to show how and
%% why they differ.
%%
%% ```
%% |--------------------------+-------+-----+------+------+-------+----------------|
%% | | PB | | # | | Conn | Epoch & NS |
%% | Module name | Level | CR? | FLUS | Impl | Life? | version aware? |
%% |--------------------------+-------+-----+------+------+-------+----------------|
%% | machi_pb_high_api_client | high | yes | many | proc | long | no |
%% | machi_cr_client | low | yes | many | proc | long | no |
%% | machi_proxy_flu1_client | low | no | 1 | proc | long | yes |
%% | machi_flu1_client | low | no | 1 | lib | short | yes |
%% |--------------------------+-------+-----+------+------+-------+----------------|
%% '''
%%
%% In terms of use and API layering, the table rows are in highest`->'lowest
%% order: each level calls the layer immediately below it.
%%
%% <dl>
%% <dt> <b> PB Level</b> </dt>
%% <dd> The Protocol Buffers API is divided logically into two levels,
%% "low" and "high". The low-level protocol is used for intra-chain
%% communication. The high-level protocol is used for clients outside
%% of a Machi chain or Machi cluster of chains.
%% </dd>
%% <dt> <b> CR?</b> </dt>
%% <dd> Does this API support (directly or indirectly) Chain
%% Replication? If `no', then the API has no awareness of multiple
%% replicas of any file or file chunk; unaware clients can only
%% perform operations at a single Machi FLU's file service or
%% projection store service.
%% </dd>
%% <dt> <b> # FLUs</b> </dt>
%% <dd> Now many FLUs does this API layer communicate with
%% simultaneously? Note that there is a one-to-one correspondence
%% between this value and the "CR?" column's value.
%% </dd>
%% <dt> <b> Impl</b> </dt>
%% <dd> Implementation: library-only or an Erlang process,
%% e.g., `gen_server'.
%% </dd>
%% <dt> <b> Conn Life?</b> </dt>
%% <dd> Expected TCP session connection life: short or long. At the
%% lowest level, the {@link machi_flu1_client} API implementation takes
%% no effort to reconnect to a remote FLU when its single TCP session
%% is broken. For long-lived connection life APIs, the server side will
%% automatically attempt to reconnect to remote FLUs when a TCP session
%% is broken.
%% </dd>
%% <dt> <b> Epoch &amp; NS version aware?</b> </dt>
%% <dd> Are clients of this API responsible for knowing a chain's EpochID
%% and namespace version numbers? If `no', then the server side of the
%% API will automatically attempt to discover/re-discover the EpochID and
%% namespace version numbers whenever they change.
%% </dd>
%% </dl>
%%
%% The only protocol that we expect to be used by entities outside of
%% a single Machi chain or a multi-chain cluster is the "high"
%% Protocol Buffers API. The {@link riak_pb_high_api_client} module
%% is an Erlang reference implementation of this PB API.
-module(machi_flu1_client). -module(machi_flu1_client).

View file

@ -25,6 +25,10 @@
%% to a single socket connection, and there is no code to deal with %% to a single socket connection, and there is no code to deal with
%% multiple connections/load balancing/error handling to several/all %% multiple connections/load balancing/error handling to several/all
%% Machi cluster servers. %% Machi cluster servers.
%%
%% Please see {@link machi_flu1_client} the "Client API implemntation notes"
%% section for how this module relates to the rest of the client API
%% implementation.
-module(machi_pb_high_client). -module(machi_pb_high_client).

View file

@ -22,6 +22,10 @@
%% proxy-process style API for hiding messy details such as TCP %% proxy-process style API for hiding messy details such as TCP
%% connection/disconnection with the remote Machi server. %% connection/disconnection with the remote Machi server.
%% %%
%% Please see {@link machi_flu1_client} the "Client API implemntation notes"
%% section for how this module relates to the rest of the client API
%% implementation.
%%
%% Machi is intentionally avoiding using distributed Erlang for %% Machi is intentionally avoiding using distributed Erlang for
%% Machi's communication. This design decision makes Erlang-side code %% Machi's communication. This design decision makes Erlang-side code
%% more difficult &amp; complex, but it's the price to pay for some %% more difficult &amp; complex, but it's the price to pay for some