* name-game-sketch.org * flu-and-chain-lifecycle.org * FAQ.md I've left out changes to the two design docs for now; most of their respective texts omit multiple chain scenarios entirely, so there isn't a huge amount to change.
25 KiB
- FLU and Chain Life Cycle Management
- Outline
- Terminology review
- Overview of administrative life cycles
- Quick admin: declarative management of Machi FLU and chain life cycles
- The "rc.d" style configuration file scheme
FLU and Chain Life Cycle Management -- mode: org; --
FLU and Chain Life Cycle Management
In an ideal world, we (the Machi development team) would have a full vision of how Machi would be managed, down to the last detail of beautiful CLI character and network protocol bit. Our vision isn't complete yet, so we are working one small step at a time.
Outline
- FLU and Chain Life Cycle Management
-
Terminology review
- Terminology: Machi run-time components/services/thingies
- Terminology: Machi chain data structures
- Terminology: Machi cluster data structures
-
Overview of administrative life cycles
- Cluster administrative life cycle
- Chain administrative life cycle
- FLU server administrative life cycle
-
Quick admin: declarative management of Machi FLU and chain life cycles
- Quick admin uses the "rc.d" config scheme for life cycle management
-
Quick admin's declarative "language": an Erlang-flavored AST
- Term 'host': define a new host for FLU services
- Term 'flu': define a new FLU
- Term 'chain': define or reconfigure a chain
-
Executing quick admin AST files via the 'machi-admin' utility
- Checking the syntax of an AST file
- Executing an AST file
- Using quick admin to manage multiple machines
-
The "rc.d" style configuration file scheme
- Riak had a similar configuration file editing problem (and its solution)
- Machi's "rc.d" file scheme.
-
FLU life cycle management using "rc.d" style files
- The key configuration components of a FLU
-
Chain life cycle management using "rc.d" style files
- The key configuration components of a chain
Terminology review
Terminology: Machi run-time components/services/thingies
- FLU: a basic Machi server, responsible for managing a collection of files.
-
Chain: a small collection of FLUs that maintain replicas of the same collection of files. A chain is usually small, 1-3 servers, where more than 3 would be used only in cases when availability of certain data is critical despite failures of several machines.
- The length of a chain is directly proportional to its replication factor, e.g., a chain length=3 will maintain (nominally) 3 replicas of each file.
- To maintain file availability when
F
failures have occurred, a chain must be at leastF+1
members long. (In comparison, the quorum replication technique requires2F+1
members in the general case.)
- Cluster: A collection of Machi chains that are used to store files in a horizontally partitioned/sharded/distributed manner.
Terminology: Machi data structures
- Projection: used to define a single chain: the chain's consistency mode (strong or eventual consistency), all members (from an administrative point of view), all active members (from a runtime, automatically-managed point of view), repairing/file-syncing members (also runtime, auto-managed), and so on
- Epoch: A version number of a projection. The epoch number is used by both clients & servers to manage transitions from one projection to another, e.g., when the chain is temporarily shortened by the failure of a member FLU server.
Terminology: Machi cluster data structures
-
Namespace: A collection of human-friendly names that are mapped to groups of Machi chains that provide the same type of storage service: consistency mode, replication policy, etc.
- A single namespace name, e.g.
normal-ec
, is paired with a single cluster map (see below). - Example:
normal-ec
might be a collection of Machi chains in eventually-consistent mode that are of length=3. - Example:
risky-ec
might be a collection of Machi chains in eventually-consistent mode that are of length=1. - Example:
mgmt-critical
might be a collection of Machi chains in strongly-consistent mode that are of length=7.
- A single namespace name, e.g.
- Cluster map: Encodes the rules which partition/shard/distribute the files stored in a particular namespace across a group of chains that collectively store the namespace's files.
- Chain weight: A value assigned to each chain within a cluster map structure that defines the relative storage capacity of a chain within the namespace. For example, a chain weight=150 has 50% more capacity than a chain weight=100.
- Cluster map epoch: The version number assigned to a cluster map.
Overview of administrative life cycles
Cluster administrative life cycle
- Cluster is first created
- Adds namespaces (e.g. consistency policy + chain length policy) to the cluster
- Chains are added to/removed from a namespace to increase/decrease the namespace's storage capacity.
- Adjust chain weights within a namespace, e.g., to shift files within the namespace to chains with greater storage capacity resources and/or runtime I/O resources.
A cluster "file migration" is the process of moving files from one namespace member chain to another for purposes of shifting & re-balancing storage capacity and/or runtime I/O capacity.
Chain administrative life cycle
- A chain is created with an initial FLU membership list.
- Chain may be administratively modified zero or more times to add/remove member FLU servers.
- A chain may be decommissioned.
See also: http://basho.github.io/machi/edoc/machi_lifecycle_mgr.html
FLU server administrative life cycle
- A FLU is created after an administrator chooses the FLU's runtime location is selected by the administrator: which machine/virtual machine, IP address and TCP port allocation, etc.
- An unassigned FLU may be added to a chain by chain administrative policy.
-
A FLU that is assigned to a chain may be removed from that chain by chain administrative policy.
- In the current implementation, the FLU's Erlang processes will be halted. Then the FLU's data and metadata files will be moved to another area of the disk for safekeeping. Later, a "garbage collection" process can be used for reclaiming disk space used by halted FLU servers.
See also: http://basho.github.io/machi/edoc/machi_lifecycle_mgr.html
Quick admin: declarative management of Machi FLU and chain life cycles
The "quick admin" scheme is a temporary (?) tool for managing Machi FLU server and chain life cycles in a declarative manner. The API is described in this section.
Quick admin uses the "rc.d" config scheme for life cycle management
As described at the top of http://basho.github.io/machi/edoc/machi_lifecycle_mgr.html, the "rc.d" config files do not manage "policy". "Policy" is doing the right thing with a Machi cluster from a systems administrator's point of view. The "rc.d" config files can only implement decisions made according to policy.
The "quick admin" tool is a first attempt at automating policy decisions in a safe way (we hope) that is also easy to implement (we hope) with a variety of systems management tools, e.g. Chef, Puppet, Ansible, Saltstack, or plain-old-human-at-a-keyboard.
Quick admin's declarative "language": an Erlang-flavored AST
The "language" that an administrator uses to express desired policy changes is not (yet) a true language. As a quick implementation hack, the current language is an Erlang-flavored abstract syntax tree (AST). The tree isn't very deep, either, frequently just one element tall. (Not much of a tree, is it?)
There are three terms in the language currently:
host
, define a new host that can execute FLU serversflu
, define a new FLUchain
, define a new chain or re-configure an existing chain with the same name
Term 'host': define a new host for FLU services
In this context, a host is a machine, virtual machine, or container that can execute the Machi application and can therefore provide FLU services, i.e. file service, Humming Consensus management.
Two formats may be used to define a new host:
{host, Name, Props}.
{host, Name, AdminI, ClientI, Props}.
The shorter tuple is shorthand notation for the latter. If the shorthand form is used, then it will be converted automatically to the long form as:
{host, Name, AdminI=Name, ClientI=Name, Props}.
Type information, description, and restrictions:
-
Name::string()
TheName
attribute must be unique. Note that it is possible to define two different hosts, one using a DNS hostname and one using an IP address. The user must avoid this double-definition because it is not enforced by quick admin.- The
Name
field is used for cross-reference purposes with other terms, e.g.,flu
andchain
. - There is no syntax yet for removing a host definition.
- The
-
AdminI::string()
A DNS hostname or IP address for cluster administration purposes, e.g. SSH access.- This field is unused at the present time.
-
ClientI::string()
A DNS hostname or IP address for Machi's client protocol access, e.g., Protocol Buffers network API service.- This field is unused at the present time.
props::proplist()
is an Erlang-style property list for specifying additional configuration options, debugging information, sysadmin comments, etc.-
A full-featured admin tool should also include managing several other aspects of configuration related to a "host". For example, for any single IP address, quick admin assumes that there will be exactly one Erlang VM that is running the Machi application. Of course, it is possible to have dozens of Erlang VMs on the same (let's assume for clarity) hardware machine and all running Machi … but there are additional aspects of such a machine that quick admin does not account for
- multiple IP addresses per machine
- multiple Machi package installation paths
- multiple Machi config files (e.g. cuttlefish config,
etc.conf
,vm.args
) -
multiple data directories/file system mount points
- This is also a management problem for quick admin for a single Machi package on a machine to take advantage of bulk data storage using multiple multiple file system mount points.
- multiple Erlang VM host names, required for distributed Erlang,
which is used for communication with
machi
andmachi-admin
command line utilities. - and others….
Term 'flu': define a new FLU
A new FLU is defined relative to a previously-defined host
entities;
an exception will be thrown if the host
cannot be cross-referenced.
{flu, Name, HostName, Port, Props}
Type information, description, and restrictions:
-
Name::atom()
The name of the FLU, as a human-friendly name and also for internal management use; please note theatom()
type. This name must be unique.- The
Name
field is used for cross-reference purposes with thechain
term. - There is no syntax yet for removing a FLU definition.
- The
Hostname::string()
The cross-reference name of thehost
that this FLU should run on.Port::non_neg_integer()
The TCP port used by this FLU server's Protocol Buffers network API listener serviceprops::proplist()
is an Erlang-style property list for specifying additional configuration options, debugging information, sysadmin comments, etc.
Term 'chain': define or reconfigure a chain
A chain is defined relative to zero or more previously-defined flu
entities; an exception will be thrown if any flu
cannot be
cross-referenced.
Two formats may be used to define/reconfigure a chain:
{chain, Name, FullList, Props}.
{chain, Name, CMode, FullList, Witnesses, Props}.
The shorter tuple is shorthand notation for the latter. If the shorthand form is used, then it will be converted automatically to the long form as:
{chain, Name, ap_mode, FullList, [], Props}.
Type information, description, and restrictions:
-
Name::atom()
The name of the chain, as a human-friendly name and also for internal management use; please note theatom()
type. This name must be unique.- There is no syntax yet for removing a chain definition.
-
CMode::'ap_mode'|'cp_mode'
Defines the consistency mode of the chain, either eventual consistency or strong consistency, respectively.- A chain cannot change consistency mode, e.g., from strong~->~eventual consistency.
FullList::list(atom())
Specifies the list of full-service FLU servers, i.e. servers that provide file data & metadata services as well as Humming Consensus. Each atom in the list must cross-reference with a previously definedchain
; an exception will be thrown if anyflu
cannot be cross-referenced.-
Witnesses::list(atom())
Specifies the list of witness-only servers, i.e. servers that only participate in Humming Consensus. Each atom in the list must cross-reference with a previously definedchain
; an exception will be thrown if anyflu
cannot be cross-referenced.- This list must be empty for eventual consistency chains.
props::proplist()
is an Erlang-style property list for specifying additional configuration options, debugging information, sysadmin comments, etc.- If this term specifies a new
chain
name, then all of the member FLU servers (full & witness types) will be bootstrapped to a starting configuration. -
If this term specifies a previously-defined
chain
name, then all of the member FLU servers (full & witness types, respectively) will be adjusted to add or remove members, as appropriate.- Any FLU servers added to either list must not be assigned to any other chain, or they must be a member of this specific chain.
- Any FLU servers removed from either list will be halted. (See the "FLU server administrative life cycle" section above.)
Executing quick admin AST files via the 'machi-admin' utility
Examples of quick admin AST files can be found in the
priv/quick-admin/examples
directory. Below is an example that will
define a new host ( "localhost"
), three new FLU servers ( f1
& f2
and f3
), and an eventually consistent chain ( c1
) that uses the new
FLU servers:
{host, "localhost", []}.
{flu,f1,"localhost",20401,[]}.
{flu,f2,"localhost",20402,[]}.
{flu,f3,"localhost",20403,[]}.
{chain,c1,[f1,f2,f3],[]}.
Checking the syntax of an AST file
Given an AST config file, /path/to/ast/file
, its basic syntax and
correctness can be checked without executing it.
./rel/machi/bin/machi-admin quick-admin-check /path/to/ast/file
- The utility will exit with status zero and output
ok
if the syntax and proposed configuration appears to be correct. - If there is an error, the utility will exit with status one, and an error message will be printed.
Executing an AST file
Given an AST config file, /path/to/ast/file
, it can be executed
using the command:
./rel/machi/bin/machi-admin quick-admin-apply /path/to/ast/file RelativeHost
… where the last argument, RelativeHost
, should be the exact
spelling of one of the previously defined AST host
entities,
and also is the same host that the machi-admin
utility is being
executed on.
Restrictions and warnings:
- This is alpha quality software.
-
There is no "undo".
-
Of course there is, but you need to resort to doing things like using
machi attach
to attach to the server's CLI to then execute magic Erlang incantations to stop FLUs, unconfigure chains, etc.- Oh, and delete some files with magic paths, also.
-
Using quick admin to manage multiple machines
A quick sketch follows:
- Create the AST file to specify all of the changes that you wish to
make to all hosts, FLUs, and/or chains, e.g.,
/tmp/ast.txt
. - Check the basic syntax with the
quick-admin-check
argument tomachi-admin
. - If the syntax is good, then copy
/tmp/ast.txt
to all hosts in the cluster, using the same path,/tmp/ast.txt
. - For each machine in the cluster, run:
./rel/machi/bin/machi-admin quick-admin-apply /tmp/ast.txt RelativeHost
… where RelativeHost is the AST host
name of the machine that you
are executing the machi-admin
command on. The command should be
successful, with exit status 0 and outputting the string ok
.
Finally, for each machine in the cluster, a listing of all files in
the directory rel/machi/etc/quick-admin-archive
should show exactly
the same files, one for each time that quick-admin-apply
has been
run successfully on that machine.
The "rc.d" style configuration file scheme
This configuration scheme is inspired by BSD UNIX's init(8)
process
manager's configuration style, called "rc.d" after the name of the
directory where these files are stored, /etc/rc.d
. The init
process is responsible for (among other things) starting UNIX
processes at machine boot time and stopping them when the machine is
shut down.
The original scheme used by init
to start processes at boot time was
a single Bourne shell script called /etc/rc
. When a new software
package was installed that required a daemon to be started at boot
time, text was added to the /etc/rc
file. Uninstalling packages was
much trickier, because it meant removing lines from a file that
is a computer program (run by the Bourne shell, a Turing-complete
programming language). Error-free editing of the /etc/rc
script
was impossible in all cases.
Later, init
's configuration was split into a few master Bourne shell
scripts and a subdirectory, /etc/rc.d
. The subdirectory contained
shell scripts that were responsible for boot time starting of a single
daemon or service, e.g. NFS or an HTTP server. When a new software
package was added, a new file was added to the rc.d
subdirectory.
When a package was removed, the corresponding file in rc.d
was
removed. With this simple scheme, addition & removal of boot time
scripts was vastly simplified.
Riak had a similar configuration file editing problem (and its solution)
Another software product from Basho Technologies, Riak, had a similar
configuration file editing problem. One file in particular,
app.config
, had a syntax that made it difficult both for human
systems administrators and also computer programs to edit the file in
a syntactically correct manner.
Later releases of Riak switched to an alternative configuration file
format, one inspired by the BSD UNIX sysctl(8)
utility and
sysctl.conf(5)
file syntax. The sysctl.conf
format is much easier
to manage by computer programs to add items. Removing items is not
100% simple, however: the correct lines must be identified and then
removed (e.g. with Perl or a text editor or combination of grep -v
and mv
), but removing any comment lines that "belong" to the removed
config item(s) is not any easy for a 1-line shell script to do 100%
correctly.
Machi will use the sysctl.conf
style configuration for some
application configuration variables. However, adding & removing FLUs
and chains will be managed using the "rc.d" style because of the
"rc.d" scheme's simplicity and tolerance of mistakes by administrators
(human or computer).
Machi's "rc.d" file scheme.
Machi will use a single subdirectory that will contain configuration files for some life cycle management task, e.g. a single FLU or a single chain.
The contents of the file should be a single Erlang term, serialized in
ASCII form as Erlang source code statement, i.e. a single Erlang term
T
that is formatted by io:format("~w.",[T]).
. This file must be
parseable by the Erlang function file:consult()
.
Later versions of Machi may change the file format to be more familiar to administrators who are unaccustomed to Erlang language syntax.
FLU life cycle management using "rc.d" style files
The key configuration components of a FLU
- The machine (or virtual machine) to run it on.
- The Machi software package's artifacts to execute.
- The disk device(s) used to store Machi file data & metadata, "rc.d" style config files, etc.
- The name, IP address and TCP port assigned to the FLU service.
- Its chain assignment.
Notes:
- Items 1-3 are currently outside of the scope of this life cycle document. We assume that human administrators know how to do these things.
- Item 4's properties are explicitly managed by a FLU-defining "rc.d" style config file.
- Item 5 is managed by the chain life cycle management system.
Here is an example of a properly formatted FLU config file:
{p_srvr,f1,machi_flu1_client,"192.168.72.23",20401,[]}.
… which corresponds to the following Erlang record definition:
-record(p_srvr, {
name :: atom(),
proto_mod = 'machi_flu1_client' :: atom(), % Module name
address :: term(), % Protocol-specific
port :: term(), % Protocol-specific
props = [] :: list() % proplist for other related info
}).
name
isf1
. This is name of the FLU. This name should be unique over the lifetime of the administrative domain and thus managed by external policy. This name must be the same as the name of the config file that defines the FLU.proto_mod
is used for internal management purposes and should be considered a mandatory constant.address
is "192.168.72.23". The DNS hostname or IP address used by other servers to communicate with this FLU. This must be a valid IP address, previously assigned to this machine/VM using the appropriate operating system-specific procedure.port
is TCP port 20401. The TCP port number that the FLU listens to for incoming Protocol Buffers-serialized communication. This TCP port must not be in use (now or in the future) by another Machi FLU or any other process running on this machine/VM.props
is an Erlang-style property list for specifying additional configuration options, debugging information, sysadmin comments, etc.
Chain life cycle management using "rc.d" style files
Unlike FLUs, chains have a self-management aspect that makes a chain life cycle different from a single FLU server. Machi's chains are self-managing, via Humming Consensus; see the https://github.com/basho/machi/tree/master/doc/ directory for much more detail about Humming Consensus. After FLUs have received their initial chain configuration for Humming Consensus, the FLUs will manage the chain (and each other) by themselves.
However, Humming Consensus does not handle three chain management problems:
- Specifying the very first chain configuration,
- Altering the membership of the chain (i.e. adding/removing FLUs from the chain),
- Stopping the chain permanently.
A chain "rc.d" file will only be used to bootstrap a newly-defined FLU server. It's like a piece of glue information to introduce the new FLU to the Humming Consensus group that is managing the chain's dynamic state (e.g. which members are up or down). In all other respects, chain config files are ignored by life cycle management code. However, to mimic the life cycle of the FLU server's "rc.d" config files, a chain "rc.d" files is not deleted until the chain has been decommissioned (i.e. defined with length=0).
The key configuration components of a chain
- The name of the chain.
- Consistency mode: eventually consistent or strongly consistent.
-
The membership list of all FLU servers in the chain.
- Remember, all servers in a single chain will manage full replicas of the same collection of Machi files.
-
If the chain is defined to use strongly consistent mode, then a list of "witness servers" may also be defined. See the [https://github.com/basho/machi/tree/master/doc/] documentation for more information on witness servers.
- The witness list must be empty for all chains in eventual consistency mode.
Here is an example of a properly formatted chain config file:
{chain_def_v1,c1,ap_mode,
[{p_srvr,f1,machi_flu1_client,"localhost",20401,[]},
{p_srvr,f2,machi_flu1_client,"localhost",20402,[]},
{p_srvr,f3,machi_flu1_client,"localhost",20403,[]}],
[],[],[],
[f1,f2,f3],
[],[]}.
… which corresponds to the following Erlang record definition:
-record(chain_def_v1, {
name :: atom(), % chain name
mode :: 'ap_mode' | 'cp_mode',
full = [] :: [p_srvr()],
witnesses = [] :: [p_srvr()],
old_full = [] :: [atom()], % guard against some races
old_witnesses=[] :: [atom()], % guard against some races
local_run = [] :: [atom()], % must be tailored to each machine!
local_stop = [] :: [atom()], % must be tailored to each machine!
props = [] :: list() % proplist for other related info
}).
name
isc1
, the name of the chain. This name should be unique over the lifetime of the administrative domain and thus managed by external policy. This name must be the same as the name of the config file that defines the chain.mode
isap_mode
, an internal code symbol for eventual consistency mode.full
is a list of Erlang#p_srvr{}
records for full-service members of the chain, i.e., providing Machi file data & metadata storage services.witnesses
is a list of Erlang#p_srvr{}
records for witness-only FLU servers, i.e., providing only Humming Consensus service.- The next four fields are used for internal management only.
props
is an Erlang-style property list for specifying additional configuration options, debugging information, sysadmin comments, etc.