* To Do list

** DONE remove the escript* stuff from machi_util.erl
** DONE Add functions to manipulate 1-chain projections
   - Add epoch ID = epoch number + checksum of projection!
     Done via compare() func.
** DONE Change all protocol ops to add epoch ID
** DONE Add projection store to each FLU.
*** DONE What should the API look like? (borrow from chain mgr PoC?)
    Yeah, I think that's pretty complete.  Steal it now, worry later.
*** DONE Choose protocol & TCP port.  Share with get/put?  Separate?
    Hrm, I like the idea of having a single TCP port to talk to any single
    FLU.

    To make the protocol "easy" to hack, how about using the same basic
    method as append/write, where there's a variable-size blob?  But we'll
    format that blob as a term_to_binary().  Then dispatch to a single
    func, and pattern match Erlang-style in that func.
*** DONE Do it.
** DONE Finish OTP'izing the Chain Manager with FLU & proj store processes
** DONE Eliminate the timeout exception for the client: just return {error, timeout}
** DONE Move prototype/chain-manager code to "top" of source tree
*** DONE Preserve current test code (leave as-is? tiny changes?)
*** DONE Make chain manager code flexible enough to run "real world" or "sim"
** DONE Add projection wedging logic to each FLU.
** DONE Implement real data repair, orchestrated by the chain manager
** DONE Change all protocol ops to enforce the epoch ID
   - Add no-wedging state to make testing easier?
** DONE Adapt the projection-aware, CR-implementing client from demo-day
** DONE Add major comment sections to the CR-impl client
** DONE Simple basho_bench driver, put some unscientific chalk on the benchtop
** TODO Create parallel PULSE test for basic API plus chain manager repair
** DONE Add client-side vs. server-side checksum type, expand client API?
** TODO Add gproc and get rid of registered name rendezvous
*** TODO Fixes the atom table leak
*** TODO Fixes the problem of having an active sequencer for the same prefix
         on two FLUs in the same VM
** TODO Fix all known bugs/cruft with Chain Manager (list below)
*** DONE Fix known bugs
*** DONE Clean up crufty TODO comments and other obvious cruft
*** TODO Re-add verification step of stable epochs, including inner projections!
*** TODO Attempt to remove cruft items in flapping_i?
** TODO Move the FLU server to gen_server behavior?

* DONE Chain manager CP mode, Plan B
** SKIP Maybe? Change ch_mgr to use middleworker
**** DONE Is it worthwhile?  Is the parallelism so important?  No, probably.
**** SKIP Move middleworker func to utility module?
** DONE Add new proc to psup group
*** DONE Name: machi_fitness
** DONE ch_mgr keeps its current proc struct: i.e. same 1 proc as today
** NO ch_mgr asks hosed mgr for hosed list @ start of react_to_env
** DONE For all hosed, do *async*: try to read latest proj.
*** NO If OK, inform hosed mgr: status change will be used by next HC iter.
*** NO If fail, no change, because that server is already known to be hosed
*** DONE For all non-hosed, continue as the chain manager code does today
*** DONE Any new errors are added to UpNodes/DownNodes tracking as used today
*** DONE At end of react loop, if UpNodes list differs, inform hosed mgr.

* TODO fitness_mon, the fitness monitor
** DONE Map key & val sketch

Logical sketch:

  Map key: ObservingServerName::atom()
  Map val: { ObservingServerLastModTime::now(),
             UnfitList::list(ServerName::atom()),
             Props::proplist() }

Implementation sketch:

1. Use a CRDT map.
2. If the map key can't be an atom, then converting atom->string or
   atom->binary is fine.
3. For the map value, is it possible to use a CRDT LWW type?
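Below is a minimal, hedged sketch of how the map key/value above might be
represented with riak_dt, assuming a riak_dt_map whose fields are
riak_dt_lwwreg registers keyed by the observing server's name (converted
atom->binary, per note 2).  The module name fitness_map_sketch and its
exported functions are hypothetical, and the exact riak_dt op formats should
be checked against the riak_dt version actually pulled in as a dependency.

#+BEGIN_SRC erlang
%% Hypothetical sketch only: one riak_dt_lwwreg field per observing server,
%% holding {LastModTime, UnfitList, Props} as a single last-write-wins value.
-module(fitness_map_sketch).
-export([new/0, observer_update/4, merge/2, value/1]).

new() ->
    riak_dt_map:new().

%% Record one observer's view.  The map key is the observer's name converted
%% atom -> binary (note 2 above); the value is the whole 3-tuple from the
%% logical sketch, written as a last-write-wins register (note 3 above).
observer_update(ObserverName, UnfitList, Props, Map0) ->
    Field = {atom_to_binary(ObserverName, utf8), riak_dt_lwwreg},
    {MSec, Sec, USec} = Now = os:timestamp(),
    TS  = (MSec * 1000000 + Sec) * 1000000 + USec,   %% LWW timestamp
    Val = {Now, UnfitList, Props},
    Op  = {update, [{update, Field, {assign, Val, TS}}]},
    {ok, Map1} = riak_dt_map:update(Op, ObserverName, Map0),
    Map1.

%% CRDT merge of two observers' maps: commutative, so gossip order is safe.
merge(MapA, MapB) ->
    riak_dt_map:merge(MapA, MapB).

%% Read back a plain [{KeyBin, {LastModTime, UnfitList, Props}}] view.
value(Map) ->
    [{Name, V} || {{Name, riak_dt_lwwreg}, V} <- riak_dt_map:value(Map)].
#+END_SRC

If a single LWW register per value turns out to be a bad idea (note 3), the
same shape could instead use one map field per tuple element; the sketch just
takes the simplest reading of the logical sketch above.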
** DONE Investigate riak_dt data structure definition, manipulation, etc.
** DONE Add dependency on riak_dt
** WORKING Update is an entire dict from Observer O
   (A sketch of this merge/tick flow appears at the end of this section.)
*** TODO Merge my pending map + update map + my last mod time + my unfit list
*** TODO If merged /= pending:
**** TODO Schedule async tick (more)
     The tick message contains the list of servers whose state differs as of
     this instant in time ... we want to avoid triggering decisions about
     fitness/unfitness for other servers where we might have received less
     than a full time period's worth of waiting.
**** TODO Spam merged map to All_list -- [Me, O]
**** TODO Set pending <- merged
*** TODO When we receive an async tick
**** TODO Set active map <- pending map for all servers in the tick's list
**** TODO Send ch_mgr a react_to_env tick trigger
*** TODO react_to_env tick trigger actions
**** TODO Filter active map to remove stale entries (i.e. no update in 1 hour)
**** TODO If time since last map spam is too long, spam our *pending* map
**** TODO Proceed with normal react processing, using the *active* map for AllHosed!
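For the WORKING item above ("Update is an entire dict from Observer O"), here
is a hedged sketch of the merge-and-spam decision, reusing the riak_dt map
from the earlier sketch.  Everything here is illustrative: the record fields,
spam_map_to/2, and schedule_async_tick/1 are hypothetical names, not the real
machi_fitness plumbing.

#+BEGIN_SRC erlang
%% Hypothetical sketch of the merge step; not the real machi_fitness server.
-module(fitness_update_sketch).
-export([handle_observer_update/4]).

-record(st, {me            :: atom(),      %% my server name
             all_list = [] :: [atom()],    %% All_list
             pending       :: term(),      %% pending riak_dt_map
             active        :: term()}).    %% active riak_dt_map

%% Observer O sent us its entire map (UpdateMap).  Merge it into our pending
%% map along with our own latest observation (MyObservationOp is a riak_dt_map
%% update op carrying my last mod time + my unfit list).  If the merge changed
%% anything, spam the merged map to All_list -- [Me, O], schedule an async
%% tick naming the servers whose entries differ, and set pending <- merged.
handle_observer_update(Observer, UpdateMap, MyObservationOp,
                       #st{me=Me, all_list=All, pending=Pending0}=S) ->
    Merged0 = riak_dt_map:merge(Pending0, UpdateMap),
    {ok, Merged} = riak_dt_map:update(MyObservationOp, Me, Merged0),
    OldV = riak_dt_map:value(Pending0),
    NewV = riak_dt_map:value(Merged),
    case NewV =:= OldV of
        true ->
            S;                               %% merged == pending: nothing to do
        false ->
            Changed = [K || {K, V} <- NewV,
                            proplists:get_value(K, OldV) /= V],
            [spam_map_to(Srv, Merged) || Srv <- All -- [Me, Observer]],
            schedule_async_tick(Changed),
            S#st{pending=Merged}
    end.

%% Placeholder plumbing: the real code would use whatever messaging and timer
%% machinery the fitness monitor process already has.
spam_map_to(_Server, _MergedMap) ->
    ok.

schedule_async_tick(ChangedServers) ->
    erlang:send_after(1000, self(), {fitness_tick, ChangedServers}).
#+END_SRC

The async tick handler (not shown) would then copy pending -> active for only
the servers named in the tick, poke ch_mgr with a react_to_env trigger, and
let the stale-entry filtering and AllHosed calculation proceed from the
*active* map, as the items above describe.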