* To Do list
** DONE remove the escript* stuff from machi_util.erl
** DONE Add functions to manipulate 1-chain projections
- Add epoch ID = epoch number + checksum of projection!
  Done via compare() func.
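
A minimal sketch of that epoch ID shape, with hypothetical type and
function names (the real projection code may differ):

#+BEGIN_SRC erlang
%% Hypothetical sketch: an epoch ID is the epoch number paired with a
%% checksum of the projection, so two different projections that happen
%% to share an epoch number still compare as unequal.
-module(epoch_id_sketch).
-export([compare/2]).

-type epoch_num()  :: non_neg_integer().
-type epoch_csum() :: binary().
-type epoch_id()   :: {epoch_num(), epoch_csum()}.

%% Order by epoch number first, then break ties on the checksum.
-spec compare(epoch_id(), epoch_id()) -> lesser | equal | greater.
compare(E, E)                -> equal;
compare(E1, E2) when E1 < E2 -> lesser;
compare(_, _)                -> greater.
#+END_SRC
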
** DONE Change all protocol ops to add epoch ID
** DONE Add projection store to each FLU.
*** DONE What should the API look like? (borrow from chain mgr PoC?)
Yeah, I think that's pretty complete. Steal it now, worry later.
*** DONE Choose protocol & TCP port. Share with get/put? Separate?
Hrm, I like the idea of having a single TCP port to talk to any single
FLU.

To make the protocol "easy" to hack, how about using the same basic
method as append/write where there's a variable size blob. But we'll
format that blob as a term_to_binary(). Then dispatch to a single
func, and pattern match Erlang style in that func.
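
A hedged sketch of that dispatch style, assuming hypothetical request
tuples and stub replies (none of these names are necessarily Machi's):

#+BEGIN_SRC erlang
%% Sketch only: the request blob is a term_to_binary()-encoded tuple.
%% Decode it once, dispatch to a single function, and pattern match
%% Erlang style on the request shape.  The request atoms and the
%% {would_*, ...} stub replies are illustrative placeholders.
-module(proj_blob_sketch).
-export([handle_proj_blob/1]).

handle_proj_blob(Blob) when is_binary(Blob) ->
    Reply = do_proj_request(binary_to_term(Blob)),
    term_to_binary(Reply).

do_proj_request({get_latest_epoch, ProjType}) ->
    {ok, {would_get_latest_epoch, ProjType}};
do_proj_request({read_projection, ProjType, Epoch}) ->
    {ok, {would_read, ProjType, Epoch}};
do_proj_request({write_projection, ProjType, Proj}) ->
    {ok, {would_write, ProjType, Proj}};
do_proj_request(_Unknown) ->
    {error, bad_request}.
#+END_SRC
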
*** DONE Do it.
** DONE Finish OTP'izing the Chain Manager with FLU & proj store processes
** DONE Eliminate the timeout exception for the client: just return {error,timeout}
** DONE Move prototype/chain-manager code to "top" of source tree
*** DONE Preserve current test code (leave as-is? tiny changes?)
*** DONE Make chain manager code flexible enough to run "real world" or "sim"
** DONE Add projection wedging logic to each FLU.
** DONE Implement real data repair, orchestrated by the chain manager
** DONE Change all protocol ops to enforce the epoch ID
- Add no-wedging state to make testing easier?
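
A rough sketch of how per-op epoch enforcement plus wedging might fit
together; the record fields, the check_epoch/2 helper, and the no_wedge
test flag are all hypothetical:

#+BEGIN_SRC erlang
%% Sketch only: every protocol op carries an epoch ID; the FLU rejects any
%% mismatch, and wedges itself when it sees evidence of a newer epoch.
%% All names here are illustrative, not Machi's actual code.
-module(epoch_enforce_sketch).
-export([check_epoch/2]).

-record(flu, {epoch_id, wedged = false, no_wedge = false}).

check_epoch(_ReqEpochID, #flu{wedged = true} = F) ->
    {{error, wedged}, F};
check_epoch(ReqEpochID, #flu{epoch_id = MyEpochID} = F)
  when ReqEpochID =:= MyEpochID ->
    {ok, F};
check_epoch(ReqEpochID, #flu{epoch_id = MyEpochID, no_wedge = NoWedge} = F)
  when ReqEpochID > MyEpochID ->
    %% Someone has a newer epoch: wedge until we adopt it.  The no_wedge
    %% flag is the "make testing easier" escape hatch mentioned above.
    {{error, bad_epoch}, F#flu{wedged = not NoWedge}};
check_epoch(_ReqEpochID, F) ->
    {{error, bad_epoch}, F}.
#+END_SRC
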
** DONE Adapt the projection-aware, CR-implementing client from demo-day
** DONE Add major comment sections to the CR-impl client
** DONE Simple basho_bench driver, put some unscientific chalk on the benchtop
** TODO Create parallel PULSE test for basic API plus chain manager repair
** DONE Add client-side vs. server-side checksum type, expand client API?
** TODO Add gproc and get rid of registered name rendezvous
*** TODO Fixes the atom table leak
*** TODO Fixes the problem of having an active sequencer for the same prefix
    on two FLUs in the same VM
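
A hedged sketch of the gproc approach: register each per-prefix
sequencer under a structured key that includes the FLU name, instead of
minting a new registered-name atom per prefix. The {seq, FluName,
Prefix} key shape and the function names are illustrative:

#+BEGIN_SRC erlang
%% Sketch only.  gproc keys are arbitrary terms, so no atoms are created
%% per prefix (fixing the atom table leak), and including FluName in the
%% key lets two FLUs in the same VM each run a sequencer for one prefix.
-module(gproc_seq_sketch).
-export([register_sequencer/2, lookup_sequencer/2]).

%% Called by the sequencer process itself to register under a gproc
%% local unique name.
register_sequencer(FluName, Prefix) ->
    true = gproc:reg({n, l, {seq, FluName, Prefix}}),
    ok.

lookup_sequencer(FluName, Prefix) ->
    case gproc:where({n, l, {seq, FluName, Prefix}}) of
        undefined            -> {error, not_registered};
        Pid when is_pid(Pid) -> {ok, Pid}
    end.
#+END_SRC
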
** TODO Fix all known bugs/cruft with Chain Manager (list below)
*** DONE Fix known bugs
*** DONE Clean up crufty TODO comments and other obvious cruft
*** TODO Re-add verification step of stable epochs, including inner projections!
*** TODO Attempt to remove cruft items in flapping_i?
** TODO Move the FLU server to gen_server behavior?
* DONE Chain manager CP mode, Plan B
** SKIP Maybe? Change ch_mgr to use middleworker
**** DONE Is it worthwhile? Is the parallelism so important? No, probably not.
**** SKIP Move middleworker func to utility module?
** DONE Add new proc to psup group
*** DONE Name: machi_fitness
** DONE ch_mgr keeps its current proc struct: i.e. same 1 proc as today
** NO chmgr asks hosed mgr for hosed list @ start of react_to_env
** DONE For all hosed, do *async*: try to read latest proj.
*** NO If OK, inform hosed mgr: status change will be used by next HC iter.
*** NO If fail, no change, because that server is already known to be hosed
*** DONE For all non-hosed, continue as the chain manager code does today
*** DONE Any new errors are added to UpNodes/DownNodes tracking as used today
*** DONE At end of react loop, if UpNodes list differs, inform hosed mgr.
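
A hedged sketch of just the DONE items above, with stub functions
standing in for the real chain-manager and fitness calls (all names are
illustrative):

#+BEGIN_SRC erlang
%% Sketch only: hosed servers get an *async* attempt to read their latest
%% projection (feeding the next fitness iteration), the normal react loop
%% runs over the non-hosed servers, and the hosed/fitness manager is told
%% whenever our UpNodes view changes.
-module(plan_b_react_sketch).
-export([react_to_env/3]).

react_to_env(AllMembers, HosedList, PrevUpNodes) ->
    [spawn(fun() -> try_read_latest_projection(Srv) end) || Srv <- HosedList],
    UpNodes = run_normal_react_loop(AllMembers -- HosedList),
    case UpNodes =:= PrevUpNodes of
        true  -> ok;
        false -> inform_hosed_mgr(UpNodes)
    end,
    UpNodes.

%% Stubs standing in for the real calls.
try_read_latest_projection(_Server) -> ok.
run_normal_react_loop(NonHosed)     -> NonHosed.
inform_hosed_mgr(_UpNodes)          -> ok.
#+END_SRC
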
* DONE fitness_mon, the fitness monitor
** DONE Map key & val sketch
Logical sketch:

Map key: ObservingServerName::atom()

Map val: { ObservingServerLastModTime::now(),
           UnfitList::list(ServerName::atom()),
           AdminDownList::list(ServerName::atom()),
           Props::proplist() }

Implementation sketch:

1. Use a CRDT map.
2. If the map key can't be an atom, then atom->string or atom->binary is fine.
3. For the map value, is a CRDT LWW type possible?
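
A hedged sketch of that map shape in plain Erlang, plus a hand-rolled
last-write-wins merge keyed on ObservingServerLastModTime to illustrate
item 3. The actual implementation uses riak_dt (see the items below);
the types and the merge/2 function here are illustrative only:

#+BEGIN_SRC erlang
%% Sketch only: the fitness map shape from the logical sketch above,
%% with a last-write-wins merge per observer key.
-module(fitness_map_sketch).
-export([merge/2]).

-type server()      :: atom().
-type fitness_val() :: {LastModTime   :: erlang:timestamp(),
                        UnfitList     :: [server()],
                        AdminDownList :: [server()],
                        Props         :: proplists:proplist()}.
-type fitness_map() :: #{server() => fitness_val()}.

%% For each observer key, keep whichever entry has the newer
%% ObservingServerLastModTime (the LWW idea from item 3).
-spec merge(fitness_map(), fitness_map()) -> fitness_map().
merge(MapA, MapB) ->
    maps:fold(fun(K, VB, Acc) ->
                      case maps:find(K, Acc) of
                          {ok, VA} when element(1, VA) >= element(1, VB) ->
                              Acc;
                          _ ->
                              maps:put(K, VB, Acc)
                      end
              end, MapA, MapB).
#+END_SRC
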
** DONE Investigate riak_dt data structure definition, manipulating, etc.
** DONE Add dependency on riak_dt
** DONE Update is an entire dict from Observer O
*** DONE Merge my pending map + update map + my last mod time + my unfit list
*** DONE if merged /= pending:
**** DONE Schedule async tick (more)
Tick message contains list of servers with differing state as of this
instant in time... we want to avoid triggering decisions about
fitness/unfitness for other servers where we might have received less
than a full time period's worth of waiting.
**** DONE Spam merged map to All_list -- [Me]
**** DONE Set pending <- merged
*** DONE When we receive an async tick
**** DONE set active map <- pending map for all servers in ticks list
**** DONE Send ch_mgr a react_to_env tick trigger
*** DONE react_to_env tick trigger actions
**** DONE Filter active map to remove stale entries (i.e. no update in 1 hour)
**** DONE If time since last map spam is too long, spam our *pending* map
**** DONE Proceed with normal react processing, using *active* map for AllHosed!
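
A hedged sketch of the update/tick flow in the items above. The #sk{}
state record, the tick delay, and the stub functions are all
hypothetical; the real machi_fitness code differs in detail:

#+BEGIN_SRC erlang
%% Sketch only: on an update from an observer, merge it into the pending
%% map; if anything changed, schedule an async tick for the changed
%% servers, spam the merged map to everyone else, and set pending <-
%% merged.  On the tick, promote pending -> active for just those servers
%% and poke the chain manager to run react_to_env with the *active* map.
-module(fitness_flow_sketch).
-export([handle_update/2, handle_tick/2]).

-record(sk, {me, all = [], pending = #{}, active = #{}}).
-define(TICK_MS, 1000).   %% illustrative tick delay

handle_update(UpdateMap, #sk{pending = Pending, me = Me, all = All} = S) ->
    Merged = lww_merge(Pending, UpdateMap),
    case Merged =:= Pending of
        true ->
            S;
        false ->
            Changed = changed_keys(Pending, Merged),
            erlang:send_after(?TICK_MS, self(), {tick, Changed}),
            [spam_map_to(Srv, Merged) || Srv <- All -- [Me]],
            S#sk{pending = Merged}
    end.

handle_tick(ChangedServers, #sk{pending = Pending, active = Active} = S) ->
    NewActive = maps:merge(Active, maps:with(ChangedServers, Pending)),
    trigger_chmgr_react(),
    S#sk{active = NewActive}.

changed_keys(Old, New) ->
    [K || {K, V} <- maps:to_list(New), maps:find(K, Old) =/= {ok, V}].

%% Stubs: a real implementation would use the CRDT/LWW merge and send
%% actual messages to the other servers and to the chain manager.
lww_merge(MapA, MapB)      -> maps:merge(MapA, MapB).
spam_map_to(_Server, _Map) -> ok.
trigger_chmgr_react()      -> ok.
#+END_SRC
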