Working on new name-game-sketch.org
parent 4c784613a1
commit e2d486d347
2 changed files with 115 additions and 0 deletions
BIN
doc/cluster-of-clusters/migration-3to4.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 19 KiB |
115
doc/cluster-of-clusters/name-game-sketch.org
Normal file
@@ -0,0 +1,115 @@
-*- mode: org; -*-
#+TITLE: Machi cluster-of-clusters "name game" sketch
#+AUTHOR: Scott
#+STARTUP: lognotedone hidestars indent showall inlineimages
#+SEQ_TODO: TODO WORKING WAITING DONE

* "Name Games" with random-slicing style consistent hashing

Our goal: to distribute lots of files very evenly across a cluster of
Machi clusters (hereafter called a "cluster of clusters" or "CoC").

* Assumptions

** Basic familiarity with Machi high level design and Machi's "projection"

The [[https://github.com/basho/machi/blob/master/doc/high-level-machi.pdf][Machi high level design document]] contains all of the basic
background assumed by the rest of this document.

** Familiarity with the Machi cluster-of-clusters/CoC concept

This isn't yet well-defined (April 2015). However, it's clear from
the [[https://github.com/basho/machi/blob/master/doc/high-level-machi.pdf][Machi high level design document]] that Machi alone does not support
any kind of file partitioning/distribution/sharding across multiple
machines. There must be another layer above a Machi cluster to
provide such partitioning services.

The name "cluster of clusters" originated within Basho to avoid
conflicting use of the word "cluster". A Machi cluster is usually
synonymous with a single Chain Replication chain and a single set of
machines (e.g. 2-5 machines). However, in the not-so-distant future, we
expect much more complicated patterns of Chain Replication to be used
in real-world deployments.

"Cluster of clusters" is clunky and long, but we haven't found a good
substitute yet. If you have a good suggestion, please contact us!
^_^

Using the [[https://github.com/basho/machi/tree/master/prototype/demo-day-hack][cluster-of-clusters quick-and-dirty prototype]] as an
architecture sketch, let's now assume that we have N independent Machi
clusters. We wish to provide partitioned/distributed file storage
across all N clusters. We call the entire collection of N Machi
clusters a "cluster of clusters", or abbreviated "CoC".

** Analogy: "neighborhood : city :: Machi : cluster-of-clusters"

Analogy: The word "machi" in Japanese means small town or
neighborhood. Just as the Tokyo Metropolitan Area is built from many
machis and smaller cities, a big, partitioned file store can
be built out of many small Machi clusters.

** The reader is familiar with the random slicing technique

I'd done something very-very-nearly-identical for the Hibari database
6 years ago. But the Hibari technique was based on stuff I did at
Sendmail, Inc., so it felt like old news to me. {shrug}

The Hibari documentation has a brief photo illustration of how random
slicing works; see [[http://hibari.github.io/hibari-doc/hibari-sysadmin-guide.en.html#chain-migration][Hibari Sysadmin Guide, chain migration]].

For a comprehensive description, please see these two papers:

#+BEGIN_QUOTE
Reliable and Randomized Data Distribution Strategies for Large Scale Storage Systems
Alberto Miranda et al.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.226.5609
(short version, HIPC'11)

Random Slicing: Efficient and Scalable Data Placement for Large-Scale
Storage Systems
Alberto Miranda et al.
DOI: http://dx.doi.org/10.1145/2632230 (long version, ACM Transactions
on Storage, Vol. 10, No. 3, Article 9, 2014)
#+END_QUOTE

** We use random slicing to map CoC file names -> Machi cluster ID/name

We will use a single random slicing map. This map (called "Map" in
the descriptions below), together with the random slicing hash
function (called "rs_hash()" below), will be used to map:

#+BEGIN_QUOTE
CoC client-visible file name -> Machi cluster ID/name/thingie
#+END_QUOTE

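As a rough illustration of the mapping above, here is a minimal Python
sketch. Everything in it is an assumption made for illustration only:
the SHA-1-based body of rs_hash(), the representation of Map as a
sorted list of (exclusive upper bound, cluster name) pairs covering
the unit interval, and the cluster names themselves. It is not the
Machi implementation.

```python
import bisect
import hashlib

# Hypothetical Map: slices of the unit interval [0.0, 1.0), each written
# as (exclusive upper bound, cluster name).  A cluster may own more than
# one slice, as "cluster-a" does here.
MAP = [
    (0.25, "cluster-a"),
    (0.50, "cluster-b"),
    (0.75, "cluster-c"),
    (1.00, "cluster-a"),
]


def rs_hash(file_name):
    """Map a file name to a point in [0.0, 1.0) (illustrative hash choice)."""
    digest = hashlib.sha1(file_name.encode("utf-8")).digest()
    # Keep the top 53 bits so the division below is exact and stays < 1.0.
    top_bits = int.from_bytes(digest[:8], "big") >> 11
    return top_bits / 2**53


def cluster_for_point(point, slicing_map):
    """Return the name of the cluster whose slice contains `point`."""
    bounds = [upper for upper, _name in slicing_map]
    # Index of the first slice whose exclusive upper bound is > point.
    index = bisect.bisect_right(bounds, point)
    return slicing_map[index][1]


def locate(file_name, slicing_map=MAP):
    """CoC client-visible file name -> Machi cluster name."""
    return cluster_for_point(rs_hash(file_name), slicing_map)
```

A client would call something like locate("some/file/name") and then
talk only to the Machi cluster that is returned. Because the hash and
the Map are both deterministic, every client holding the same Map
picks the same cluster for the same file name.
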
** Machi cluster ID/name management: TBD, but, really, should be simple

The mapping from:

#+BEGIN_QUOTE
Machi CoC member ID/name/thingie -> ???
#+END_QUOTE

... remains To Be Determined. But, really, this is going to be pretty
simple. The ID/name/thingie will probably be a human-friendly,
printable ASCII string, and the "???" will probably be a single Machi
cluster projection data structure.

The Machi projection is enough information to contact any member of
that cluster and, if necessary, request the most up-to-date projection
information required to use that cluster.

It's likely that the projection given by this map will be out-of-date,
so the client must be ready to use the standard Machi procedure to
request the cluster's current projection, in any case.

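Since this mapping is still To Be Determined, the following Python
sketch only illustrates the idea of "name -> possibly stale
projection, refreshed on use". The Projection fields (epoch, members),
the directory dictionary, and the fetch_latest callable (standing in
for the standard Machi procedure of asking the cluster for its current
projection) are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Projection:
    """Hypothetical stand-in for a Machi cluster projection."""
    epoch: int                 # assumed: larger epoch means newer projection
    members: Tuple[str, ...]   # assumed: addresses of the cluster's members


def current_projection(
    cluster_name: str,
    directory: Dict[str, Projection],
    fetch_latest: Callable[[Projection], Projection],
) -> Projection:
    """Look up a cluster by name and refresh its cached projection.

    `fetch_latest` is a placeholder: given a (possibly stale) projection,
    it contacts a member listed there and returns what the cluster
    reports as its current projection.
    """
    cached = directory[cluster_name]
    latest = fetch_latest(cached)
    if latest.epoch > cached.epoch:
        # The directory entry was out-of-date; remember the newer one.
        directory[cluster_name] = latest
        return latest
    return cached
```

The point of the sketch is only that the directory entry needs to be
good enough to reach some member of the cluster; the client never
trusts it to be current.
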
* Goo

[[./migration-3to4.png]]

* Acknowledgements

The source for the "migration-3to4.png" image is from the [[http://hibari.github.io/hibari-doc/images/migration-3to4.png][HibariDB
documentation]].