Added README.
This commit is contained in:
parent
21deb75622
commit
e9c48d362e
1 changed files with 275 additions and 0 deletions
275
README.md
Normal file
275
README.md
Normal file
|
@ -0,0 +1,275 @@
|
|||
Quorums
|
||||
=======
|
||||
|
||||
# Tutorial
|
||||
Given a set of nodes `X`, a _read-write quorum system_ is a pair `(R, W)` where
|
||||
`R` is a set of subsets of `X` called _read quorums_ and `W` is a set of
|
||||
subsets of `X` called _write quorums_. A read-write quorum system satisfies the
|
||||
property that every read quorum intersects every write quorum. This library
|
||||
allows us to construct and analyze arbitrary read-write quorum systems. First,
|
||||
we import the library.
|
||||
|
||||
```python
|
||||
from quorums import *
|
||||
```
|
||||
|
||||
Next, we specify the nodes in our quorum system. Our nodes can be strings,
|
||||
integers, IP addresses, anything!
|
||||
|
||||
```python
|
||||
a = Node('a')
|
||||
b = Node('b')
|
||||
c = Node('c')
|
||||
d = Node('d')
|
||||
e = Node('e')
|
||||
f = Node('f')
|
||||
```
|
||||
|
||||
Here, we construct a two by three grid of nodes. Every row is read quorum, and
|
||||
one element from every row is a write quorum. Note that when we construct a
|
||||
quorum system, we only have to specify the set of read quorums. The library
|
||||
figures out the optimal set of write quorums automatically.
|
||||
|
||||
```python
|
||||
grid = QuorumSystem(reads=a*b*c + d*e*f)
|
||||
```
|
||||
|
||||
This prints `{'a', 'b', 'c'}` and `{'d', 'e', 'f'}`.
|
||||
|
||||
```python
|
||||
for r in grid.read_quorums():
|
||||
print(r)
|
||||
```
|
||||
|
||||
This prints `{'a', 'd'}`, `{'a', 'e'}`, `{'b', 'f'}`, `{'b', 'd'}`, ...
|
||||
|
||||
```python
|
||||
for w in grid.write_quorums():
|
||||
print(w)
|
||||
```
|
||||
|
||||
Alternatively, we could specify the write quorums...
|
||||
|
||||
```python
|
||||
QuorumSystem(writes=(a + b + c) * (d + e + f))
|
||||
```
|
||||
|
||||
or both the read and write quorums.
|
||||
|
||||
```python
|
||||
QuorumSystem(reads=a*b*c + d*e*f, writes=(a + b + c) * (d + e + f))
|
||||
```
|
||||
|
||||
We can check whether a given set is a read or write quorum. Note that any
|
||||
superset of a quorum is also considered a quorum.
|
||||
|
||||
```python
|
||||
grid.is_read_quorum({'a', 'b', 'c'}) # True
|
||||
grid.is_read_quorum({'a', 'b', 'c', 'd'}) # True
|
||||
grid.is_read_quorum({'a', 'b', 'd'}) # False
|
||||
|
||||
grid.is_write_quorum({'a', 'd'}) # True
|
||||
grid.is_write_quorum({'a', 'd', 'd'}) # True
|
||||
grid.is_write_quorum({'a', 'b'}) # False
|
||||
```
|
||||
|
||||
The read resilience of our quorum system is the largest number `f` such that
|
||||
despite the failure of any `f` nodes, we still have at least one read quorum.
|
||||
Write resilience is defined similarly, and resilience is the minimum of read
|
||||
and write resilience.
|
||||
|
||||
```python
|
||||
grid.read_resilience() # 1
|
||||
grid.write_resilience() # 2
|
||||
grid.resilience() # 1
|
||||
```
|
||||
|
||||
A _strategy_ is a discrete probability distribution over the set of read and
|
||||
write quorums. A strategy gives us a way to pick quorums at random. The load of
|
||||
a node is the probability that the node is selected by the strategy, and the
|
||||
load of a strategy is the load of the most heavily loaded node. Using the
|
||||
`strategy` method, we get a load-optimal strategy, i.e. the strategy with the
|
||||
lowest possible load.
|
||||
|
||||
Typically in a distributed system, a read quorum of nodes is contacted to
|
||||
perform a read, and a write quorum of nodes is contacted to perform a write.
|
||||
Though we get to pick a strategy, we don't get to pick the fraction of
|
||||
operations that are reads and the fraction of operations that are writes. This
|
||||
is determined by the workload. When constructing a strategy, we have to specify
|
||||
the workload. The returned strategy is optimal only against this workload.
|
||||
Here, we construct a strategy assuming that 75% of all operations are reads.
|
||||
|
||||
```python
|
||||
strategy = grid.strategy(read_fraction=0.75)
|
||||
```
|
||||
|
||||
We can use the strategy to sample read and write quorums.
|
||||
|
||||
```python
|
||||
print(strategy.get_read_quorum())
|
||||
print(strategy.get_read_quorum())
|
||||
print(strategy.get_read_quorum())
|
||||
print(strategy.get_write_quorum())
|
||||
print(strategy.get_write_quorum())
|
||||
print(strategy.get_write_quorum())
|
||||
```
|
||||
|
||||
We can query the strategy's load.
|
||||
|
||||
```python
|
||||
strategy.load(read_fraction=0.75) # 0.458
|
||||
```
|
||||
|
||||
We can query the strategy's load on other workloads as well, though the
|
||||
strategy may not be optimal.
|
||||
|
||||
```python
|
||||
strategy.load(read_fraction=0) # 0.333
|
||||
strategy.load(read_fraction=0.5) # 0.416
|
||||
strategy.load(read_fraction=1) # 0.5
|
||||
```
|
||||
|
||||
This is a shorthand for
|
||||
`grid.strategy(read_fraction=0.25).load(read_fraction=0.25)`.
|
||||
|
||||
```python
|
||||
grid.load(read_fraction=0.25) # 0.375
|
||||
```
|
||||
|
||||
In the real world, we don't often have a fixed workload. Workloads change
|
||||
over time. Instead of specifying a fixed read fraction, we can provide a
|
||||
discrete probability distribution of read fractions. Here, we say that the
|
||||
read fraction is 10% half the time and 75% half the time. `strategy` will
|
||||
return the strategy that minimizes the expected load according to this
|
||||
distribution.
|
||||
|
||||
```python
|
||||
distribution = {0.1: 0.5, 0.75: 0.5}
|
||||
strategy = grid.strategy(read_fraction=distribution)
|
||||
strategy.load(read_fraction=distribution) # 0.404
|
||||
```
|
||||
|
||||
We can also specify the write fraction instead of the read fraction, if we
|
||||
prefer.
|
||||
|
||||
```python
|
||||
strategy = grid.strategy(write_fraction=0.75)
|
||||
strategy.load(write_fraction=distribution)
|
||||
```
|
||||
|
||||
In the real world, not all nodes are equal. We often run distributed systems on
|
||||
heterogenous hardware, so some nodes might be faster than others. To model
|
||||
this, we instatiate every node with its capacity. Here, nodes a, c, and e can
|
||||
process 1000 commands per second, while nodes b, d, and f can only process 500
|
||||
requests per second.
|
||||
|
||||
```python
|
||||
a = Node('a', capacity=1000)
|
||||
b = Node('b', capacity=500)
|
||||
c = Node('c', capacity=1000)
|
||||
d = Node('d', capacity=500)
|
||||
e = Node('e', capacity=1000)
|
||||
f = Node('f', capacity=500)
|
||||
```
|
||||
|
||||
Now, load can be interpreted as the inverse of the peak throughput of the
|
||||
quorum system. We can also call `capacity` to get this inverse directly.
|
||||
Here, our quorum system is capable of processing 1333 commands per second for
|
||||
a workload of 75% reads.
|
||||
|
||||
```python
|
||||
grid = QuorumSystem(reads=a*b*c + d*e*f)
|
||||
strategy = grid.strategy(read_fraction=0.75)
|
||||
strategy.load(read_fraction=0.75) # 0.00075
|
||||
strategy.capacity(read_fraction=0.75) # 1333
|
||||
```
|
||||
|
||||
Nodes might also process reads and writes at different speeds. We can specify
|
||||
the peak read and write throughput of every node separately. Here, we assume
|
||||
reads are ten times as fast as reads.
|
||||
|
||||
```python
|
||||
a = Node('a', write_capacity=1000, read_capacity=10000)
|
||||
b = Node('b', write_capacity=500, read_capacity=5000)
|
||||
c = Node('c', write_capacity=1000, read_capacity=10000)
|
||||
d = Node('d', write_capacity=500, read_capacity=5000)
|
||||
e = Node('e', write_capacity=1000, read_capacity=10000)
|
||||
f = Node('f', write_capacity=500, read_capacity=5000)
|
||||
```
|
||||
|
||||
With 100% reads, our quorum system can process 10,000 commands per second.
|
||||
This throughput decreases as we increase the fraction of writes.
|
||||
|
||||
```python
|
||||
grid = QuorumSystem(reads=a*b*c + d*e*f)
|
||||
grid.capacity(read_fraction=1) # 10,000
|
||||
grid.capacity(read_fraction=0.5) # 3913
|
||||
grid.capacity(read_fraction=0) # 2000
|
||||
```
|
||||
|
||||
Another real world complication is the fact that machines sometimes fail and
|
||||
are sometimes slow. If we contact a quorum of nodes, some of them may fail, and
|
||||
we'll get stuck waiting to hear back from them. Or, some of them may be
|
||||
stragglers, and we'll wait longer than we'd like. We can address this problem
|
||||
by contacting more than the bare minimum number of nodes.
|
||||
|
||||
Formally, we say a read quorum (or write quorum) q is _f-resilient_ if despite
|
||||
the failure of any f nodes, q still forms a read quorum (or write quorum). A
|
||||
strategy is f-resilient if it only selects f-resilient quorums. By default, we
|
||||
`strategy` returns 0-resilient quorums. We can pass in the `f` argument to get
|
||||
more resilient strategies.
|
||||
|
||||
```python
|
||||
strategy = grid.strategy(read_fraction=0.5, f=1)
|
||||
```
|
||||
|
||||
These sets are quorums even if 1 machine fails.
|
||||
|
||||
```python
|
||||
strategy.get_read_quorum()
|
||||
strategy.get_write_quorum()
|
||||
```
|
||||
|
||||
Putting everything together, we can use this library to pick quorum systems
|
||||
that are well suited to our workload. For example, say we're implementing a
|
||||
distributed file system and want to pick a 5 node quorum system with a
|
||||
resilience of 1 that has a good load on workloads that are 90% reads 90% of the
|
||||
time and 10% reads 10% of the time. We can try out three quorum systems: a
|
||||
simple majority quorum system, a crumbling walls quorum system, and a paths
|
||||
quorum system.
|
||||
|
||||
```python
|
||||
simple_majority = QuorumSystem(reads=majority([a, b, c, d, e]))
|
||||
crumbling_walls = QuorumSystem(reads=a*b + c*d*e)
|
||||
paths = QuorumSystem(reads=a*b + a*c*e + d*e + d*c*b)
|
||||
```
|
||||
|
||||
We make sure we have the desired resilience.
|
||||
|
||||
```python
|
||||
assert(simple_majority.resilience() >= 1)
|
||||
assert(crumbling_walls.resilience() >= 1)
|
||||
assert(paths.resilience() >= 1)
|
||||
```
|
||||
|
||||
We check the loads and see that the crumbling walls quorum system has the
|
||||
highest load, so we use the crumbling walls quorum system to implement our file
|
||||
system.
|
||||
|
||||
```python
|
||||
distribution = {0.9: 0.9, 0.1: 0.1}
|
||||
simple_majority.capacity(read_fraction=distribution) # 5089
|
||||
crumbling_walls.capacity(read_fraction=distribution) # 6824
|
||||
paths.capacity(read_fraction=distribution) # 5725
|
||||
```
|
||||
|
||||
Maybe some time later, we've experience high latency because of stragglers and
|
||||
want to switch to a 1-resilient strategy. We again compute the loads, but now
|
||||
see that the simple majority quorum system has the highest load, so we switch
|
||||
from the crumbling walls quorum system to the simple majority quorum system.
|
||||
|
||||
```python
|
||||
simple_majority.capacity(read_fraction=distribution, f=1) # 3816
|
||||
crumbling_walls.capacity(read_fraction=distribution, f=1) # 1908
|
||||
paths.capacity(read_fraction=distribution, f=1) # 1908
|
||||
```
|
Loading…
Reference in a new issue