Witness servers and CP mode clarification needed in design document #4
Reference: greg/machi#4
The role of witness servers in Section 11 of the chain manager document should be clarified.
From my initial reading, it seems that the technique of only accepting writes on the majority side of the partition can be presented completely with only "real" servers. "Witness" servers appear to be an optimization that allows continued operation in CP mode when only a minority of real servers are available, which I believe should be presented separately. Am I missing something obvious?
The reasoning for why witness servers should be placed at the front of the chain is only mentioned briefly and is not presented clearly. Why is this required?
Finally, in Figure 3, given the reasoning in the document, it appears that the following:
will continue to accept writes on the majority partition side, given the presence of two witnesses and one real node. However, given that the witnesses store no data (only metadata regarding the current projection), a single failure of S_1 before the partition heals results in data loss.
Shouldn't you require that a majority of the majority side of the partition be real servers for durability in CP mode? (instead of the requirement for only 1, which feels like it should be the invariant for AP mode, not CP mode)
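For concreteness, here is a minimal sketch of the two rules under discussion: the doc's majority-plus-one-real rule versus the stricter majority-of-real-servers invariant proposed above. The `Server` type and function names are invented for illustration and do not come from the Machi codebase.

```python
from collections import namedtuple

# Hypothetical model of a chain member; not Machi's actual data types.
Server = namedtuple("Server", ["name", "witness"])

def can_accept_writes(side, cluster_size):
    """The doc's CP-mode rule: the partition side holds a majority of
    the whole cluster and contains at least one real server."""
    majority = len(side) > cluster_size // 2
    has_real = any(not s.witness for s in side)
    return majority and has_real

def can_accept_writes_durably(side, cluster_size):
    """The stricter rule proposed above: a majority of the majority
    side must itself be real servers, so no single real-server failure
    can destroy the only copy of the data."""
    n_real = sum(1 for s in side if not s.witness)
    return len(side) > cluster_size // 2 and n_real > len(side) // 2

# The scenario above: two witnesses plus S_1, out of a 5-member cluster.
side = [Server("W_0", True), Server("W_1", True), Server("S_1", False)]
```

Under the first rule this side accepts writes; under the stricter rule it does not, which matches the data-loss concern above.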
Hi, Chris. Many thanks for putting this part of the doc under a microscope. ^_^ I'll try the questions out-of-order.
[SLF: Chris is referring to the doc at https://github.com/basho/machi/blob/master/doc/high-level-chain-mgr.pdf ]
Upon reflection after writing the "witnesses at the front of the chain" text a while ago, I don't believe that it's a hard requirement. However, it makes dealing with the implementation easier. For example, it isn't necessary to maintain strict ordering between epochs of witness servers in the UPI part of the chain (compared to how correctness can be broken if you reorder real servers in the UPI part of the chain). The current CP mode chain manager code feels a bit easier to work with, knowing that the witnesses are all in front. But perhaps that's ex post facto logic at work, I dunno.
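One way to read that rationale as a predicate (a sketch; the `(name, is_witness)` pair representation and the function name are assumptions, not Machi's API): a reordering is harmless exactly when the relative order of the real servers in the UPI is preserved, which is trivially true for any shuffle of a witness-only prefix.

```python
def reorder_is_safe(old_upi, new_upi):
    """A new UPI ordering is assumed safe when membership is unchanged
    and the real (data-bearing) servers keep their relative order.
    Witnesses store no file data, so shuffling them cannot reorder any
    replica of the data.  Members are (name, is_witness) pairs."""
    same_members = sorted(old_upi) == sorted(new_upi)
    old_real = [name for name, w in old_upi if not w]
    new_real = [name for name, w in new_upi if not w]
    return same_members and old_real == new_real
```

With all witnesses at the front, any permutation of the witness prefix satisfies this predicate, which is one way the implementation gets simpler.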
If a CP mode chain manager tried to arrange the minority partition as:
... the manager won't succeed, because (by definition) a minority-length chain isn't sufficient. So S_0 and S_2 will wedge themselves and wait until they can talk to somebody else.
The other partition:
... will be able to function because the chain is long enough (majority) and contains at least one real server, S_1. If we are in this situation, then we're already beyond the point of being able to operate in a 3 or 2 "real" server situation. We're lucky (and presumably happy) that we can operate at all -- the alternative is to be unavailable. Yes, we will be unavailable, as you point out, if S_1 fails before the partition heals. "Data loss" perhaps needs more specific definition: temporary unavailability (if S_1 returns to service eventually) or permanent unavailability (S_1 never returns to service with its data intact).
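The wedge-or-operate decision described in the last two comments can be sketched as follows. This is a toy model: members are represented only by an is-witness flag, and the function name is invented.

```python
def chain_state(alive_witness_flags, cluster_size):
    """Decide whether the reachable members of one partition side can
    keep operating in CP mode.  True flags mark witnesses."""
    if len(alive_witness_flags) <= cluster_size // 2:
        return "wedged"   # minority side: wait until we can talk to somebody
    if all(alive_witness_flags):
        return "wedged"   # majority of witnesses only: nobody holds the data
    return "active"       # majority side with at least one real server
```

In a 5-member cluster, a 2-member minority side wedges no matter what it contains, while a majority side of two witnesses plus S_1 stays active.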
Hrm, I feel that the best answer is "it depends". If you want to operate a strongly consistent cluster that can tolerate 2 failures and not lose data permanently, then a scheme of 2 witnesses + 3 real servers is sufficient. In the case (above) of the two failures being S_0 and S_2, yeah, you're flirting with unavailability (temporary or permanent) if you have a third failure ... but if you wanted to tolerate a 3rd failure with the same consistency & availability, then you ought to be running with 3 witnesses + 4 real servers instead.
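That sizing argument can be checked exhaustively for small clusters. The `tolerates` helper below is a sketch invented for illustration, not Machi code.

```python
from itertools import combinations

def tolerates(n_witnesses, n_real, f):
    """Return True if every possible set of f failures leaves a
    majority of the cluster alive that still includes at least one
    real server.  True entries in `members` mark witnesses."""
    members = [True] * n_witnesses + [False] * n_real
    n = len(members)
    for dead in combinations(range(n), f):
        alive = [w for i, w in enumerate(members) if i not in dead]
        if len(alive) <= n // 2 or all(alive):
            return False
    return True
```

2 witnesses + 3 real servers tolerate any 2 failures but not 3, and 3 witnesses + 4 real servers tolerate any 3, matching the rule of thumb above.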
If it's helpful, here's a message sequence diagram for the CORFU-style pattern of chain replication that allows strong consistency updates to the chain despite having only a single real server. If there's an epoch change underway, our two witnesses help form the majority quorum that will guarantee that at least 1 of the 3 will observe the change and thus send a negative response to the client.
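The epoch-guard part of that flow can be sketched like this (class and method names are invented; this is not the Machi wire protocol): once an epoch change is installed on a majority quorum, a client holding the stale epoch and writing down the whole chain must hit at least one updated member and receive the negative response.

```python
class BadEpoch(Exception):
    """Negative response: the client's projection epoch is stale."""

class Member:
    def __init__(self, name, witness):
        self.name, self.witness = name, witness
        self.epoch = 0

    def check_epoch(self, client_epoch):
        # Witness or real, every member rejects a stale-epoch request.
        if client_epoch != self.epoch:
            raise BadEpoch(self.epoch)

def install_epoch(members, new_epoch):
    """Install the new epoch on a majority quorum of members."""
    quorum = len(members) // 2 + 1
    for m in members[:quorum]:
        m.epoch = new_epoch
```

With two witnesses and one real server, installing epoch 1 on any two of the three guarantees that a client still at epoch 0 cannot complete a full-chain write without seeing `BadEpoch`.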
Oops, I don't think I wrote about the first question, sorry.
Hm, my memory of that section is dim; I'll have to go back and read it. If it's a chocolate + peanut butter description that would be better off with the chocolate separated from the peanut butter, then I'll definitely consider separating them.
Oops, there was a typo, back two comments ago. I've edited that comment to fix it. The new text is:
So, I think we should rewrite this section to discuss CP mode without witnesses first, and then expand it to cover witnesses in a second section.
Proposal:
Also, add motivation & evaluation details for AP: causal vs. some other, looser eventual consistency flavor. Ditto for CP: linearizable vs. the looser sequential consistency.
Also: clarify the atomicity guarantees (or lack thereof) of how appends & writes work, with and without the "extra bytes" option to the append op. (SLF: though this is a bit trickier, because the semantics of the Machi file service don't really belong in the chain manager docs. Hrrrm, needs more thought....)