{error, written} handling in cr client when writing #39
Reference: greg/machi#39
Spin-off from #33.
The superficial reason for `bad_return_value` is the reply tag, as explained by the quick fix [1]. But it seems the root cause is a double write, as with the other two errors that were found.
`bad_return_value` occurred in the call from `do_append_midtail2` to `do_repair_chunk` in `machi_cr_client`. The current code triggers repair on `{error, written}` [2], but the FLU returns `ok` when the "correct" chunk is written [3].
The quick fix [1] will hide the data conflict and will probably return a false `ok`.
[1] 903e9395c8
[2] https://github.com/basho/machi/blob/master/src/machi_cr_client.erl#L434
[3] https://github.com/basho/machi/blob/master/src/machi_file_proxy.erl#L683-L696
I think `do_repair_chunks/10` should be used instead of `do_repair_chunk` at 903e939. It looks like the right fix regardless of the exact issue (though it's a kind of type mismatch that dialyzer should have detected).
Yes, I tried `-Wspecdiffs`; dialyzer output many lines ;) But `{error, written}` covers many subtle situations, which you can find in `machi_file_proxy:handle_write()`, so maybe we should look into this more deeply. I have a patch for this issue (which is being tested now). For debugging purposes, we'd better pass context information from cr_client to file_proxy saying whether a write is part of an append (writes to the midtail are not supposed to overlap, so any overlap other than a trim warns that something is wrong) or part of a repair (which is like a forced overwrite).
Although here we have to use the checksum given by the client, not generate one ourselves.
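A minimal Python sketch of the context-passing idea above, assuming a toy in-memory file proxy. The `ChunkStore` class, the `mode` argument (`"append"` or `"repair"`), the use of SHA-1 and the `("error", "bad_checksum")` reply are all hypothetical illustrations, not Machi's API; the real code is Erlang in `machi_file_proxy`. Trims are left out. The point is only that the same overlap is a warning in append context but expected in repair context, and that the client-supplied checksum is verified rather than regenerated.

```python
import hashlib
import logging

log = logging.getLogger("file_proxy_sketch")


class ChunkStore:
    """Toy stand-in for a file proxy: maps byte offset -> stored byte value."""

    def __init__(self):
        self.data = {}

    def write(self, offset, chunk, client_csum, mode):
        """Write `chunk` at `offset`; `mode` is the hypothetical write context."""
        if mode not in ("append", "repair"):
            raise ValueError("mode must be 'append' or 'repair'")
        # Check against the checksum the client sent instead of
        # generating one here (SHA-1 is an assumption of this sketch).
        if hashlib.sha1(chunk).digest() != client_csum:
            return ("error", "bad_checksum")
        if mode == "append":
            overlap = [i for i in range(len(chunk)) if offset + i in self.data]
            if any(self.data[offset + i] != chunk[i] for i in overlap):
                # Midtail appends should never overlap different bytes,
                # so log it: this points at a double write.
                log.warning("append at offset %d overlaps different bytes", offset)
                return ("error", "written")
        # Appends of identical bytes and all repairs (forced overwrite) land here.
        for i, b in enumerate(chunk):
            self.data[offset + i] = b
        return "ok"
```

With this split, an append that hits different bytes is reported and logged, while the same request in repair context simply overwrites.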
What I want to confirm and settle first is what `{error, written}` means. In the current FLU server-side implementation, it is the reply to append/write requests meaning "different bytes are already written here". The CR client code seems to read it as "the write to the midtails has already been done by someone else (e.g. repair), with the bytes I want to write". Would it be better to use different error atoms for the two cases?
👍
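To make the server-side meaning concrete, here is a hedged Python model of the FLU reply to a write, following the behaviour described in this thread: `ok` for an unwritten range or for identical bytes, `{error, written}` for different bytes, `{error, trimmed}` for a trimmed range. The function name and the tuple encoding of Erlang terms are hypothetical, and checking the trim first is an assumption of this sketch.

```python
def flu_write_reply(stored, requested, trimmed=False):
    """Model the FLU reply to a write request (hypothetical helper).

    stored:    bytes already on disk for the range, or None if unwritten
    requested: bytes the client asks to write
    trimmed:   whether the range has been trimmed
    Replies mimic Erlang terms: "ok" or ("error", reason).
    """
    if trimmed:
        return ("error", "trimmed")
    if stored is None:
        return "ok"                  # unwritten range: the write succeeds
    if stored == requested:
        return "ok"                  # identical bytes already there: idempotent
    return ("error", "written")      # different bytes already on disk
```

Under these semantics `ok` already covers "someone else wrote exactly my bytes", so `{error, written}` can only mean a real conflict, not a finished repair.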
The same may apply in the reverse direction, FLU -> client. For example, `{error, {bad_epoch, FLUEpoch, RequestedEpoch}}`.
Currently the file_proxy returns `ok` when a write request carries exactly the same data that the FLU already has. That is, `{error, written}` indicates that something is wrong (the requested bytes differ from what is on disk), and in that case we shouldn't trigger repair. If `{error, trimmed}` is returned, then a repair should be triggered.
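A sketch of the client-side decision this implies for a midtail write, assuming the reply semantics above. `on_midtail_reply` and its action names are hypothetical; other replies, such as a `bad_epoch` tuple, are simply handed back to the caller here rather than guessed at.

```python
def on_midtail_reply(reply):
    """Decide what a CR client does with a FLU reply to a midtail write.

    Hypothetical helper; replies mimic Erlang terms: "ok" or ("error", reason).
    """
    if reply == "ok":
        return "done"        # bytes are on disk, written now or earlier
    if reply == ("error", "trimmed"):
        return "repair"      # range was trimmed: repair is warranted
    if reply == ("error", "written"):
        return "conflict"    # different bytes on disk: do NOT repair
    return "propagate"       # anything else (e.g. bad_epoch) goes to the caller
```

This differs from the current `machi_cr_client` behaviour, which triggers repair on `{error, written}`; here that reply surfaces as a conflict instead of being hidden behind a false `ok`.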