Representing transformation events and timelines #200

New issue

Open

opened 2020-08-06 16:56:33 +00:00 by gburd · 0 comments

gburd commented

2020-08-06 16:56:33 +00:00

(Migrated from github.com)

A synchronized set of Mentat instances collaborate to build a linear timeline of states. They do so by rebasing and/or merging their changes into the remote primary timeline.

Each instance is able to perform a set of operations — assertion, retraction, excision, and schema addition/alteration — that introduce coordination points in this timeline.

Before we discuss those, it's worth mentioning that there are other operations that only affect whether clients can proceed:

Base format changes (#587) — e.g., the introduction of a new type — can lock out clients; older clients won't be able to interpret the wire format.
Non-backwards-compatible schema changes should prevent a remote timeline from being merged locally until the client upgrades; doing so would prevent the local client from operating on its own data.

Most of a client's operations are point-in-time:

Retraction (and change, which is retraction + assertion) represents an ordered state change as a datom. There is a conceptual division here between the state before and after the change is transacted; the retraction is only meaningful if it happens-after the assertion of the datom to be retracted. This is not true for assertion; Mentat transactions that consist only of assertions can be applied in any order.
Schema additions and changes similarly happen at a point on the timeline; other assertions can't use the vocabulary until it has been asserted, and some states on the timeline are invalid when viewed through the lens of a later altered vocabulary — e.g., altering the cardinality of an attribute.

The presence of one of these operations in a branch of history is important when considering merges and rebases. We must make sure that retractions and assertions are sequenced, and ensure that schema changes occur before data changes that depend on them.

One operation — excision (#21) — affects an existing region of timeline. An excision marker is placed in the local timeline and implies the removal of data from the device's local log.

Given that we want all clients to have the same materialized state after reaching the same point in the transaction log, it's crucial that excision processing is reproducible.

For a single client system, this is relatively simple: there is only one timeline, and all writes are fast-forward.

For a multi-client system, where a timeline might diverge and merge back together:

we must define which of the three timeline segments — shared history, new-left, new-right — are impacted by an excision on left or right. This definition must produce the same outcome when applied by either client.

Let's look at an example.

Imagine that two clients, A and B, both know about a URL in a browser's history, http://example.com/. This has ID 123. At point T1 it's linked to three visits: 200, 201, 202. Visits are component entities.

Now A and B diverge. A adds a visit: 300. B excises all visits for 123.

This is a classic syncing problem: what happens after a merge?

Usually one of the following occurs:

If A reaches the server first, it adds the visit. B detects a conflict and undoes its local deletion. This surprises the user.
… or B deletes all four visits. Sometimes it'll do this even if A's visit was later than B's excision, perhaps even months later, depending on when A and B sync. This surprises the user.
If B reaches the server first, it drops the first three visits. Depending on format, A will then reupload all four visits (which surprises the user), or just the new one (300).

Various approaches are used to try to make some of these operations durable, such as recording tombstones. That's really tricky.

One of the reasons it's tricky is that it's not clear what the deletion means, because the excision operation (in Firefox Sync, at least) isn't pinned to a point on a timeline — it floats at an instant in time or an order of interaction with the server, and that is very woolly indeed.

The most obvious meaning for excision is that it applies along the current parent 'route' back to the origin. B's excision doesn't apply to A's new data. If B merges first, its excision is already recorded when A comes to merge or rebase. If A merges first, B knows how to rewrite its excision to apply only to earlier data.

(Indeed, B might well automatically record the excision only for the merged data — {:db/excise 123, :db.excise/beforeT <last parent>} — and just directly drop non-merged excised datoms as if they had never been written. This is reminiscent of Mercurial's phases.)

This alone isn't enough. If A transacted like this:

{:page/url "http://example.com/"
 :page/visit {:visit/at 1234567890123}}

the actual datoms recorded would be:

[:db/add 123 :page/visit 300 tx]
[:db/add 300 :visit/at 1234567890123 tx]
[:db/add tx :db/txInstant 1234567999999]

B excised everything about 123, and here we wouldn't have reasserted the URL, and so the URL would be excised, leaving 300 as a visit to an unknown page. A process of reintroduction (or storing redundant data) might be necessary, or perhaps that behavior is actually desirable; it depends on how the data is modeled and how specific the retraction is.

A synchronized set of Mentat instances collaborate to build a linear timeline of states. They do so by rebasing and/or merging their changes into the remote primary timeline. Each instance is able to perform a set of operations — assertion, retraction, excision, and schema addition/alteration — that introduce coordination points in this timeline. Before we discuss those, it's worth mentioning that there are other operations that only affect whether clients can proceed: 1. Base format changes (#587) — _e.g._, the introduction of a new type — can lock out clients; older clients won't be able to interpret the wire format. 2. Non-backwards-compatible schema changes should prevent a remote timeline from being merged locally until the client upgrades; doing so would prevent the local client from operating on its own data. Most of a client's operations are point-in-time: 1. Retraction (and change, which is retraction + assertion) represents an ordered state change as a datom. There is a conceptual division here between the state before and after the change is transacted; the retraction is only meaningful if it happens-after the assertion of the datom to be retracted. This is not true for assertion; Mentat transactions that consist only of assertions can be applied in any order. 2. Schema additions and changes similarly happen at a point on the timeline; other assertions can't use the vocabulary until it has been asserted, and some states on the timeline are invalid when viewed through the lens of a later altered vocabulary — _e.g._, altering the cardinality of an attribute. The presence of one of these operations in a branch of history is important when considering merges and rebases. We must make sure that retractions and assertions are sequenced, and ensure that schema changes occur before data changes that depend on them. One operation — excision (#21) — affects an _existing region_ of timeline. An excision marker is placed in the local timeline and implies the removal of data from the device's local log. Given that we want all clients to have the same materialized state after reaching the same point in the transaction log, it's crucial that excision processing is reproducible. For a single client system, this is relatively simple: there is only one timeline, and all writes are fast-forward. For a multi-client system, where a timeline might diverge and merge back together: ![timeline-diverge](https://user-images.githubusercontent.com/91722/37786455-a963f9e2-2df4-11e8-847c-386f30bd169f.png) we must define which of the three timeline segments — shared history, new-left, new-right — are impacted by an excision on left or right. This definition must produce the same outcome when applied by either client. Let's look at an example. Imagine that two clients, `A` and `B`, both know about a URL in a browser's history, `http://example.com/`. This has ID `123`. At point `T1` it's linked to three visits: `200`, `201`, `202`. Visits are component entities. Now `A` and `B` diverge. `A` adds a visit: `300`. `B` excises all visits for `123`. This is a classic syncing problem: what happens after a merge? Usually one of the following occurs: - If `A` reaches the server first, it adds the visit. `B` detects a conflict and undoes its local deletion. This surprises the user. - … or `B` deletes all four visits. Sometimes it'll do this _even if `A`'s visit was later than `B`'s excision_, perhaps even months later, depending on when `A` and `B` sync. This surprises the user. - If `B` reaches the server first, it drops the first three visits. Depending on format, `A` will then reupload all four visits (which surprises the user), or just the new one (`300`). Various approaches are used to try to make some of these operations durable, such as recording tombstones. That's [really tricky](https://bugzilla.mozilla.org/show_bug.cgi?id=578694). One of the reasons it's tricky is that it's not clear what the deletion **means**, because the excision operation (in Firefox Sync, at least) isn't pinned to a point on a timeline — it floats at an instant in time or an order of interaction with the server, and that is very woolly indeed. The most obvious meaning for excision is that it applies along the current parent 'route' back to the origin. `B`'s excision doesn't apply to `A`'s new data. If `B` merges first, its excision is already recorded when `A` comes to merge or rebase. If `A` merges first, `B` knows how to rewrite its excision to apply only to earlier data. (Indeed, `B` might well automatically record the excision only for the merged data — `{:db/excise 123, :db.excise/beforeT <last parent>}` — and just directly drop non-merged excised datoms as if they had never been written. This is reminiscent of Mercurial's phases.) This alone isn't enough. If `A` transacted like this: ``` {:page/url "http://example.com/" :page/visit {:visit/at 1234567890123}} ``` the actual datoms recorded would be: ``` [:db/add 123 :page/visit 300 tx] [:db/add 300 :visit/at 1234567890123 tx] [:db/add tx :db/txInstant 1234567999999] ``` `B` excised _everything_ about `123`, and here we wouldn't have reasserted the URL, and so the URL would be excised, leaving `300` as a visit to an unknown page. A process of _reintroduction_ (or storing redundant data) might be necessary, or perhaps that behavior is actually desirable; it depends on how the data is modeled and how specific the retraction is.