Thoughts: what is a sync, really?
Richard Newman edited this page 2017-05-05 14:54:03 +00:00

Syncing consists of the following operations, typically in this order:

  • Replication in of remote state.
  • Deduction of equivalences.
  • Resolution of conflicts, and rewriting or eliding of local changes with respect to equivalences and new remote state.
  • Optional flattening of resolved local changes à la git rebase.
  • Replication out of local state.
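The five stages can be kept apart as separate functions whose outputs are ordinary values, so each can be inspected or stored rather than existing only in memory partway through a sync. What follows is a minimal Rust sketch under stated assumptions: the `Datom` struct, every function name, and deducing equivalences from a single unique attribute (`page/url` here) are all hypothetical, not Mentat's API, and stage 3 only elides local changes the remote already made rather than resolving real conflicts.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified datom. Mentat's real datoms also carry a
// transaction id; string slices keep the sketch short.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Datom {
    e: i64,
    a: &'static str,
    v: &'static str,
    added: bool,
}

// 1. Replication in: keep the remote log whole.
fn replicate_in(remote_log: &[Datom]) -> Vec<Datom> {
    remote_log.to_vec()
}

// 2. Deduce equivalences (remote entity -> local entity) by matching values
//    of an attribute assumed to be unique, such as a URL.
fn deduce_equivalences(local: &[Datom], remote: &[Datom], unique: &str) -> BTreeMap<i64, i64> {
    let local_by_value: BTreeMap<&str, i64> = local
        .iter()
        .filter(|d| d.a == unique && d.added)
        .map(|d| (d.v, d.e))
        .collect();
    remote
        .iter()
        .filter(|d| d.a == unique && d.added)
        .filter_map(|d| local_by_value.get(d.v).map(|&l| (d.e, l)))
        .collect()
}

// 3. Rewrite remote datoms into local ids, then elide local changes the
//    remote already made. Assumes unmapped remote ids never collide with
//    local ids; real conflict handling is omitted.
fn resolve(local_changes: &[Datom], remote: &[Datom], eq: &BTreeMap<i64, i64>) -> Vec<Datom> {
    let remote_in_local_ids: BTreeSet<Datom> = remote
        .iter()
        .map(|d| Datom { e: *eq.get(&d.e).unwrap_or(&d.e), ..*d })
        .collect();
    local_changes
        .iter()
        .filter(|d| !remote_in_local_ids.contains(*d))
        .copied()
        .collect()
}

// 4. Optional flattening: reduce a sequence of changes to its net effect,
//    assuming each assertion adds an absent fact and each retraction
//    removes a present one.
fn flatten(changes: &[Datom]) -> Vec<Datom> {
    // (e, a, v) -> (was the fact present before these changes?, last change)
    let mut net: BTreeMap<(i64, &str, &str), (bool, Datom)> = BTreeMap::new();
    for d in changes {
        net.entry((d.e, d.a, d.v))
            .and_modify(|(_, last)| *last = *d)
            .or_insert((!d.added, *d));
    }
    net.into_values()
        // Present before == present after: the changes cancel out.
        .filter(|(present_before, last)| *present_before != last.added)
        .map(|(_, last)| last)
        .collect()
}

// 5. Replication out: whatever survives is what would be uploaded.
fn replicate_out(flattened: Vec<Datom>) -> Vec<Datom> {
    flattened
}

fn sync(local: &[Datom], local_changes: &[Datom], remote_log: &[Datom]) -> Vec<Datom> {
    let remote = replicate_in(remote_log);
    let eq = deduce_equivalences(local, &remote, "page/url");
    let resolved = resolve(local_changes, &remote, &eq);
    replicate_out(flatten(&resolved))
}
```

Because each stage returns a value, a fuller implementation could persist the remote log, the equivalences, and the resolved changes instead of discarding them when the sync ends.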

Original implementations of Firefox Sync conflated all of these stages:

  • Records are downloaded one-by-one.
  • Equivalences are calculated on the fly, in memory.
  • Conflicts are immediately resolved by overwriting local state in place or discarding incoming records.
  • Replication out is blind copying of post-resolution local state.

This is flawed in lots of ways, which are well documented in bugs.

(Being designed to support the original clients, the Firefox Sync server itself doesn't preserve the prior state of a record, or indeed exchange changes at all — after a client has resolved changes to some entity, the previous server state is entirely replaced. This makes backup/restore, cross-record consistency, tracking down bugs, data recovery, and less simplistic conflict resolution and detection nearly impossible.)

Improved versions of Sync began to tease these client-side stages apart:

  • Two-phase application (buffering downloaded records before applying them en masse) partly and temporarily separated out replication, which offers more options for detection and resolution of conflicts and inconsistencies.
  • Structural/transactional record application on iOS separated out equivalence and resolution (albeit only within the scope of a sync, with no permanent record of those states).
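The two-phase idea can be sketched as a buffer that accepts downloaded records without touching local state, then checks the complete batch and applies it in one step. This is a hypothetical Rust illustration: the `Record` and `Buffer` types are invented, payloads are omitted, and the parent-reference check stands in for whatever cross-record consistency checks a real client would run.

```rust
use std::collections::BTreeMap;

// Hypothetical record: an id and an optional parent id (payload omitted).
#[derive(Clone, Debug, PartialEq)]
struct Record {
    id: String,
    parent: Option<String>,
}

// Phase one: downloaded records accumulate here; local state is untouched.
#[derive(Default)]
struct Buffer {
    records: BTreeMap<String, Record>,
}

impl Buffer {
    fn stage(&mut self, r: Record) {
        // A later copy of the same record replaces an earlier one.
        self.records.insert(r.id.clone(), r);
    }

    // Phase two: check the whole batch, then apply all of it, or nothing
    // if any record refers to a parent found neither in the batch nor locally.
    fn apply(self, local: &mut BTreeMap<String, Record>) -> Result<usize, String> {
        for r in self.records.values() {
            if let Some(p) = &r.parent {
                if !self.records.contains_key(p) && !local.contains_key(p) {
                    return Err(format!("{} refers to missing parent {}", r.id, p));
                }
            }
        }
        let n = self.records.len();
        local.extend(self.records);
        Ok(n)
    }
}
```

Checking the whole batch first lets a record whose parent arrives later in the same download pass, which record-at-a-time application cannot easily do.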

There's another area in which Firefox Sync conflates some different things.

Firefox Sync doesn't keep any kind of historical record. To a system that does, there is a difference between:

  • Establishing a new copy of an existing history (git clone, git checkout).
  • Reconciling incoming changes to a linear shared history against local interim changes (git rebase).
  • Combining two separate linear histories into a shared state with an ongoing linear shared history (git merge).

In Sync's case, the first is simply a local wipe followed by a sync. The second is a two-way merge, and so is the third: the only outcome of either is to update local and remote state to match the outcome of the merge.

Mentat offers the opportunity to separate these.
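Telling the three cases apart only requires comparing histories. A hypothetical Rust sketch, assuming each history is a list of globally unique transaction ids (oldest first) and that the remote history is non-empty: an empty local history means establishing a fresh copy, a shared prefix means local-only transactions can be rebased onto the remote head, and no shared prefix means two separate histories that must be merged.

```rust
// Hypothetical classification of an incoming sync. Assumes transaction ids
// are globally unique and the remote history is non-empty.
#[derive(Debug, PartialEq)]
enum SyncKind<'a> {
    // No local history: establish a copy (git clone / checkout).
    Clone,
    // Shared prefix: replay local-only transactions on the remote head
    // (git rebase). An empty `replay` is a fast-forward.
    Rebase { onto: &'a str, replay: Vec<&'a str> },
    // No shared prefix: combine two separate histories (git merge).
    Merge,
}

fn classify<'a>(local: &[&'a str], remote: &[&'a str]) -> SyncKind<'a> {
    if local.is_empty() {
        return SyncKind::Clone;
    }
    // Length of the common prefix of the two histories.
    let shared = local.iter().zip(remote).take_while(|(l, r)| l == r).count();
    if shared == 0 {
        SyncKind::Merge
    } else {
        SyncKind::Rebase {
            onto: remote[remote.len() - 1],
            replay: local[shared..].to_vec(),
        }
    }
}
```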

One approach

  • Replicate down remote datoms, storing them whole.¹
  • Derive equivalences and new facts by comparing local and remote data. Some of this can be explicit in the datom stream (e.g., by describing new entities via lookup refs), and some can be schema-derived (cardinality constraints).
  • Store those equivalences. These are part of the data model: they'll be used when merging datom streams, and are necessary for examining history. If two systems can both operate offline, then either a dictionary of equivalences or a history-rewriting mechanism is necessary to merge data. Rewriting history is expensive, so…
  • Detect and resolve conflicts. This is relatively easy compared to Sync: the full history of changes on both sides (modulo excision and history rewriting) is available, so the only conflicts will be real conflicts. Ideally all of these conflicts will be schema-encoded: two cardinality-one assertions for the same entity, for example. Some can be domain-level and detected by code: prior art around automatic conflict detection doesn't convince me that domain-level conflict resolution is redundant.
  • Store appropriate assertions and retractions to record resolved conflicts. We now have a concrete, permanent record of exactly what happened during a sync!
  • Commit the transaction and make it available for replication. Now other devices can also see exactly how we resolved conflicts.
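Schema-encoded conflict detection, and recording the resolution as data, might look like the following. It is a minimal Rust sketch assuming a hypothetical schema map from attribute to cardinality, datoms already rewritten through the stored equivalences (so both sides share entity ids), and a deliberately naive policy in which the remote value wins; none of these names are Mentat's API.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cardinality {
    One,
    Many,
}

// Hypothetical datom, already mapped into a shared entity id space.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Datom {
    e: i64,
    a: &'static str,
    v: &'static str,
    added: bool,
}

// A real conflict: both sides asserted different values for the same
// cardinality-one attribute of the same entity.
#[derive(Debug, PartialEq)]
struct Conflict {
    e: i64,
    a: &'static str,
    local: &'static str,
    remote: &'static str,
}

fn detect(schema: &BTreeMap<&str, Cardinality>, local: &[Datom], remote: &[Datom]) -> Vec<Conflict> {
    let mut conflicts = Vec::new();
    for l in local.iter().filter(|d| d.added) {
        if schema.get(l.a) != Some(&Cardinality::One) {
            continue; // cardinality-many attributes can hold both values
        }
        for r in remote.iter().filter(|d| d.added) {
            if r.e == l.e && r.a == l.a && r.v != l.v {
                conflicts.push(Conflict { e: l.e, a: l.a, local: l.v, remote: r.v });
            }
        }
    }
    conflicts
}

// Naive policy: keep the remote value by retracting the local one. The
// retraction is an ordinary datom, committed with the sync transaction.
fn resolve(conflicts: &[Conflict]) -> Vec<Datom> {
    conflicts
        .iter()
        .map(|c| Datom { e: c.e, a: c.a, v: c.local, added: false })
        .collect()
}
```

Because the resolution is just another retraction, replicating the committed transaction shows other devices exactly which value lost.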

Another option

  • In the case of a merge, store appropriate assertions and retractions to record resolved conflicts. We now have a concrete, permanent record of exactly what happened during a sync!
  • In the case of a rebase, update local transactions to be non-conflicting, and replay them on top of the new shared state.
  • Identity remapping is still necessary.
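For the rebase case, replaying local transactions means remapping their entity ids into the shared id space before appending them. A hypothetical Rust sketch, assuming conflicts have already been dealt with and that `equivalent` (a name invented here) maps local entities to ones that already exist in the shared history; every other local entity receives a fresh id above the largest one in use, and datoms the shared history already contains are dropped.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical datom (assertions only, to keep the sketch small).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Datom {
    e: i64,
    a: &'static str,
    v: &'static str,
}

type Tx = Vec<Datom>;

// Replay local-only transactions on top of the new shared history.
fn rebase(shared: &[Tx], local_only: &[Tx], equivalent: &BTreeMap<i64, i64>) -> Vec<Tx> {
    let present: BTreeSet<Datom> = shared.iter().flatten().copied().collect();
    // Fresh ids start above every entity the shared history uses.
    let mut next_id = present.iter().map(|d| d.e).max().unwrap_or(0) + 1;
    let mut remap = equivalent.clone();
    let mut out: Vec<Tx> = shared.to_vec();
    for tx in local_only {
        let mut rewritten = Tx::new();
        for d in tx {
            // Every local id goes through the map; unknown ones get fresh ids.
            let e = *remap.entry(d.e).or_insert_with(|| {
                let id = next_id;
                next_id += 1;
                id
            });
            let d = Datom { e, ..*d };
            // Drop datoms the shared history (or this transaction) already has.
            if !present.contains(&d) && !rewritten.contains(&d) {
                rewritten.push(d);
            }
        }
        // A transaction that became empty is not replayed.
        if !rewritten.is_empty() {
            out.push(rewritten);
        }
    }
    out
}
```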

¹ The presence of cardinality and uniqueness constraints implies that this isn't direct storage; in an RDF/OWL world, this would be direct storage! However, if we squint at Mentat's concept of a transaction (which, after all, includes states that would be invalid if the transaction were split into pieces), then we might be able to achieve this. Think about SQLite's PRAGMA defer_foreign_keys.
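The footnote's idea can be sketched as a transaction that applies every datom first and checks constraints only at commit, in the spirit of SQLite's deferred foreign keys. This hypothetical Rust sketch checks only cardinality-one, with `one` (a name invented here) listing the attributes assumed to be cardinality-one; a transaction that asserts a new value and then retracts the old one passes, although it is invalid midway through.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical datom; `added` is false for a retraction.
#[derive(Clone, Copy, Debug)]
struct Datom {
    e: i64,
    a: &'static str,
    v: &'static str,
    added: bool,
}

// Apply a whole transaction, then check cardinality-one on the end state
// only. On failure, `facts` is left untouched.
fn transact(
    facts: &mut BTreeSet<(i64, &'static str, &'static str)>,
    tx: &[Datom],
    one: &[&str],
) -> Result<(), String> {
    let mut staged = facts.clone();
    for d in tx {
        if d.added {
            staged.insert((d.e, d.a, d.v));
        } else {
            staged.remove(&(d.e, d.a, d.v));
        }
    }
    // Count values per (entity, attribute) in the end state.
    let mut counts: BTreeMap<(i64, &str), usize> = BTreeMap::new();
    for &(e, a, _) in &staged {
        *counts.entry((e, a)).or_insert(0) += 1;
    }
    if let Some(((e, a), _)) = counts.iter().find(|((_, a), n)| one.contains(a) && **n > 1) {
        return Err(format!("entity {} has several values for {}", e, a));
    }
    *facts = staged;
    Ok(())
}
```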