Original implementations of Firefox Sync conflated all of these stages:
This is flawed in lots of ways, which are well documented in bugs.
(Being designed to support the original clients, the Firefox Sync server itself doesn't preserve the prior state of a record, or indeed exchange changes at all — after a client has resolved changes to some entity, the previous server state is entirely replaced. This makes backup/restore, cross-record consistency, tracking down bugs, data recovery, and less simplistic conflict resolution and detection nearly impossible.)
Improved versions of Sync began to tease these client-side stages apart:
- Two-phase application — buffering downloaded records before applying them _en masse_ — partly and temporarily separated out replication, which offers more options for detection and resolution of conflicts and inconsistencies. (A minimal sketch of this buffering follows the list.)
- Structural/transactional record application on iOS separated out equivalence and resolution (albeit only within the scope of a sync, with no permanent record of those states.)
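A minimal sketch of that two-phase shape, in Rust and with entirely hypothetical types — `Record`, `download_batch`, and `apply_all` are illustrative stand-ins, not Sync's actual API: nothing is applied until the whole download has been buffered.

```rust
// Hedged illustration of two-phase application; the types and functions here
// are hypothetical stand-ins, not Sync's real interfaces.
struct Record {
    id: String,
    payload: String,
}

fn sync_collection(
    download_batch: impl Fn() -> Option<Vec<Record>>,
    apply_all: impl Fn(&[Record]),
) {
    // Phase one: replication. Nothing is applied yet, so a failure mid-download
    // leaves local data untouched.
    let mut buffer: Vec<Record> = Vec::new();
    while let Some(batch) = download_batch() {
        buffer.extend(batch);
    }

    // Phase two: application. The entire incoming set is visible at once,
    // which is what makes better conflict detection and resolution possible.
    apply_all(&buffer);
}
```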
Mentat offers the opportunity to truly separate these:
- Replicate down remote datoms, storing them whole.¹
- Derive equivalences and new facts by comparing local and remote data. Some of this can be explicit in the datom stream (_e.g._, by describing new entities via lookup refs), and some can be schema-derived (cardinality constraints). A sketch of this step, and of conflict detection, follows the list.
- Store those equivalences. These are part of the data model: they'll be used when merging datom streams, and are necessary for examining history. If two systems can both operate offline, then either a dictionary or a history-rewriting mechanism is necessary to merge data. Rewriting history is expensive, so…
- Detect and resolve conflicts. This is _relatively_ easy compared to Sync: the full history of changes on both sides (modulo excision and history rewriting) is available, so the only conflicts will be _real_ conflicts. Ideally all of these conflicts will be schema-encoded: two cardinality-one assertions for the same entity, for example. Some can be domain-level and detected by code: prior art around automatic conflict detection doesn't convince me that domain-level conflict resolution is redundant.
- Store appropriate assertions and retractions to record resolved conflicts. We now have a concrete, permanent record of exactly what happened during a sync!
- Commit the transaction and make it available for replication. Now other devices can also see exactly how we resolved conflicts.
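To make the equivalence and conflict-detection steps concrete, here is a hedged Rust sketch. The `Datom` and `Conflict` types, the attribute lists, and the function names are hypothetical simplifications, not Mentat's actual representation; the point is only the shape of the two checks: shared unique-identity values establish equivalence, and two differing cardinality-one values for the same entity are a real conflict.

```rust
use std::collections::HashMap;

// Hypothetical datom and conflict types; Mentat's real representation differs.
struct Datom {
    e: i64,    // entity id
    a: String, // attribute keyword, e.g. ":person/email"
    v: String, // value, simplified to a string here
}

struct Conflict {
    e: i64,
    a: String,
    local_v: String,
    remote_v: String,
}

/// Equivalence derivation: if a local and a remote entity assert the same value
/// for a unique-identity attribute, they denote the same thing — precisely the
/// information a lookup ref carries. Returns remote entity id -> local entity id.
fn derive_equivalences(
    local: &[Datom],
    remote: &[Datom],
    unique_attrs: &[&str],
) -> HashMap<i64, i64> {
    let mut by_av: HashMap<(String, String), i64> = HashMap::new();
    for d in local {
        if unique_attrs.contains(&d.a.as_str()) {
            by_av.insert((d.a.clone(), d.v.clone()), d.e);
        }
    }
    let mut equiv = HashMap::new();
    for d in remote {
        if let Some(&local_e) = by_av.get(&(d.a.clone(), d.v.clone())) {
            equiv.insert(d.e, local_e);
        }
    }
    equiv
}

/// Schema-encoded conflict detection: after remote entity ids have been mapped
/// through the equivalence table, two differing values for a cardinality-one
/// attribute on the same entity are a real conflict.
fn detect_conflicts(
    local: &[Datom],
    remote: &[Datom],
    equiv: &HashMap<i64, i64>,
    cardinality_one_attrs: &[&str],
) -> Vec<Conflict> {
    let mut conflicts = Vec::new();
    for r in remote {
        if !cardinality_one_attrs.contains(&r.a.as_str()) {
            continue;
        }
        // Map the remote entity onto its local equivalent, if any.
        let e = *equiv.get(&r.e).unwrap_or(&r.e);
        for l in local {
            if l.e == e && l.a == r.a && l.v != r.v {
                conflicts.push(Conflict {
                    e,
                    a: r.a.clone(),
                    local_v: l.v.clone(),
                    remote_v: r.v.clone(),
                });
            }
        }
    }
    conflicts
}
```

Whatever the resolver decides is then written back as ordinary assertions and retractions inside the merge transaction, which is what leaves the permanent, replicable record described in the last two steps.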
¹ The presence of cardinality and uniqueness constraints implies that this isn't direct storage; in an RDF/OWL world, this _would_ be direct storage! However, if we squint at Mentat's concept of a transaction — which, after all, includes states that would be invalid if the transaction were split in pieces — then we might be able to achieve this. Think about SQLite's `PRAGMA defer_foreign_keys`.
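For reference, this is the SQLite behaviour the footnote alludes to, shown through rusqlite (an illustration only, not Mentat code): with `defer_foreign_keys` on, a constraint may be violated mid-transaction so long as it holds again by `COMMIT`.

```rust
use rusqlite::{Connection, Result};

// Illustration of deferred constraint checking in SQLite: a child row may
// reference a parent that doesn't exist yet, as long as the parent exists
// by the time the transaction commits.
fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch(
        "PRAGMA foreign_keys = ON;
         CREATE TABLE parent (id INTEGER PRIMARY KEY);
         CREATE TABLE child  (id INTEGER PRIMARY KEY,
                              parent_id INTEGER REFERENCES parent(id));",
    )?;

    conn.execute_batch(
        "BEGIN;
         PRAGMA defer_foreign_keys = ON;    -- check constraints at COMMIT, not per statement
         INSERT INTO child  VALUES (1, 42); -- parent 42 doesn't exist yet
         INSERT INTO parent VALUES (42);    -- now it does
         COMMIT;",
    )?;
    Ok(())
}
```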