Thoughts: synchronization

This document is very much a WIP/brain dump/unformed. Read with caution.

Replication from a single master alone is relatively straightforward:

https://github.com/RallySoftware/datomic-replication

On top of this we would need snapshotting and excision support.
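
To make the mechanics concrete, here's a minimal sketch of pulling from a single master, assuming the log is exposed as a sequence of transactions keyed by tx ID. `fetch-txs-since` and `apply-tx!` are hypothetical stand-ins for transport and storage, not Datomish APIs.

```clojure
;; `fetch-txs-since` returns the master's transactions after the given tx ID,
;; in order; `apply-tx!` writes one transaction into the local store.
(defn replicate-from-master!
  "Apply every transaction after `last-seen-tx` in order; return the new
  high-water mark."
  [fetch-txs-since apply-tx! last-seen-tx]
  (reduce (fn [_ {:keys [tx] :as entry}]
            (apply-tx! entry)
            tx)
          last-seen-tx
          (fetch-txs-since last-seen-tx)))
```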

Replication between two masters — two clients who can both write while partitioned — is less straightforward. Each will be generating its own transaction log, and the two must collaborate to resolve their (potentially conflicting) streams into one current state. This is a distributed multi-master system; answers will be found in the literature on distributed version control systems and on multi-master replication for traditional databases.

In general, the solution space is broadened to the extent that we are willing to rewrite history, limit replication of obsolete history, and restrict schema changes.

We consider, for simplicity, a shared linearized transaction log, and two clients, local and remote. A peer-to-peer alternative is, at first glance, equivalent; the shared log is implicit in that case.

An alternative formulation is for each client to have their own transaction log, replicated to other devices. Each client is then responsible for taking changes on each stream and producing a current state.
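
For the sketches that follow, assume one possible in-memory shape (an assumption of this document, not Datomish's actual representation): a datom is an `[e a v tx added?]` tuple, and a client's log is an ordered vector of transactions.

```clojure
;; Example shape only; real storage would differ.
(def example-log
  [{:tx 1000 :txInstant #inst "2017-01-08T10:00:00.000-00:00"
    :datoms [[10 :page/title "Home" 1000 true]]}
   {:tx 1001 :txInstant #inst "2017-01-08T10:05:00.000-00:00"
    :datoms [[10 :page/title "Home"  1001 false]    ; retraction
             [10 :page/title "Start" 1001 true]]}]) ; replacement assertion
```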

If all data is event-shaped — that is, all assertions are for cardinality-many attributes, or aren't shared between clients — then syncing is trivial: simply take the union of all data. The more separable the data we store, the simpler our lives get: concurrent writes become non-concurrent writes.
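
A sketch under that event-shaped assumption: with no shared cardinality-one attributes in play, merging two logs is just set union over the asserted `[e a v]` triples.

```clojure
;; Merge two event-shaped collections of datoms by set union over
;; asserted triples, ignoring tx metadata.
(defn merge-event-shaped
  [local-datoms remote-datoms]
  (->> (concat local-datoms remote-datoms)
       (filter (fn [[_ _ _ _ added?]] added?))
       (map (fn [[e a v _ _]] [e a v]))
       set))
```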

Unfortunately, schema assertions themselves are not event-shaped. And we're likely to have cardinality-one shared data (e.g., the title of a page). So conflict resolution/log reordering is necessary.
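
For instance, here's one way to detect the cardinality-one case: both sides concurrently asserting different values for the same `[e a]` pair. `cardinality-one?` is a hypothetical schema predicate.

```clojure
(defn cardinality-one-conflicts
  "Return the [e a] pairs asserted with different values on each side.
  `cardinality-one?` is any predicate over attributes, e.g. a set."
  [cardinality-one? local-assertions remote-assertions]
  (let [index (fn [datoms]
                (into {} (for [[e a v] datoms
                               :when (cardinality-one? a)]
                           [[e a] v])))
        local  (index local-assertions)
        remote (index remote-assertions)]
    (for [[ea lv] local
          :let [rv (get remote ea ::absent)]
          :when (and (not= rv ::absent) (not= rv lv))]
      {:entity-attribute ea :local lv :remote rv})))

;; e.g. (cardinality-one-conflicts #{:page/title}
;;                                 [[10 :page/title "Home"]]
;;                                 [[10 :page/title "Start"]])
;; => ({:entity-attribute [10 :page/title], :local "Home", :remote "Start"})
```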

A partial list of complications:

  • Retractions. What happens if the local client has already transacted identical datoms — do those get retracted, too? What happens if the incoming retraction would lead to an inconsistency (e.g., a local transaction that refers to a retracted component entity)? Even if we disallow 'template' retractions across client boundaries, it's still possible for a conflict to occur — locally asserting an attribute for a remotely retracted entity, for example.
  • Excisions. Do we excise matching local datoms that were created after the remote excision?
  • tx IDs are allocated by each client, and will overlap. These must be remapped (see the remapping sketch after this list).
  • Conflicting data changes: similar to a local upsert conflict, it's possible for remote and local transactions to be structurally inconsistent when taken together.
  • Conflicting schema changes: additions or alterations of attributes on two clients since their last synchronization point.
  • Unlike commits in a DVCS, Datomish transactions have only a timestamp and a local ID. There's no parent commit, so ordering is implicit, and there's no global content-addressable identifier.
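
As a sketch of the remapping mentioned above: rewrite every locally allocated tx ID into a fresh range that can't collide with the remote's allocations. `next-tx!` is a hypothetical allocator; the returned mapping lets a caller also rewrite any datom values that reference old tx entities.

```clojure
(defn remap-txs
  "Rewrite each transaction's tx ID via `next-tx!`; return
  [remapped-log {old-tx new-tx}] so references can be fixed up."
  [next-tx! log]
  (reduce (fn [[out mapping] {:keys [tx datoms] :as entry}]
            (let [tx' (next-tx!)]
              [(conj out (assoc entry
                                :tx tx'
                                :datoms (mapv (fn [[e a v _ added?]]
                                                [e a v tx' added?])
                                              datoms)))
               (assoc mapping tx tx')]))
          [[] {}]
          log))

;; e.g. (let [counter (atom 2000)]
;;        (remap-txs #(swap! counter inc) example-log))
```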

Two clients generate data according to their local schemas, then begin synchronizing. Whichever client gets there first establishes the initial data set. The other client resolves the remote transaction sequence. The remote transactions might have txInstants earlier or later than those of the local schema (and those instants come from different clocks!), and might refer to an outdated or newer schema. That schema might not be able to represent local changes at all — it might even have retracted an attribute that local data still uses.

One cannot necessarily simply replay the remote transaction log from 0 to now without hitting inconsistencies, but flattening the log loses valuable ordering context.

So, we want to:

  • Preserve as many transactions as possible (in the general case), even if we don't replicate them to a remote. Preserving transactions is the only way to avoid divergence.
  • Preserve as much original transaction metadata — e.g., ordering — as possible.
  • Avoid conflicts.
  • Avoid reverting remote schema upgrades.
  • Avoid reverting local schema upgrades (perhaps by refusing to sync).

This problem might be solved with one of these tools and techniques:

  • We can describe and detect schemas separately from the transaction stream. Two clients can explicitly recognize when one is outdated, and we can detect outdated-yet-congruent states (like vN+1 adding a new attribute but otherwise not changing vN) to reduce the scope of conflicts. This doesn't address conflicting assertions at all.
  • We could anchor/cross-link transaction log sections to schema versions, regardless of timestamps, resulting in compatible eras. Naturally this doesn't solve the problem for cross-era multi-fragment transactions, but it might solve the problem for common cases, and it allows for some amount of congruent history preservation.
  • A flattening transform might exist for transaction logs, yielding two compatible current states: essentially re-synthesizing a new transaction log from the current space of datoms, which will naturally refer only to current (shared) vocabulary (see the sketch after this list). This is not a history-preserving operation. CouchDB's multi-master replication does this: only the most recent version of a document is uploaded.
  • We might consider adding some kind of parent pointer to manage interleaving.
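
The flattening transform, sketched: fold the whole log down to the current datom space, then re-synthesize it as a single transaction. This assumes retractions appear explicitly in the log; all history and ordering context is discarded.

```clojure
(defn flatten-log
  "Fold `log` to its current datom space and emit it as one synthetic
  transaction stamped with `new-tx`."
  [new-tx log]
  (let [current (reduce (fn [acc [e a v _ added?]]
                          (if added?
                            (conj acc [e a v])
                            (disj acc [e a v])))
                        #{}
                        (mapcat :datoms log))]
    [{:tx new-tx
      :datoms (mapv (fn [[e a v]] [e a v new-tx true]) current)}]))
```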

It's worth acknowledging that a first sync is potentially a non-history-preserving operation for the new local client. It's rebasing its own transaction log on top of (and within) the remote transaction log. That might give us a key: if we're willing to flatten-and-rebase, then we reduce (or eliminate, if we're willing to automate) the potential for conflict. So the algorithm might be:

  • Attempt to consider the local and remote transaction logs together. If they are non-overlapping, then we're done: push up the local changes.
  • Otherwise, elide datoms from the local transaction log that refer to non-current attributes — flatten and rebase. They no longer contribute to the current state, but they could introduce conflicts if replayed.
  • Take the remote transaction log.
  • Append each new synthesized local transaction to yield the merged state.
  • Push the new transaction log additions to the remote, then advance the local synchronization point.
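
A sketch of that control flow, wiring together hypothetical helpers: `overlapping?` (conflict detection), `elide-non-current` (dropping datoms that refer to non-current attributes), `push!` (transport), and `advance-sync-point!` (bookkeeping). None of these names are Datomish's; only the shape of the algorithm matters here.

```clojure
(defn sync!
  "Merge `local-log` onto `remote-log` per the steps above; return the
  merged log."
  [{:keys [overlapping? elide-non-current push! advance-sync-point!]}
   remote-log local-log]
  (let [to-push (if (overlapping? remote-log local-log)
                  (mapv elide-non-current local-log)  ; flatten and rebase
                  local-log)                          ; disjoint: push as-is
        merged  (concat remote-log to-push)]          ; remote first, then ours
    (push! to-push)
    (advance-sync-point!)
    merged))
```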

Incremental syncs would then rebase-and-apply the new local transaction stream on top of new transactions pulled since the last synchronization point.

This is very similar to how a DVCS does a pull-and-rebase.

If an incompatible schema is detected — in either direction — we can pause syncing until the older client updates, or we can push out schema changes and lock older clients out of their own databases.