Proposal: application schema coordination and versioning
Richard Newman edited this page 2016-11-10 17:31:49 -08:00

This document briefly describes categories of applications and how they might coordinate vocabulary. It closes with a simple proposal for how to support such coordination.

An implementation of these ideas is at https://github.com/mozilla/datomish/pull/107.

Definitions

A datom store is a Datomish database. It consists of datoms, some of which describe vocabulary itself, and some of which use the vocabulary.

A collection of attributes — our vocabulary — is called a schema fragment. The collection of schema fragments in the datom store (including the built-in bootstrap vocabulary) constitutes its schema.

An application is a piece of software that reads from or writes to a datom store. One example would be a Firefox add-on.

We expect that simple applications might use a single schema fragment, and more complicated applications might share some schema fragments. For example, a bookmark manager might use 'save' and 'page' fragments; a history tool might use a 'visit' fragment and the same 'page' fragment.

One application

In this case, no other application expects to be able to read from or write to the datom store.

You have several possible approaches to schema handling.

One is "head in the sand". If your vocabulary doesn't change, or only grows new attributes, then you can safely re-transact your schema fragments each time you open the datom store. This is likely to be fine during development, and perhaps even for longer periods if you're careful about your data modeling.

If your vocabulary changes, however, you will need a way to evolve it using the primitives available to you: altering attributes, renaming idents, and retracting and transacting schema fragments and data datoms.

The first step is to decide when to evolve the schema.

One practical approach to doing so is to track some kind of version identifier outside of the datom store. This is straightforward, but prone to error when the datom store and external version identifier don't change together (e.g., when a database file is restored from backup).

Another approach is to track the version identifier inside the datom store itself. This is more robust, but requires vocabulary for versioning. It is the approach taken by applications that use Conformity: schema fragments and migrations are named and shipped with the application, and the set of transacted fragments and migrations ("norms") is maintained in the store itself.
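
For comparison, a minimal sketch of that pattern using Conformity against Datomic (the norm name and attribute are invented for illustration):

;; Norms are named, and the set of transacted norms is recorded in the
;; database itself, so re-running this at startup is a no-op.
(require '[io.rkn.conformity :as c])

(def norms
  {:my-app/page-schema
   {:txes [[{:db/id                 #db/id [:db.part/db]
             :db/ident              :page/url
             :db/valueType          :db.type/string
             :db/cardinality        :db.cardinality/one
             :db.install/_attribute :db.part/db}]]}})

(c/ensure-conforms conn norms)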

An application might also directly inspect the schema to come up with a migration plan, but this is likely to be error prone and inefficient.

Multiple applications with disjoint schema

Each of these applications can pretend that the others don't exist, except for patterns or expressions that either query schema datoms themselves or match against wildcard patterns. For example, the following query will behave differently when another application begins writing fulltext-indexed datoms to the datom store:

[:find ?x :in $ :where [(fulltext $ :any "some text") [[?x]]]]

This should be viewed as a strength!

Multiple applications with read-only interrelationships

This is the case when multiple applications have well-managed vocabularies. Only one piece of software claims to 'own' an attribute, and other applications can rely on it having managed the schema correctly.

This is equivalent to having multiple applications with disjoint vocabularies, with the notable exception that the 'owner' might alter the schema in such a way that the reader's assumptions are rendered incorrect.

In this situation, the above schema version approach can be used: the 'owner' performs upgrades, and the 'reader' enters an error state if it sees a schema fragment version that it doesn't understand. The difficulty lies in detecting the altered schema.

Multiple applications with co-owned vocabularies

Consider two add-ons that both wish to use vocabulary like :page/url. There is no third-party add-on that can ensure that the datom store contains current vocabulary.

In this case, both add-ons take responsibility for transacting schema fragments. They also need to coordinate their upgrades and downgrades: neither can unilaterally decide to downgrade, because a loop can result, with each add-on repeatedly re-applying the version it prefers. This strongly implies an in-store way of tracking and examining versions.

Statement of goals

  • Schema additions — new attributes and new schema fragments — should be trivial.
  • Simple schema changes — e.g., weakening constraints — should be easy.
  • Complex schema changes — e.g., tightening constraints or introducing reification — should be possible, and involve less risk than equivalent SQL changes.
  • Conflicts should be detectable.
  • Applications should be able to share copies/snapshots of schema fragments and migration code.

A modest proposal

We expose a simple default vocabulary for schema fragment management: :schema/name and :schema/version. Names should ideally use reverse domain notation. Versions should increment whenever a schema alteration is required. Notably, no version change is necessary when new attributes are added.

Schema names and versions are added or updated as schema fragments are transacted. Transacting the same schema fragment twice is a no-op.

Applications should ensure that no attribute is present in more than one schema fragment; transacting a fragment that violates this constraint raises an error.

These pieces of metadata are themselves stored in the datom store. Applications can listen for changes. In response to changes, they can pick up new vocabulary added by other applications.
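
As a sketch of how that might look, assuming a DataScript-style listen! and transaction reports (none of these names are settled Datomish API):

;; Hypothetical: re-read vocabulary whenever schema metadata changes.
(d/listen! conn :schema-watcher
  (fn [{:keys [tx-data]}]
    (when (some #(= (:a %) :schema/version) tx-data)
      ;; Another application transacted or upgraded a fragment.
      (refresh-vocabulary! conn))))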

The API exposes two operations (a usage sketch follows the list):

  • Check whether this schema fragment name is at this version. This allows read-only applications to adjust their behavior accordingly.
  • Ensure that this fragment is at this version; if not:
    • Create it if needed.
    • Optionally, attempt to automatically transact the difference between the two schema fragments. For example, an attribute with :db/cardinality :db.cardinality/one can always be safely altered to :db.cardinality/many.
    • Optionally, run an upgrade step from the existing version to the desired version. Typically this will prepare the store for an automatic change.
    • Finally, raise an error on failure.
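
In application code, these two operations might be used as follows; schema-version, ensure-schema!, and fix-duplicate-visits! are illustrative names, not the implemented interface:

;; Read-only consumer: check the fragment's version and adapt or bail.
(let [v (schema-version (d/db conn) "org.mozilla.core.page")]
  (when-not (= v 3)
    (throw (ex-info "Unsupported page schema" {:version v}))))

;; Owner: ensure the fragment, allowing automatic safe alterations and
;; supplying an upgrade step for the rest.
(ensure-schema! conn page-schema-v3
                {:upgrade (fn [conn from to] (fix-duplicate-visits! conn))})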

This is similar to the 'user version' functionality in SQLite, with important differences:

  • The datom store's schema consists of {name, version} pairs, not a single version.
  • Schema fragments have a globally unique identifier, allowing them to be shared across applications.
  • Applications are made aware at runtime when schema fragments change.
  • Many schema changes — adding attributes, altering indexing choices, or weakening constraints — can be performed automatically with no need to supply migration code.

Under this proposal, different applications can each ship shared schema fragments, coordinate upgrades, avoid conflicts in the large majority of cases, and safely detect real conflicts when they arise.

An example

Let's start with a very simple core schema: pages have URLs and titles.

{:schema/name "org.mozilla.core.page"
 :schema/version 1
 :schema/attributes [
   {:db/id (d/id-literal :db.part/user)
    :db/ident              :page/url
    :db/valueType          :db.type/string          ; Because not all URLs are java.net.URIs. For JS we may want to use /uri.
    :db/fulltext           true
    :db/cardinality        :db.cardinality/one
    :db/unique             :db.unique/identity
    :db/doc                "A page's URL."
    :db.install/_attribute :db.part/db}
   {:db/id (d/id-literal :db.part/user)
    :db/ident              :page/title
    :db/valueType          :db.type/string
    :db/fulltext           true
    :db/cardinality        :db.cardinality/one      ; We supersede as we see new titles.
    :db/doc                "A page's title."
    :db.install/_attribute :db.part/db}]}

Adding an attribute

Adding an attribute is easy: the version number doesn't need to change.

{:schema/name "org.mozilla.core.page"
 :schema/version 1
 :schema/attributes [
   …   ;; Previous attributes elided for clarity.
   {:db/id (d/id-literal :db.part/user)
    :db/ident              :page/visit
    :db/valueType          :db.type/ref
    :db/cardinality        :db.cardinality/one
    :db/doc                "A visit to the page."
    :db.install/_attribute :db.part/db}]}

Making a safe alteration

But we erroneously marked :page/visit as cardinality-one: you can only ever visit a page once!

That's an easy automated fix: it's not possible for existing cardinality-one data to violate a cardinality-many restriction. So we fix it and bump the schema version. Datomish will weaken the constraint automatically.

{:schema/name "org.mozilla.core.page"
 :schema/version 2
 :schema/attributes [
   …
   {:db/id (d/id-literal :db.part/user)
    :db/ident              :page/visit
    :db/valueType          :db.type/ref
    :db/cardinality        :db.cardinality/many          ; Weaken this.
    :db/doc                "A visit to the page."
    :db.install/_attribute :db.part/db}]}

Making a potentially unsafe alteration

Then we recognize that we made another mistake. Visits should be unique: two pages can't share a single visit. We're confident in our application code, though, so we can again just change the schema without writing any code.

{:schema/name "org.mozilla.core.page"
 :schema/version 3
 :schema/attributes [
   …
   {:db/id (d/id-literal :db.part/user)
    :db/ident              :page/visit
    :db/valueType          :db.type/ref
    :db/unique             :db.unique/value              ; Add this.
    :db/cardinality        :db.cardinality/many
    :db/doc                "A visit to the page."
    :db.install/_attribute :db.part/db}]}

Preparing for an unsafe alteration (TODO)

If we wanted to be thorough, we would supply a query like this to run when transitioning to schema version 3:

;; A sketch: find :page/visit datoms whose value is shared with another entity.
[:find ?e ?a ?v ?tx
 :in $
 :where
 [?e :page/visit ?v ?tx]
 [?other :page/visit ?v]
 [(!= ?other ?e)]
 [(ground :page/visit) ?a]]

This simply finds surplus datoms so they can be retracted prior to applying the schema change.
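
A sketch of the follow-up step, assuming a DataScript-style transact! (in practice you would keep one datom from each colliding group rather than retracting them all):

;; Turn query results (rows of [e a v tx]) into retractions.
(defn retract-surplus! [conn surplus-rows]
  (d/transact! conn
    (for [[e a v _tx] surplus-rows]
      [:db/retract e a v])))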

In the general case, code needs to run against the open datom store in order to perform the schema change — for example, one might want to preserve old data using new vocabulary before altering the old schema — so a simple approach isn't always enough.

The schema handling code performs a sequence of operations:

  • Run 'pre' code for the caller in total. Example: clean-up data across schema fragments, using the schema that's currently active in the store.
  • Run 'pre' code for each fragment. Example: clean up simple duplicates prior to altering an attribute, which is more efficient than rename-copy-retract.
  • Rename idents for each fragment. Example: moving an attribute that contains duplicates we can't yet fix.
  • Rename idents for the caller in total. Example: moving an attribute between two schema fragments that this application owns.
  • Automatically upgrade each schema fragment. The new schema will reflect existing renames or moves.
  • Run 'post' code for each fragment. Example: recovering data from a replaced attribute, or specifying default values.
  • Run 'post' code for the caller in total.

Most migrations will be very simple — some per-fragment pre- or post-upgrade code, if that — but this sequence exists if more complicated changes are required. Note that each fragment has pre/post/rename stages, as does the application itself. This should allow for simpler code sharing, avoiding the need for schema fragments to accommodate application logic.
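
One way this sequence could surface in application code (the map shape, keys, and helper functions below are illustrative, not a settled format):

;; Hypothetical migration descriptor combining per-fragment and
;; whole-application stages.
(def migration
  {:app/pre  (fn [conn from to] (clean-up-old-history! conn))
   :fragments
   [{:schema  page-schema-v3     ; the EDN fragment, as shown earlier
     :pre     (fn [conn from to]
                (when (= from 2) (fix-duplicate-visits! conn)))
     :renames {}
     :post    (fn [conn from to] (link-visits! conn))}
    {:schema  save-schema-v2
     :pre     (fn [conn from to] (delete-orphan-saves! conn))
     :renames {:save/instant :save/savedAt}
     :post    nil}]
   :app/post nil})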

For example, an imaginary upgrade sequence for developers who made a bunch of errors in a previous release might be:

From: page=2, save=1

To: page=3, save=2

  • App 'pre': clean up old history so we don't have to fix it.
  • Page 'pre': fix duplicate visits, prior to imposing a uniqueness constraint.
  • Save 'pre': delete saves for pages that no longer exist, prior to imposing a component constraint.
  • Page renames: none.
  • Save renames: rename :save/instant to :save/savedAt to fix a copy-paste error.
  • Page schema upgrade, 2->3: apply uniqueness constraint to :page/visit.
  • Save schema upgrade, 1->2: set isComponent to true to avoid orphans when history is deleted.
  • Page 'post': add inter-visit relationships to fixed pages using new vocabulary.
  • Save 'post': none.
  • App 'post': none.

Note that the page and save pre/rename/upgrade/post sequences are independent of each other and of the app sequence. Another application might share the page logic and perform this sequence first:

From: page=2

To: page=3

  • App 'pre': clean up old history so we don't have to fix it.
  • Page 'pre': fix duplicate visits, prior to imposing a uniqueness constraint.
  • Page renames: none.
  • Page schema upgrade, 2->3: apply uniqueness constraint to :page/visit.
  • Page 'post': add inter-visit relationships to fixed pages using new vocabulary.
  • App 'post': none.

leaving our first application to upgrade only the save schema fragment:

From: save=1

To: save=2

  • Save 'pre': delete saves for pages that no longer exist, prior to imposing a component constraint.
  • Save renames: rename :save/instant to :save/savedAt to fix a copy-paste error.
  • Save schema upgrade, 1->2: set isComponent to true to avoid orphans when history is deleted.
  • Save 'post': none.
  • App 'post': none.

After commit, the app updates its caches and other metadata, secure in the knowledge that the entire migration happened atomically and was persisted to disk.

Per-fragment options

Renames can be specified in the schema fragment itself:

{:schema/name "org.mozilla.core.page"
 :schema/version 3

 ;; Track upgrade steps.
 :schema/earliest 1       ; Upgrades will fail if the existing fragment is older than this.
 :schema/rename {
   ;; No rename needed when upgrading from 1: the attribute was
   ;; cardinality-one, so no duplicate values can exist.
   ;; If we're upgrading from 2, move the old data out of the way first.
   2 {:page/visit :page/oldvisit}
 }

 :schema/attributes [
   …
   {:db/id (d/id-literal :db.part/user)
    :db/ident              :page/visit
    :db/valueType          :db.type/ref
    :db/unique             :db.unique/value              ; Now safe because of the rename.
    :db/cardinality        :db.cardinality/many
    :db/doc                "A visit to the page."
    :db.install/_attribute :db.part/db}]}

Pre- and post-upgrade functions, of course, can't easily be specified declaratively. They should instead be functions (f conn from to) that have side effects only on the provided connection, and that raise on error.
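
For example, the save fragment's pre-upgrade step from the sequence above might be written like this (delete-orphan-saves! is a hypothetical helper):

;; Side effects only on the provided connection; raise on error.
(defn save-pre-upgrade [conn from to]
  (when (> from to)
    (throw (ex-info "Downgrades must be coordinated" {:from from :to to})))
  (when (= from 1)
    ;; Delete saves for pages that no longer exist, prior to imposing
    ;; the component constraint.
    (delete-orphan-saves! conn)))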