0 Thoughts: compacting or rolling up history
Richard Newman edited this page 2017-02-08 11:10:50 -08:00

Preserving all history won't always be practical. Consumers with significant write volumes, where some of those writes update cardinality-one properties or are retractions, will see their space consumption grow more than linearly in the number of extant datoms.

(The worst-case consumer is one that adds and retracts the same datom over and over, adding two rows to the transaction log each time, never growing the datoms table.)
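To make the arithmetic concrete, here is a minimal sketch of that worst case. The in-memory `datoms` set and `log` list are stand-ins invented for illustration, not the real storage layout, and `churn` is a hypothetical helper.

```python
def churn(cycles):
    """Add and retract the same datom `cycles` times; return table sizes."""
    datoms = set()   # stand-in for the datoms table: currently asserted (e, a, v)
    log = []         # stand-in for the transaction log: (e, a, v, tx, added)
    datom = (1, "page/visited", True)   # hypothetical entity, attribute, value

    for i in range(cycles):
        add_tx, retract_tx = 2 * i, 2 * i + 1
        datoms.add(datom)
        log.append((*datom, add_tx, True))        # assertion row
        datoms.discard(datom)
        log.append((*datom, retract_tx, False))   # retraction row

    return len(datoms), len(log)


print(churn(1000))   # (0, 2000): two log rows per cycle, no extant datoms
```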

Datomic provides one mechanism to address this: noHistory. This is fine for some attributes, but some workloads will instead want to keep as much history as is practical within a given space limit or timeframe. Browser history is a great example: Firefox throws away the oldest browsing history when the Places database gets too big, with limits computed at runtime based on the capacity of the device.

We can imagine at least two ways to do this:

  • For a given compaction threshold transaction T, find all matching add/retract pairs prior to T, and delete both. (Cardinality-one updates are effected by retracting the old value and adding the new one, so this works.)

    Queries of states prior to T will see missing values for any retracted datoms, but preserved original timestamps for non-retracted datoms.

  • For a given snapshot transaction S, truncate all history prior to S, find all extant datoms with a transaction ID less than S, and re-add those datoms to the transaction log as having been added in S. (Equivalently, collapse all old datoms 'up' into S.)

    Querying of states prior to S is impossible: that history has been flattened completely. The truncated transaction log could be stored in cold storage if desired.
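Both approaches can be sketched over a toy in-memory log. This is a rough illustration under stated assumptions, not an implementation: the `Datom` tuple, the function names `compact_before` and `snapshot_at`, and the sample attributes are all hypothetical, and the log is assumed to be a list ordered by transaction ID.

```python
from typing import List, NamedTuple


class Datom(NamedTuple):
    e: int        # entity
    a: str        # attribute
    v: object     # value
    tx: int       # transaction ID
    added: bool   # True for an assertion, False for a retraction


def compact_before(log: List[Datom], t: int) -> List[Datom]:
    """Delete add/retract pairs where both halves occur prior to t."""
    pending = {}   # (e, a, v) -> index of the not-yet-retracted assertion
    drop = set()
    for i, d in enumerate(log):
        if d.tx >= t:
            continue
        key = (d.e, d.a, d.v)
        if d.added:
            pending[key] = i
        elif key in pending:
            drop.update((pending.pop(key), i))
    return [d for i, d in enumerate(log) if i not in drop]


def snapshot_at(log: List[Datom], s: int) -> List[Datom]:
    """Truncate history prior to s and re-add the extant datoms in s."""
    live = {}      # (e, a, v) -> assertion still extant just before s
    for d in log:
        if d.tx >= s:
            continue
        key = (d.e, d.a, d.v)
        if d.added:
            live[key] = d
        else:
            live.pop(key, None)
    rolled_up = [d._replace(tx=s) for d in live.values()]
    return rolled_up + [d for d in log if d.tx >= s]


log = [
    Datom(1, "page/title", "A", 10, True),
    Datom(1, "page/title", "A", 11, False),   # cardinality-one update: A -> B
    Datom(1, "page/title", "B", 11, True),
    Datom(2, "page/url", "http://example.com/", 12, True),
    Datom(1, "page/title", "B", 20, False),   # B -> C, after the threshold
    Datom(1, "page/title", "C", 20, True),
]

compacted = compact_before(log, 15)   # title "A" vanishes; "B" keeps tx 11
snapshot = snapshot_at(log, 15)       # "B" and the URL now appear added in 15
```

In the compacted log, the retraction of "B" at transaction 20 survives because its matching assertion is the only half prior to the threshold. In the snapshot, nothing earlier than transaction 15 remains.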

Compaction is a little more complex for consumers than noHistory: consumers know that noHistory attributes are hard to query at points in time, but they won't expect data on non-noHistory attributes that existed at some point in time to be missing when they query that same point after compaction. Careful documentation (and coordination between consumers!) will be needed.

It might make sense to conditionally compact based on transaction metadata, attribute, part, or schema fragment.