Preserving all history won't always be practical. Consumers with significant write volumes will grow their space consumption _more than linearly_ in the number of extant datoms when some of those writes touch cardinality-one properties or are retractions.

(The worst-case consumer is one that adds and retracts the same datom over and over, adding two rows to the transaction log each time while never growing the datoms table.)

Datomic provides one mechanism to address this: `noHistory`. This is fine for some attributes, but some workloads will instead want to keep as much history as is practical within a given space limit or timeframe. Browser history is a great example: Firefox throws away the oldest browsing history when the Places database gets too big, with limits computed at runtime based on the capacity of the device.

We can imagine at least two ways to do this:

- For a given compaction threshold transaction T, find all matching add/retract pairs prior to T, and delete both. (Cardinality-one updates are effected by retracting the old value and adding the new one, so this works.)

  Queries of states prior to T will see missing values for any retracted datoms, but preserved original timestamps for non-retracted datoms.
- For a given snapshot transaction S, truncate all history prior to S, find all extant datoms with a transaction ID less than S, and re-add those datoms to the transaction log as having been added in S. (Equivalently, collapse all old datoms 'up' into S.)

  Queries of states prior to S are impossible: that history has been flattened completely. The truncated transaction log could be stored in cold storage if desired.
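The two strategies can be sketched against a toy transaction log, modeled here as `(entity, attribute, value, tx, added)` tuples. The tuple layout and function names are illustrative only, not Datomic's (or any store's) actual log representation:

```python
from collections import defaultdict

def compact_pairs(log, threshold_tx):
    """Strategy 1: delete matching add/retract pairs prior to threshold_tx.

    An add cancelled by a later retract, both before threshold_tx,
    contributes nothing to any state at or after threshold_tx, so both
    rows can be dropped. Surviving datoms keep their original tx IDs.
    """
    keep = []
    pending = defaultdict(list)  # (e, a, v) -> indices of unmatched adds
    for entry in sorted(log, key=lambda d: d[3]):
        e, a, v, tx, added = entry
        if tx >= threshold_tx:
            keep.append(entry)
        elif added:
            pending[(e, a, v)].append(len(keep))
            keep.append(entry)
        elif pending[(e, a, v)]:
            # Retraction matching an earlier add: cancel the pair.
            keep[pending[(e, a, v)].pop()] = None
        else:
            keep.append(entry)
    return [d for d in keep if d is not None]

def compact_snapshot(log, snapshot_tx):
    """Strategy 2: flatten all history prior to snapshot_tx into snapshot_tx.

    Datoms still extant just before snapshot_tx are re-asserted as if
    added in snapshot_tx; everything else prior to snapshot_tx is
    truncated (and could be shipped to cold storage).
    """
    extant = {}  # (e, a, v) -> add still in effect before snapshot_tx
    recent = []
    for entry in sorted(log, key=lambda d: d[3]):
        e, a, v, tx, added = entry
        if tx >= snapshot_tx:
            recent.append(entry)
        elif added:
            extant[(e, a, v)] = entry
        else:
            extant.pop((e, a, v), None)
    flattened = [(e, a, v, snapshot_tx, True) for (e, a, v) in extant]
    return sorted(flattened) + recent
```

Note how the sketches mirror the trade-off above: `compact_pairs` preserves surviving datoms' original timestamps, while `compact_snapshot` rewrites everything old to `snapshot_tx`, making pre-snapshot states unqueryable.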
Compaction is a little more complex for consumers than `noHistory`: consumers know that `noHistory` attributes are hard to query at points in time, but they won't expect non-`noHistory` data that existed at some point in time to no longer exist at that point after compaction. Careful documentation (and coordination between consumers!) will be needed.

It might make sense to compact conditionally, based on transaction metadata, attribute, part, or schema fragment.
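One way to express such a conditional policy is a per-datom eligibility predicate consulted by the compactor. Everything here (the attribute whitelist, the `pinned` transaction-metadata flag) is a hypothetical illustration, not an existing API:

```python
# Hypothetical policy: only these attributes may ever be compacted.
COMPACTIBLE_ATTRIBUTES = {":page/visit", ":session/event"}

def eligible(datom, tx_metadata):
    """Return True if this log entry may be compacted.

    tx_metadata maps a transaction ID to a dict of annotations; a
    consumer could mark a transaction as 'pinned' to exempt it.
    """
    e, a, v, tx, added = datom
    if a not in COMPACTIBLE_ATTRIBUTES:
        return False
    return not tx_metadata.get(tx, {}).get("pinned", False)
```

Either compaction strategy could then skip ineligible entries, at the cost of a less regular post-compaction history.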