Write up thoughts on compacting history.
parent 8ac9f3891b
commit b8a7fcf398
1 changed file with 18 additions and 0 deletions
Thoughts:-compacting-rolling-up-history.md
Preserving all history won't always be practical. Consumers with significant write volumes, where some of those writes are of cardinality-one properties, or are retractions, will grow their space consumption _more than linearly_ in the number of extant datoms.
(The worst-case consumer is one that adds and retracts the same datom over and over, adding two rows to the transaction log each time, never growing the datoms table.)
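This worst case can be sketched concretely. The snippet below is a minimal model, assuming a hypothetical log-row shape of `(e, a, v, tx, added)`; it repeatedly asserts and retracts one datom and shows the log growing by two rows per cycle while the datoms table stays flat.

```python
def apply_tx(log, datoms, rows):
    """Append rows to the transaction log and update the current datoms set."""
    for (e, a, v, tx, added) in rows:
        log.append((e, a, v, tx, added))
        if added:
            datoms.add((e, a, v))
        else:
            datoms.discard((e, a, v))

log, datoms = [], set()
# Odd transactions assert the datom, even transactions retract it.
for tx in range(1, 101):
    apply_tx(log, datoms, [(1, ":page/title", "Foo", tx, tx % 2 == 1)])

print(len(log))     # 100: two log rows per add/retract cycle
print(len(datoms))  # 0: the datoms table never grows
```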
Datomic provides one mechanism to address this: `noHistory`. This is fine for some attributes, but some workloads will instead want to keep as much history as is practical within a given space limit or timeframe. Browser history is a great example: Firefox throws away the oldest browsing history when the Places database gets too big, with limits computed at runtime based on the capacity of the device.
We can imagine at least two ways to do this:
- For a given compaction threshold transaction T, find all matching add/retract pairs prior to T, and delete both. (Cardinality-one updates are effected by retracting the old value and adding the new one, so this works.)
Queries of states prior to T will see missing values for any retracted datoms, but preserved original timestamps for non-retracted datoms.
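A sketch of this first strategy, under the same assumed `(e, a, v, tx, added)` row shape with the log sorted by transaction ID (`compact_pairs` and the example attribute names are hypothetical):

```python
def compact_pairs(log, t):
    """Delete matched (add, retract) pairs for the same (e, a, v)
    where both halves occurred before threshold transaction t."""
    keep = [True] * len(log)
    open_adds = {}  # (e, a, v) -> index of an as-yet-unmatched add before t
    for i, (e, a, v, tx, added) in enumerate(log):
        if tx >= t:
            continue
        key = (e, a, v)
        if added:
            open_adds[key] = i
        elif key in open_adds:
            # Matched pair: drop both the add and the retract.
            keep[open_adds.pop(key)] = False
            keep[i] = False
    return [row for i, row in enumerate(log) if keep[i]]

# A cardinality-one update: retract the old value, add the new one.
log = [
    (1, ":page/title", "Foo", 1, True),
    (1, ":page/title", "Foo", 2, False),
    (1, ":page/title", "Bar", 2, True),
]
print(compact_pairs(log, 3))  # [(1, ':page/title', 'Bar', 2, True)]
```

Note how the unmatched add of the current value survives, preserving its original timestamp, while the paired add/retract of the superseded value disappears.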
- For a given snapshot transaction S, truncate all history prior to S, find all extant datoms with a transaction ID less than S, and re-add those datoms to the transaction log as having been added in S. (Equivalently, collapse all old datoms 'up' into S.)
Querying of states prior to S is impossible: that history has been flattened completely. The truncated transaction log could be stored in cold storage if desired.
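The snapshot strategy can be sketched similarly, again assuming `(e, a, v, tx, added)` rows sorted by transaction ID (`roll_up` is a hypothetical name): replay the log up to S to find the extant datoms, re-add them as of S, and return the truncated prefix separately for optional cold storage.

```python
def roll_up(log, s):
    """Collapse all history prior to snapshot transaction s: extant datoms
    with tx < s are re-added as having been asserted in s. Returns the new
    log and the truncated rows (candidates for cold storage)."""
    datoms = set()
    truncated, kept = [], []
    for (e, a, v, tx, added) in log:
        if tx < s:
            truncated.append((e, a, v, tx, added))
            if added:
                datoms.add((e, a, v))
            else:
                datoms.discard((e, a, v))
        else:
            kept.append((e, a, v, tx, added))
    snapshot = [(e, a, v, s, True) for (e, a, v) in sorted(datoms)]
    return snapshot + kept, truncated

log = [
    (1, ":page/title", "Foo", 1, True),
    (1, ":page/title", "Foo", 2, False),
    (1, ":page/title", "Bar", 2, True),
    (2, ":page/url", "https://example.com", 4, True),
]
new_log, cold = roll_up(log, 3)
print(new_log)  # [(1, ':page/title', 'Bar', 3, True),
                #  (2, ':page/url', 'https://example.com', 4, True)]
```

Note the flattening: the surviving title datom is now recorded as added in S (transaction 3), so its original timestamp is lost, which is exactly the trade-off described above.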
Compaction is a little more complex for consumers than `noHistory`: consumers know that `noHistory` attributes are hard to query at points in time, but they won't be expecting non-`noHistory` data to have existed at a point in time before compaction and then not exist at that same point in time afterwards. Careful documentation (and coordination between consumers!) will be needed.
It might make sense to conditionally compact based on transaction metadata, attribute, part, or schema fragment.
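One way this could look is a predicate hook consulted per log row. Everything here is an assumption for illustration: the function name, the example attribute, and the idea of an `ephemeral` flag in transaction metadata are all hypothetical.

```python
def should_compact(row, tx_meta, noisy_attrs=frozenset({":page/visit-count"})):
    """Decide whether a log row is eligible for compaction, based on its
    attribute or on metadata recorded for its transaction."""
    e, a, v, tx, added = row
    return a in noisy_attrs or tx_meta.get(tx, {}).get("ephemeral", False)

# High-churn attribute: always eligible.
print(should_compact((1, ":page/visit-count", 7, 5, True), {}))         # True
# Ordinary attribute, but the transaction was marked ephemeral.
print(should_compact((1, ":page/title", "Foo", 5, True),
                     {5: {"ephemeral": True}}))                         # True
```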