Updated Transacting: upsert resolution algorithm (markdown)

Nick Alexander 2017-01-26 12:59:27 -08:00
parent 6b489e9d10
commit b0894b1558

@ -1 +1,39 @@
This will describe the upsert resolution algorithm.
The goal of the algorithm is to upsert temporary IDs to concrete numeric entids existing in the store where possible, or to determine that this is not possible and to allocate new entids.
The metaphor we use is that of _evolution_. At each evolutionary step, a _generation_ contains a number of different _populations_, each population containing a number of different transaction entities. Transaction entities in the process of being resolved are _evolved_ into successively simpler populations, until no further evolution is possible.
At the end of the evolutionary process, no transaction entity contains an unresolved temporary ID.
Throughout we use negative integers (`-1`, `-2`, etc) to denote temporary IDs that need to be resolved.
## Populations
We partition the set of transaction entities into the following populations.
| Population | Description |
| --- | --- |
| `upserts-e` | "Simple upserts" that look like `[:db/add -1 a v]` where `a` is `:db.unique/identity` |
| `upserts-ev` | "Complex upserts" that look like `[:db/add -1 a -2]` where `a` is `:db.unique/identity` |
| `allocations-ev` | Entities that look like `[:db/add -1 b -2]`, where `b` is *not* `:db.unique/identity` |
| `allocations-e` | Entities that look like `[:db/add -1 b v]`, where `b` is *not* `:db.unique/identity` |
| `allocations-v` | Entities that look like `[:db/add e b -2]`, where `b` is *not* `:db.unique/identity` |
| `inert` | Entities that are not `[:db/add ...]`, or that do not contain a temporary ID, like `[:db/add e a v]` |
## Evolutionary step
The population sets are iteratively refined as follows. Each evolutionary step, we do the following:
1. Try to upsert `upserts-e`. For each pair `[a v]`, we search for `[e a v]` in the store. If such a datom exists, we say entities like `[-1 a v]` have upserted, and we "resolve" the temporary ID `-1` to the entid `e`. Next generation, those entities will be in the `inert` population -- they can evolve no further. If such a datom `[e a v]` does not exist, we say entities like `[-1 a v]` require allocation. Next generation, those entities will be in the `allocations-e` population.
2. Using the temporary IDs that resolved in the previous step, try to evolve the all populations other than `upserts-e` further. That is, we might be able to evolve an entity `[-1 a -2]` in `upserts-ev` to `[-1 a e]` (in `upserts-e`) or to `[e a -2]` (in `allocations-v`) or event to `[e a v]` (in `inert`) next generation. Similarly, we may be able to evolve:
* some entities in `allocations-ev` to `allocations-e`, `allocations-v`, or `inert`;
* some entities in `allocations-e` to `inert`;
* some entities in `allocations-v` to `inert`.
Eventually, the population `upserts-e` will be empty. At this point, we evolve every complex upsert `[-1 a -2]` in `upserts-ev` to `allocations-ev`.
Finally, we allocate new entids for any temporary IDs unresolved in `allocations-ev`, `allocations-e`, and `allocations-v`. We're done!
It should not be difficult to write a formal proof of correctness of the algorithm, but I haven't done so.
As a future optimization, entities that upserted do not need to be inserted in the SQL store: they upserted, so they already exist in the DB. (We still need to verify uniqueness and ensure that no overlapping assertions occur.) Similarly, entities that contain temp IDs that required allocation do not need to be checked for existence before they are inserted into the SQL store, so they can be inserted more efficiently than arbitrary entities.