Starting to work out the project layout for sub-crates. The crate inside query-parser/ is "datomish-query-parser" and the core code in src/ depends on it.
This allows for code to run before and after a schema fragment is
added for the first time.
The anticipated use for this is twofold:
1. To do initial setup, e.g., defining global entities.
2. To 'adopt' unmanaged attributes already defined in the store.
This 'pre' would manually alter or retract attributes so that the
transact of the new schema datoms can complete.
For example, if properties :foo/bar and :foo/baz will be unchanged,
but :noo/zob needs to change from a string to an integer, the :none
pre-function can alter the ident, and the :none post-function can
migrate and clean up.
This generalizes the transactor loop to allow callers to run
an arbitrary function within an `in-transaction!` body.
Combined with exposing `<report-transact-tx-data!`, this allows
an admittedly sophisticated consumer to conditionally query and
transact in a consistent way -- for example, cleaning up inconsistent
data then transacting a new schema version.
Altering uniqueness and cardinality attributes works, with the exception
of enabling uniqueness from nothing.
:db/noHistory and :db/isComponent changes are implemented but untested,
and aren't really supported by Datomish anyway.
The metaphor we use is that of "evolution", where each "evolutionary
step" contains a number of different "generations". Entities in the
process of being resolved are increasingly "evolved" into simpler
generations, until no further evolution is possible.
The test would fail because we would have an [a v] pair with a string
value, but we were looking for the fulltext rowid in <avs. Using
all_datoms correctly looks up the string value, at the cost of crippling
the speed of <avs.
This sorts fulltext values inserted in a single transaction, not across
transactions. This makes the rowids assigned in the fulltext_values
table internally consistent, even as the order of entities and datoms
changes (as the transaction applying algorithm evolves over time). The
test changes simply make the fulltext values sort easily.
In theory, these fulltext values could be very large, and sorting might
be very expensive. In practice, we expect values to differ in their
first few characters, so that this is efficient (i.e., proportional to
the number of fulltext values inserted and not their size).
This uses a common table expression and multiple SQL calls rather than a
temporary table, since transactions with huge numbers of distinct
lookup-refs are likely to be very rare.
We mark lookup-refs with `lookup-ref`, which is a little awkward because
binding `(let [[a v] lookup-ref] ...)` doesn't directly work, but avoids
some ambiguity present in Datomic and DataScript around interpreting
lookup-refs as multiple value lists. (Which bit the tests in an earlier
version of this patch!)
There's no distinction made for fulltext attributes, since the values
found by the retractAttributes SELECT are already rowids into the
fulltext_values table and therefore need no additional mapping.
These temp files will almost certainly live in memory only, speeding our
test suite evaluation significantly. Before this patch, in a warmed
REPL environment I get:
Testing datomish.db-test
Ran 19 tests containing 97 assertions.
0 failures, 0 errors.
"Elapsed time: 1408.720681 msecs"
"Elapsed time: 1343.986464 msecs"
"Elapsed time: 1338.660762 msecs"
After this patch, in a warmed REPL environment I get:
Testing datomish.db-test
Ran 19 tests containing 97 assertions.
0 failures, 0 errors.
"Elapsed time: 587.605168 msecs"
"Elapsed time: 569.522333 msecs"
"Elapsed time: 589.080282 msecs"
We'd like this to be part of the query syntax itself, but doing so
requires extending DataScript's parser.
Instead we generalize our `args` to `options`, and take `:limit`
and `:order-by-vars`. The former must be an integer or nil, and the
latter is an array of `[var direction]` pairs.
This commit includes descriptive error messages and tests for success
and failure.
This caches a partition map per DB, which is helpful because it exposes
what the point in time DB partition state is, but is unhelpful because
the partition state can advance underneath the DB cache. This is
generally true of the approach -- this can happen to the ident/entid
maps, and the datoms themselves -- so we'll roll with it for now.
This reduces the number of SQL UPDATE operations from linear in the
number of id-literals used to constant in the number of known
partitions.