This should address #663, by re-inserting type checking in the
transactor stack after the entry point used by the term builder.
Before this commit, we were using an SQLite UNIQUE index to assert
that no `[e a]` pair, with `a` a cardinality one attribute, was
asserted more than once. However, that's not in line with Datomic,
which treats transaction inputs as a set and allows a single datom
like `[e a v]` to appear multiple times. It's both awkward and not
particularly efficient to look for _distinct_ repetitions in SQL, so
we accept some runtime cost in order to check for repetitions in the
transactor. This will allow us to address #532, which is really about
whether we treat inputs as sets. A side benefit is that we can
provide more helpful error messages when the transactor does detect
that the input truly violates the cardinality constraints of the
schema.
This commit builds a trie while error checking and collecting final
terms, which should be fairly efficient. It also allows a simpler
expression of input-provided :db/txInstant datoms, which in turn
uncovered a small issue with the transaction watcher, where-by the
watcher would not see non-input-provided :db/txInstant datoms.
This transition to Datomic-like input-as-set semantics allows us to
address #532. Previously, two tempids that upserted to the same entid
would produce duplicate datoms, and that would have been rejected by
the transactor -- correctly, since we did not allow duplicate datoms
under the input-as-list semantics. With input-as-set semantics,
duplicate datoms are allowed; and that means that we must allow
tempids to be equivalent, i.e., to resolve to the same tempid.
To achieve this, we:
- index the set of tempids
- identify tempid indices that share an upsert
- map tempids to a dense set of contiguous integer labels
We use the well-known union-find algorithm, as implemented by
petgraph, to efficiently manage the set of equivalent tempids.
Along the way, I've fixed and added tests for two small errors in the
transactor. First, don't drop datoms resolved by upsert (#679).
Second, ensure that complex upserts are allocated.
I don't know quite what happened here. The Clojure implementation
correctly kept complex upserts that hadn't resolved as complex
upserts (see
9a9dfb502a/src/common/datomish/transact.cljc (L436))
and then allocated complex upserts if they didn't resolve (see
9a9dfb502a/src/common/datomish/transact.cljc (L509)).
Based on the code comments, I think the Rust implementation must have
incorrectly tried to optimize by handling all complex upserts in at
most a single generation of evolution, and that's just not correct.
We're effectively implementing a topological sort, using very specific
domain knowledge, and its not true that a node in a topological sort
can be considered only once!
There are few reasons to do this:
- it's difficult to add symbol interning to combine-based parsers like
tx-parser -- literally every type changes to reflect the interner,
and that means every convenience macro we've built needs to chagne.
It's trivial to add interning to rust-peg-based parsers.
- combine has rolled forward to 3.2, and I spent a similar amount of
time investigating how to upgrade tx-parser (to take advantage of
the new parser! macros in combine that I think are necessary for
adapting to changing types) as I did just converting to rust-peg.
- it's easy to improve the error messages in rust-peg, where-as I have
tried twice to improve the nested error messages in combine and am
stumped.
- it's roughly 4x faster to parse strings directly as opposed to
edn::ValueAndSpan, and it'll be even better when we intern directly.
* Refactor AttributeCache populator code for use from pull.
* Pre: add to_value_rc to Cloned.
* Pre: add From<StructuredMap> for Binding.
* Pre: clarify Store::open_empty.
* Pre: StructuredMap cleanup.
* Pre: clean up a doc test.
* Split projector crate. Pass schema to projector.
* CLI support for printing bindings.
* Add and use ConjoiningClauses::derive_types_from_find_spec.
* Define pull types.
* Implement pull on top of the attribute cache layer.
* Add pull support to the projector.
* Parse pull expressions.
* Add simple pull support to connection objects.
* Tests for pull.
* Compile with Rust 1.25.
The only choice involved in this commit is that of replacing the
anonymous lifetime '_ with a named lifetime for the cache; since we're
accepting a Known, which includes the cache in question, I think it's
clear that we expect the function to apply to any given cache
lifetime.
* Review comments.
* Bail on unnamed attribute.
* Make assert_parse_failure_contains safe to use.
* Rework query parser to report better errors for pull.
* Test for mixed wildcard and simple attribute.
We don't yet have a logging system for production use, but I'd like to
start experimenting with log, which seems to be (close to) a Rust
standard. We're already using it in mentat_cli.
* Pre: clean up core/src/lib.rs.
* Pre: use indexmap 1.0 in db and query-projector.
* Change rel results to be a RelResult instance, not a Vec<Vec<TypedValue>>.
This avoids memory fragmentation and improves locality by using a single
heap-allocated vector for all bindings, rather than a separate
heap-allocated vector for each row.
We hide this abstraction behind the `RelResult` type, which tracks the
stride length (width) of each row.
* Don't allocate temporary vectors when projecting RelResults.
Simplify.
This has a watcher collect txid -> AttributeSet mappings each time a
transact occurs. On commit we retrieve those mappings and hand them over
to the observer service, which filters them and packages them up for
dispatch.
Tidy up
* Use the cache to make constant queries super fast.
* Fix translate tests to match: we no longer generate SQL for many of them!
* Accumulate additions and removals into the cache.
* Make attribute cache clone-on-write; store it in Metadata.
* Allow caching of fulltext attributes, interning strings.
You can use this in conjunction with setting SQLITE3_LIB_DIR to control which SQLite is used.
See https://github.com/jgallagher/rusqlite for more.
Also add recent contributors to the authors array.
* Update some dependencies.
* Update rusqlite to 0.12.
* Update error-chain to a forked version that implements Sync.
* Fix some compiler warnings.
* Remove unused imports in tests.
* Parse errors no longer naturally print with the expected symbol.
* Part 1: added limits feature to rusqlite dependencies.
* Part 2: replace references to SQLITE_MAX_VARIABLE_NUMBER with sqlite3_limit.
* Move assertion check for correct number of variables in repeat_values to before call as this is where the variable is defined.
* Part 3: add tests
* Pre: Drop unneeded tx0 from search results.
* Pre: Don't require a schema in some of the DB code.
The idea is to separate the transaction applying code, which is
schema-aware, from the concrete storage code, which is just concerned
with getting bits onto disk.
* Pre: Only reference Schema, not DB, in debug module.
This is part of a larger separation of the volatile PartitionMap,
which is modified every transaction, from the stable Schema, which is
infrequently modified.
* Pre: Fix indentation.
* Extract part of DB to new SchemaTypeChecking trait.
* Extract part of DB to new PartitionMapping trait.
* Pre: Don't expect :db.part/tx partition to advance when tx fails.
This fails right now, because we allocate tx IDs even when we shouldn't.
* Sketch a db interface without DB.
* Add ValueParseError; use error-chain in tx-parser.
This can be simplified when
https://github.com/Marwes/combine/issues/86 makes it to a published
release, but this unblocks us for now. This converts the `combine`
error type `ParseError<&'a [edn::Value]>` to a type with owned
`Vec<edn::Value>` collections, re-using `edn::Value::Vector` for
making them `Display`.
* Pre: Accept Borrow<Schema> instead of just &Schema in debug module.
This makes it easy to use Rc<Schema> or Arc<Schema> without inserting
&* sigils throughout the code.
* Use error-chain in query-parser.
There are a few things to point out here:
- the fine grained error types have been flattened into one crate-wide
error type; it's pretty easy to regain the granularity as needed.
- edn::ParseError is automatically lifted to
mentat_query_parser::errors::Error;
- we use mentat_parser_utils::ValueParser to maintain parsing error
information from `combine`.
* Patch up top-level.
* Review comment: Only `borrow()` once.
* Pre: Add some value conversion tests.
This is follow-up to earlier work. Turn TypedValue::Keyword into
edn::Value::NamespacedKeyword. Don't take a reference to
value_type_tag.
* Pre: Add repeat_values.
Requires itertools, so this commit is not stand-alone.
* Pre: Expose the first transaction ID as bootstrap::TX0.
This is handy for testing.
* Pre: Improve debug module.
* Pre: Bump rusqlite version for https://github.com/jgallagher/rusqlite/issues/211.
* Pre: Use itertools.
* Start implementing bulk SQL insertion algorithms. (#214)
This is slightly simpler re-expression of the existing Clojure
implementation.
* Post: Start generic data-driven transaction testing. (#188)
* Review comment: `use ::{SYMBOL}` instead of `use {SYMBOL}`.
* Review comment: Prefer bindings_per_statement to values_per_statement.
* Make Variable::from_symbol public.
* Implement basic parsing of queries.
* Use pinned dependencies the hard way to fix Travis.
* Bump ordered-float dependency to 0.4.0.
* Error coercions to use ?, and finishing the find interface.
* Start installing the SQLite store and bootstrapping the datom store.
* Review comment: Decomplect V2_IDENTS.
* Review comment: Decomplect V2_PARTS.
* Review comment: Pre: Expose Clojure's merge on Value instances.
* Review comment: Decomplect V2_SYMBOLIC_SCHEMA.
* Review comment: Decomplect V1_STATEMENTS.
* Review comment: Prefer ? to try!.
* Review comment: Fix typos; format; add TODOs.
* Review comment: Assert that Mentat `Schema` is valid upon creation.
* Review comment: Improve conversion to and from SQL values.
This patch factors the fundamental SQL conversion maps
between (rusqlite::Value, value_type_tag) and (edn::Value, ValueType)
through a new Mentat TypedValue. (A future patch might rename this
fundamental type mentat::Value.)
To make certain conversion functions infallible, I removed
placeholders for :db.type/{instant,uuid,uri}. (We could panic
instead, but there's no need to do that right now.)
* Review comment: Always uses bundled SQLite in rusqlite.
This avoids (runtime) failures in Travis CI due to old SQLite
versions. See 432966ac77.
* Review comment: Move semantics in `from_sql_value_pair`.
* Review comment: DB_EXCISE_BEFORE_T instead of ...BEFORET (no underscore).
* Review comment: Move overview notes to the Wiki.