This is a big deck-chair re-arrangement. This puts FindQuery into
query-algebrizer and puts the validation from ParsedFindQuery ->
FindQuery their as well.
Some tests were re-homed for this.
In addition, the little-used maplit crate dependency was replaced with
inline expressions.
This uses a `SELECT *` from an inner subselect to filter potentially `NULL` aggregates.
The alternative is to handle `NULL` values throughout the projector, which is simple but loses a valuable invariant: Mentat SQL queries produce values that are not `NULL`.
* Make properties on NamespacedKeyword/NamespacedSymbol private
* Use only a single String for NamespacedKeyword/NamespacedSymbol
* Review comments.
* Remove unsafe code in namespaced_name.
Benchmarking shows approximately zero change.
* Allow the types of ns and name to differ when constructing a NamespacedName.
* Make symbol namespaces optional.
* Normalize names of keyword/symbol constructors.
This will make the subsequent refactor much less painful.
* Use expect not unwrap.
* Merge Keyword and NamespacedKeyword.
* Refactor AttributeCache populator code for use from pull.
* Pre: add to_value_rc to Cloned.
* Pre: add From<StructuredMap> for Binding.
* Pre: clarify Store::open_empty.
* Pre: StructuredMap cleanup.
* Pre: clean up a doc test.
* Split projector crate. Pass schema to projector.
* CLI support for printing bindings.
* Add and use ConjoiningClauses::derive_types_from_find_spec.
* Define pull types.
* Implement pull on top of the attribute cache layer.
* Add pull support to the projector.
* Parse pull expressions.
* Add simple pull support to connection objects.
* Tests for pull.
* Compile with Rust 1.25.
The only choice involved in this commit is that of replacing the
anonymous lifetime '_ with a named lifetime for the cache; since we're
accepting a Known, which includes the cache in question, I think it's
clear that we expect the function to apply to any given cache
lifetime.
* Review comments.
* Bail on unnamed attribute.
* Make assert_parse_failure_contains safe to use.
* Rework query parser to report better errors for pull.
* Test for mixed wildcard and simple attribute.
* Pre: eliminate some occurrences of Rc, largely through the magic of Into.
* Pre: introduce FromRc to convert between refcounted types.
* Introduce ValueRc as an abstraction over Rc/Arc choice.
* Move Cloned to core.
* Move CString-creation methods to TypedValue.
* Finish transition.
* Pre: clean up core/src/lib.rs.
* Pre: use indexmap 1.0 in db and query-projector.
* Change rel results to be a RelResult instance, not a Vec<Vec<TypedValue>>.
This avoids memory fragmentation and improves locality by using a single
heap-allocated vector for all bindings, rather than a separate
heap-allocated vector for each row.
We hide this abstraction behind the `RelResult` type, which tracks the
stride length (width) of each row.
* Don't allocate temporary vectors when projecting RelResults.
`tx-ids` allows to enumerate transaction IDs efficiently.
`tx-data` allows to extract transaction log data efficiently.
We might eventually allow to filter by impacted attribute sets as well.
* Pre: use debugcli in VSCode.
* Pre: wrap subqueries in parentheses in output SQL.
* Pre: add ExistingColumn.
This lets us make reference to columns by name, rather than only
pointing to qualified aliases.
* Pre: add Into for &str to TypedValue.
* Pre: add Store.transact.
* Pre: cleanup.
* Parse and algebrize simple aggregates. (#312)
* Follow-up: print aggregate columns more neatly in the CLI.
* Useful ValueTypeSet helpers.
* Allow for entity inequalities.
* Add 'differ', which is a ref-specialized not-equals.
* Add 'unpermute', a function for getting unique, distinct pairs from bindings.
* Review comments.
* Add 'the' pseudo-aggregation operator.
This allows for a corresponding value to be returned when a query
includes one 'min' or 'max' aggregate.
* Use the cache to make constant queries super fast.
* Fix translate tests to match: we no longer generate SQL for many of them!
* Accumulate additions and removals into the cache.
* Make attribute cache clone-on-write; store it in Metadata.
* Allow caching of fulltext attributes, interning strings.
This puts caching in mentat_db, adds a reverse lookup capability for
unique attributes, and populates bidirectional caches with a single
SQL cursor walk.
Differentiate between begin_read and begin_uncached_read.
Note that we still allow toggling within InProgress, because there might be
transient local state that makes starting a new transaction impossible.
Pre: export AttributeBuilder from mentat_db.
Pre: fix module-level comment for tx/src/entities.rs.
Pre: rename some `to_` conversions to `into_`.
Pre: make AttributeBuilder::unique less verbose.
Pre: split out a HasSchema trait to abstract over Schema.
Pre: rename SchemaMap/schema_map to AttributeMap/attribute_map.
Pre: TypedValue/NamespacedKeyword conversions.
Pre: turn Unique and ValueType into TypedValue::Keyword.
Pre: export IntoResult.
Pre: export NamespacedKeyword from mentat_core.
Pre: use intern_set in tx.
Pre: add InternSet::len.
Pre: comment gardening.
Pre: remove inaccurate TODO from TxReport comment.
* Pre: rename begin_transaction to begin_tx_application.
* Take an EXCLUSIVE transaction when bootstrapping, and an IMMEDIATE transaction when writing.
This avoids the remote possibility of another write sneaking in the door
while we're preparing to write, avoids us needing to upgrade locks, etc.
After a BEGIN IMMEDIATE, no other database connection will be able to write
to the database or do a BEGIN IMMEDIATE or BEGIN EXCLUSIVE. Other processes
can continue to read from the database, however.
An exclusive transaction causes EXCLUSIVE locks to be acquired on all
databases. After a BEGIN EXCLUSIVE, no other database connection except for
read_uncommitted connections will be able to read the database and no other
connection without exception will be able to write the database until the
transaction is complete.
* Hacky implementation of atomic multi-tx.
* Hold the last report, returning the InProgress from each operation.
* Rewrite transact in terms of InProgress.
* Test rollback.
* Remove unused imports.
* Don't use Rc for transaction reports.
* Pre: break out USER0 as a part boundary constant.
* Export TX0 and USER0 from mentat_db. This is for testing.
* Review comments: commenting.
* Test tempid allocation and rollback.
* Pre: make FindQuery, FindSpec, and Element non-Clone.
* Pre: make query translator return a Result.
* Pre: make projection return a Result.
* Pre: refactor query parser in preparation for parsing aggregates.
* Pre: rename PredicateFn -> QueryFunction.
* Pre: expose more about bound variables from CC.
* Pre: move ValueTypeSet to core.
* Update some dependencies.
* Update rusqlite to 0.12.
* Update error-chain to a forked version that implements Sync.
* Fix some compiler warnings.
* Remove unused imports in tests.
* Parse errors no longer naturally print with the expected symbol.
This version removes nalexander's lovely matrix code. It turned out
that scalar and tuple bindings are sufficiently different from coll
and rel -- they can directly apply as values in the query -- that
there was no point in jumping through hoops to turn those single
values into a matrix.
Furthermore, I've standardized us on a Vec<TypedValue>
representation for rectangular matrices, which should be much
more efficient, but would have required rewriting that code.
Finally, coll and rel are sufficiently different from each other
-- coll doesn't require processing nested collections -- that
my attempts to share code between them fell somewhat flat. I had
lots of nice ideas about zipping together cycles and such, but
ultimately I ended up with relatively straightforward, if a bit
repetitive, code.
The next commit will demonstrate the value of this work -- tests
that exercised scalar and tuple grounding now collapse down to
the simplest possible SQL.
This isn't perfect -- we still need to clone in a couple of cases -- but it avoids us
passing duplicate strings down into SQLite whenever the same value is mentioned more
than once in a query.
* Pre: unused import in translate.rs.
* Part 2: take a dependency on rusqlite for query arguments.
* Part 1: flatten V2 schema into V1. Add UUID and URI.
Bump expected ident and bootstrap datom count in tests.
* Part 5: parse edn::Value::Uuid.
* Part 3: extend ValueType and TypedValue to include Uuid.
* Part 4: add Uuid to query arguments.
* Part 6: extend db to support Uuid.
* Part 8: add a tx-parser test for #f NaN and #uuid.
* Part 7: parse and algebrize UUIDs in queries.
* Part 1: parse #inst in EDN and throughout query engine.
* Part 3: handle instants in db.
* Part 2: instants never matches integers in queries.
* Part 4: use DateTime for tx_instants.
* Add a test for adding and querying UUIDs and instants.
* Review comments.
* Part 1 - Parse `not` and `not-join`
* Part 2 - Validate `not` and `not-join` pre-algebrization
* Address review comments rnewman.
* Remove `WhereNotClause` and populate `NotJoin` with `WhereClause`.
* Fix validation for `not` and `not-join`, removing tests that were invalid.
* Address rustification comments.
* Rebase against `rust` branch.
* Part 3 - Add required types for NotJoin.
* Implement `PartialEq` for
`ConjoiningClauses` so `ComputedTable` can be included inside `ColumnConstraint::NotExists`
* Part 4 - Implement `apply_not_join`
* Part 5 - Call `apply_not_join` from inside `apply_clause`
* Part 6 - Translate `not-join` into `NOT EXISTS` SQL
* Address review comments.
* Rename `projected` to `unified` to better describe the fact that we are not projecting any variables.
* Check for presence of each unified var in either `column_bindings` or `input_bindings` and bail if not there.
* Copy over `input_bindings` for each var in `unified`.
* Only copy over the first `column_binding` for each variable in `unified` rather than the whole list.
* Update tests.
* Address review comments.
* Make output from Debug for NotExists more useful
* Clear up misunderstanding. Any single failing clause in the not will cause the entire not to be considered empty
* Address review comments.
* Remove Limit requirement from cc_to_exists.
* Use Entry.or_insert instead of matching on the entry to add to column_bindings.
* Move addition of value_bindings to before apply_clauses on template.
* Tidy up tests with some variable reuse.
* Addressed nits,
* Address review comments.
* Move addition of column_bindings to above apply_clause.
* Update tests.
* Add test to ensure that unbound vars fail
* Improve test for unbound variable to check for correct variable and error
* address nits
* Part 1: define ValueTypeSet.
We're going to use this instead of `HashSet<ValueType>` so that we can clearly express
the empty set and the set of all types, and also to encapsulate a switch to `EnumSet`."
* Part 2: use ValueTypeSet.
* Part 3: fix type expansion.
* Part 4: add a test for type extraction from nested `or`.
* Review comments.
* Review comments: simplify ValueTypeSet.
* Pre: put query parts in alphabetical order.
* Pre: rename 'input' to 'query' in translate tests.
* Part 1: parse :limit.
* Part 2: validate and escape variable parameters in SQL.
* Part 3: algebrize and translate limits.
This adds an `:order` keyword to `:find`.
If present, the results of the query will be an ordered set, rather than
an unordered set; rows will appear in an ordered defined by each
`:order` entry.
Each can be one of three things:
- A var, `?x`, meaning "order by ?x ascending".
- A pair, `(asc ?x)`, meaning "order by ?x ascending".
- A pair, `(desc ?x)`, meaning "order by ?x descending".
Values will be ordered in this sequence for asc, and in reverse for desc:
1. Entity IDs, in ascending numerical order.
2. Booleans, false then true.
3. Timestamps, in ascending numerical order.
4. Longs and doubles, intermixed, in ascending numerical order.
5. Strings, in ascending lexicographic order.
6. Keywords, in ascending lexicographic order, considering the entire
ns/name pair as a single string separated by '/'.
Subcommits:
Pre: make bound_value public.
Pre: generalize ErrorKind::UnboundVariable for use in order.
Part 1: parse (direction, var) pairs.
Part 2: parse :order clause into FindQuery.
Part 3: include order variables in algebrized query.
We add order variables to :with, so we can reuse its type tag projection
logic, and so that we can phrase ordering in terms of variables rather
than datoms columns.
Part 4: produce SQL for order clauses.
* Pre: refactor projector code.
* Part 1: maintain 'with' variables in AlgebrizedQuery.
* Part 2: include necessary 'with' variables in SQL projection list.
The test produces projection elements for `:with`, even though there are
no aggregates in the query. This test will need to be adjusted when we
optimize this away!
This commit turns complex `or` -- `or`s in which not all variables are
unified, or in which not all arms are the same shape -- into a
computed table.
We do this by building a template CC that shares some state with the
destination CC, applying each arm of the `or` to a copy of the template
as if it were a standalone query, then building a projection list and
creating a `ComputedTable::Union`. This is pushed into the destination
CC's `computed_tables` list.
Finally, the variables projected from the UNION are bound in the
destination CC, so that unification occurs, and projection of the
outermost query can use bindings established by the `or-join`.
This commit includes projection of type codes from heterogeneous `UNION`
arms: we compute a list of variables for which a definite type is
unknown in at least one arm, and force all arms to project either a type
tag column or a fixed type. It's important that each branch of a UNION
project the same columns in the same order, hence the projection of
fixed values.
The translator is similarly extended to project the type tag column name
or the known value_type_tag to support this.
Review comment: clarify union type extraction.
This commit:
- Defines a new kind of column, distinct from the eavt columns in
`DatomsColumn`, to model the rows projected from subqueries. These
always name one of two things: a variable, or a variable's type tag.
Naturally the two cases are thus `Variable` and `VariableTypeTag`.
These are cheap to clone, given that `Variable` is an `Rc<String>`.
- Defines `Column` as a wrapper around `DatomsColumn` and
`VariableColumn`. Everywhere we used to use `DatomsColumn` we now
allow `Column`: particularly in constraints and projections.
- Broadens the definition of a table list in the intermediate
"query-sql" representation to include a SQL UNION. A UNION is
represented as a list of queries and an alias.
- Implements translation from a `ComputedTable` to the query-sql
representation. In this commit we only project vars, not type tags.
Review comment: discuss bind_column_to_var for ValueTypeTag.
Review comment: implement From<Vec<T>> for ConsumableVec<T>.
Part 1, core: use Rc for String and Keyword.
Part 2, query: use Rc for Variable.
Part 3, sql: use Rc for args in SQLiteQueryBuilder.
Part 4, query-algebrizer: use Rc.
Part 5, db: use Rc.
Part 6, query-parser: use Rc.
Part 7, query-projector: use Rc.
Part 8, query-translator: use Rc.
Part 9, top level: use Rc.
Part 10: intern Ident and IdentOrKeyword.
* Add a failing test for EDN parsing '…'.
* Expose a SQLValueType trait to get value_type_tag values out of a ValueType.
* Add accessors to FindSpec.
* Implement querying.
* Implement rudimentary projection.
* Export mentat_db::new_connection.
* Export symbols from mentat.
* Add rudimentary end-to-end query tests.