mentat

Author	SHA1	Message	Date
Richard Newman	19fc7cddf1	[query] Widen `known_types` correctly in complex `or`. (#424) r=nalexander * Part 1: define ValueTypeSet. We're going to use this instead of `HashSet<ValueType>` so that we can clearly express the empty set and the set of all types, and also to encapsulate a switch to `EnumSet`. * Part 2: use ValueTypeSet. * Part 3: fix type expansion. * Part 4: add a test for type extraction from nested `or`. * Review comments. * Review comments: simplify ValueTypeSet.	2017-04-24 14:15:26 -07:00
Richard Newman	bc63744aba	Add :limit to queries (#420) r=nalexander * Pre: put query parts in alphabetical order. * Pre: rename 'input' to 'query' in translate tests. * Part 1: parse :limit. * Part 2: validate and escape variable parameters in SQL. * Part 3: algebrize and translate limits.	2017-04-19 16:16:19 -07:00
Richard Newman	bffefe7e6b	Review comments for #418.	2017-04-18 13:50:58 -07:00
Nick Alexander	ff0147e89c	Review comments: downgrade to error-chain 0.8.1 for Send + Sync bound; use combine::primitive::Error.	2017-04-18 13:19:50 -07:00
Richard Newman	60c082b61e	Part 4: pass inputs through algebrizing and execution. (#418) This also adds a test that an `UnboundVariables` error is raised if a variable mentioned in the `:in` clause isn't bound.	2017-04-18 13:19:50 -07:00
Richard Newman	dfc846e483	Part 3: define keep_intersected_keys. We'll use this to drop unneeded values from input maps, if lazy callers reuse a general-purpose map for multiple queries.	2017-04-18 13:19:50 -07:00
Richard Newman	651308f721	Part 2: define a type to encapsulate query inputs. This is for two reasons. Firstly, we need to track the types of inputs, their values, and also the input variables; adding a struct gives us a little more clarity. Secondly, when we come to implement prepared statements, we'll be algebrizing queries without having the values available. We'll be able to do a better job of algebrizing, and also do more validating, if we allow callers to specify the types of variables in advance, even if the values aren't known.	2017-04-18 13:19:50 -07:00
Richard Newman	35d73d5541	Implement :order. (#415) (#416) r=nalexander This adds an `:order` keyword to `:find`. If present, the results of the query will be an ordered set, rather than an unordered set; rows will appear in an order defined by each `:order` entry. Each can be one of three things: - A var, `?x`, meaning "order by ?x ascending". - A pair, `(asc ?x)`, meaning "order by ?x ascending". - A pair, `(desc ?x)`, meaning "order by ?x descending". Values will be ordered in this sequence for asc, and in reverse for desc: 1. Entity IDs, in ascending numerical order. 2. Booleans, false then true. 3. Timestamps, in ascending numerical order. 4. Longs and doubles, intermixed, in ascending numerical order. 5. Strings, in ascending lexicographic order. 6. Keywords, in ascending lexicographic order, considering the entire ns/name pair as a single string separated by '/'. Subcommits: Pre: make bound_value public. Pre: generalize ErrorKind::UnboundVariable for use in order. Part 1: parse (direction, var) pairs. Part 2: parse :order clause into FindQuery. Part 3: include order variables in algebrized query. We add order variables to :with, so we can reuse its type tag projection logic, and so that we can phrase ordering in terms of variables rather than datoms columns. Part 4: produce SQL for order clauses.	2017-04-17 11:30:31 -07:00
Richard Newman	64acc6a7ee	Support :with (#311) (#414) r=nalexander * Pre: refactor projector code. * Part 1: maintain 'with' variables in AlgebrizedQuery. * Part 2: include necessary 'with' variables in SQL projection list. The test produces projection elements for `:with`, even though there are no aggregates in the query. This test will need to be adjusted when we optimize this away!	2017-04-17 09:23:55 -07:00
Richard Newman	758ab8b476	Part 5: add more tests for complex `or`.	2017-04-12 19:21:56 -07:00
Richard Newman	d8075aa07d	Part 3: finish expansion and translation of complex `or`. This commit turns complex `or` -- `or`s in which not all variables are unified, or in which not all arms are the same shape -- into a computed table. We do this by building a template CC that shares some state with the destination CC, applying each arm of the `or` to a copy of the template as if it were a standalone query, then building a projection list and creating a `ComputedTable::Union`. This is pushed into the destination CC's `computed_tables` list. Finally, the variables projected from the UNION are bound in the destination CC, so that unification occurs, and projection of the outermost query can use bindings established by the `or-join`. This commit includes projection of type codes from heterogeneous `UNION` arms: we compute a list of variables for which a definite type is unknown in at least one arm, and force all arms to project either a type tag column or a fixed type. It's important that each branch of a UNION project the same columns in the same order, hence the projection of fixed values. The translator is similarly extended to project the type tag column name or the known value_type_tag to support this. Review comment: clarify union type extraction.	2017-04-12 19:21:45 -07:00
Richard Newman	08d2c613a4	Part 2: expand the definition of a table to include computed tables. This commit: - Defines a new kind of column, distinct from the eavt columns in `DatomsColumn`, to model the rows projected from subqueries. These always name one of two things: a variable, or a variable's type tag. Naturally the two cases are thus `Variable` and `VariableTypeTag`. These are cheap to clone, given that `Variable` is an `Rc<String>`. - Defines `Column` as a wrapper around `DatomsColumn` and `VariableColumn`. Everywhere we used to use `DatomsColumn` we now allow `Column`: particularly in constraints and projections. - Broadens the definition of a table list in the intermediate "query-sql" representation to include a SQL UNION. A UNION is represented as a list of queries and an alias. - Implements translation from a `ComputedTable` to the query-sql representation. In this commit we only project vars, not type tags. Review comment: discuss bind_column_to_var for ValueTypeTag. Review comment: implement From<Vec<T>> for ConsumableVec<T>.	2017-04-12 19:21:33 -07:00
Richard Newman	7948788936	Part 1: define ComputedTable. Complex `or`s are translated to SQL as a subquery -- in particular, a subquery that's a UNION. Conceptually, that subquery is a computed table: `all_datoms` and `datoms` yield rows of e/a/v/tx, and each computed table yields rows of variable bindings. The table itself is a type, `ComputedTable`. Its `Union` case contains everything a subquery needs: a `ConjoiningClauses` and a projection list, which together allow us to build a SQL subquery, and a list of variables that need type code extraction. (This is discussed further in a later commit.) Naturally we also need a way to refer to columns in a computed table. We model this by a new enum case in `DatomsTable`, `Computed`, which maintains an integer value that uniquely identifies a computed table.	2017-04-12 11:13:58 -07:00
Richard Newman	79ccd818f3	Pre: use ..Default approach for use_as_template and make_receptacle. I decided this was more efficient (no temporary attributes and mutability) and less confusing.	2017-04-12 11:12:49 -07:00
Richard Newman	98ac559894	Pre: allow initialization of a CC with an arbitrary counter value. Useful for testing.	2017-04-12 11:12:48 -07:00
Richard Newman	33fa1261b8	Pre: clone alias_counter into concretes. This ensures that concrete CC clones don't have overlapping counts.	2017-04-12 11:11:56 -07:00
Richard Newman	b9f9b4ff58	Pre: make extracted_types pub so the projector and translator can use it.	2017-04-12 11:11:56 -07:00
Richard Newman	e280811243	Part 7: use RcCounter to implement aliasing in ConjoiningClauses. This allows us to share a counter between templates produced from a CC.	2017-04-07 12:46:34 -07:00
Richard Newman	2b61944f09	Part 6: track why an empty or-join failed.	2017-04-07 12:46:30 -07:00
Richard Newman	b693385495	Part 5: eliminate is_known_empty in favor of empty_because and an accessor.	2017-04-07 12:46:26 -07:00
Richard Newman	a07efc0a9e	Part 4: look up attributes for bound variables when making type determinations.	2017-04-07 12:46:26 -07:00
Richard Newman	72977f52e4	Part 3: reinstate extracted type pruning. When we started expanding and narrowing type sets, it became impossible to conclusively know during pattern application whether a type was known. We now figure that out at the end: if a variable has only a single known type, we don't need to extract its type tag.	2017-04-07 12:46:26 -07:00
Richard Newman	0639c94468	Part 2: implement simple `or`.	2017-04-07 12:46:25 -07:00
Richard Newman	9df18e4286	Part 1: implement type narrowing and broadening.	2017-04-07 12:44:03 -07:00
Richard Newman	a5023c70cb	Use Rc for TypedValue, Variable, and query Ident keywords. (#395) r=nalexander Part 1, core: use Rc for String and Keyword. Part 2, query: use Rc for Variable. Part 3, sql: use Rc for args in SQLiteQueryBuilder. Part 4, query-algebrizer: use Rc. Part 5, db: use Rc. Part 6, query-parser: use Rc. Part 7, query-projector: use Rc. Part 8, query-translator: use Rc. Part 9, top level: use Rc. Part 10: intern Ident and IdentOrKeyword.	2017-04-02 21:38:36 -07:00
Richard Newman	2b2b5cf696	Part 6: implement decision tree for processing simple alternation.	2017-03-30 19:13:40 -07:00
Richard Newman	74f188df9b	Part 5b: rename also/instead to add_intersection and add_alternate.	2017-03-30 19:13:20 -07:00
Richard Newman	9e5c735460	Part 5: split cc.rs into a 'clauses' module. mod.rs defines the module and ConjoiningClauses itself, complete with methods to record facts and ask it questions. pattern.rs, predicate.rs, resolve.rs, and or.rs include particular functionality around accumulating certain kinds of patterns. Only `or.rs` includes significant new code; the rest is just split.	2017-03-30 19:13:20 -07:00
Richard Newman	ce3c4f0dca	Part 3: have table_for_places return a Result, not an Option.	2017-03-30 19:13:20 -07:00
Richard Newman	01ca0ae5c1	Part 2: add an EmptyBecause case for fulltext/non-string type mismatch.	2017-03-30 19:13:19 -07:00
Richard Newman	997df0b776	Part 1: introduce ColumnIntersection and ColumnAlternation. This provides a limited form of OR and AND for column constraints, allowing simple 'or-join' queries to be expressed on a single table alias.	2017-03-30 19:13:19 -07:00
Richard Newman	95a5326e23	Pre: move EmptyBecause into types.rs.	2017-03-30 18:03:03 -07:00
Richard Newman	8adb6d97fd	Add validation for or-join. r=nalexander	2017-03-27 16:32:45 -07:00
Richard Newman	88df7b3b33	Correctly generate DISTINCT and LIMIT. (#386) r=nalexander	2017-03-22 14:02:00 -07:00
Richard Newman	5e971f3b22	Post: simplify type set narrowing.	2017-03-22 11:32:32 -07:00
Richard Newman	cb4ba9e68f	Post: reorganize cc.rs.	2017-03-22 11:32:32 -07:00
Richard Newman	7024978517	Track ever-shrinking sets of types for variables, not a single type. (#381) r=nalexander	2017-03-22 11:30:16 -07:00
Richard Newman	97749833d0	Algebrize and translate numeric constraints. (#306) r=nalexander	2017-03-22 10:19:47 -07:00
Richard Newman	1c4e30a906	Pre: switch to taking Patterns by move, not by reference, when algebrizing.	2017-03-22 10:14:15 -07:00
Richard Newman	f5aa6b2c2c	Pre: add mentat_query_algebrizer::errors.	2017-03-22 10:14:15 -07:00
Richard Newman	d8d36140a9	Pre: add tests for CC constraint intersection. Also add a failing test for #373.	2017-03-22 10:14:15 -07:00
Richard Newman	fe307f8b7a	Pre: remove dead code in cc.rs.	2017-03-22 10:13:58 -07:00
Richard Newman	3d66cb5d0f	Pre: move query algebrizer types to their own file.	2017-03-22 10:13:45 -07:00
Nick Alexander	15b4195a6e	Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow retracting :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). 
* Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing them in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases where it makes sense. I expect that to be straightforward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. 
Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. 
It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sorts of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow retracting :db/unique attributes. * Post: Allow retracting :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.	2017-03-20 13:18:59 -07:00
Richard Newman	70e5759b5f	Ensure that variable bindings are used when selecting a table. r=nalexander,etoop For queries like ```edn [:find ?x :where [?x _ "hello"]] [:find [?v ...] :where [_ ?a ?v]] ``` we'll query `all_datoms` to handle fulltext strings, which is expensive. If `?a` is bound, we can avoid this — resolve any keyword binding, ensure that the value is an attribute, and use the appropriate table.	2017-03-14 13:47:22 +00:00
Richard Newman	6109a63249	Support input bindings in ConjoiningClauses. r=nalexander	2017-03-10 19:01:56 -08:00
Richard Newman	bf38105fef	(#362) Part 4: handle unknown attributes by expanding type codes. r=nalexander Also, don't run any SQL at all if an algebrized query is known to return no results.	2017-03-08 17:44:27 -08:00
Richard Newman	b5867e9131	(#362) Part 3: implement querying against simple keywords. r=nalexander	2017-03-08 17:44:19 -08:00
Richard Newman	ce3a9bdf87	(#362) Part 2: use constrain_attribute. r=nalexander	2017-03-08 17:44:11 -08:00
Richard Newman	8935d6a8a5	(#362) Part 1: if a variable's type becomes known, don't extract it. r=nalexander This is necessary because we process patterns sequentially; a later pattern might tell us the type of a variable (e.g., by having a constant attribute), at which point we can do less work.	2017-03-08 17:44:00 -08:00

59 commits
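
Commit 19fc7cddf1 replaces `HashSet<ValueType>` with a ValueTypeSet that can express both the empty set and the set of all types. A minimal sketch of that idea, assuming a hand-rolled `u8` bitset in place of the `EnumSet` crate and a hypothetical seven-variant `ValueType`; the method names are illustrative, not Mentat's actual API:

```rust
// Hypothetical sketch of a ValueTypeSet as a bitset; not Mentat's real type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueType {
    Ref,
    Boolean,
    Instant,
    Long,
    Double,
    String,
    Keyword,
}

impl ValueType {
    const ALL: [ValueType; 7] = [
        ValueType::Ref,
        ValueType::Boolean,
        ValueType::Instant,
        ValueType::Long,
        ValueType::Double,
        ValueType::String,
        ValueType::Keyword,
    ];

    /// One bit per variant, using the implicit discriminants 0..=6.
    fn bit(self) -> u8 {
        1 << (self as u8)
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValueTypeSet(u8);

impl ValueTypeSet {
    /// The empty set: no type is possible, so the clause can never match.
    fn none() -> Self {
        ValueTypeSet(0)
    }

    /// The set of all types: nothing is known yet.
    fn any() -> Self {
        ValueType::ALL.iter().fold(Self::none(), |set, t| set.with(*t))
    }

    fn of_one(t: ValueType) -> Self {
        ValueTypeSet(t.bit())
    }

    fn with(self, t: ValueType) -> Self {
        ValueTypeSet(self.0 | t.bit())
    }

    /// Narrowing: both constraints must hold (e.g. successive patterns).
    fn intersection(self, other: Self) -> Self {
        ValueTypeSet(self.0 & other.0)
    }

    /// Widening: any arm may bind the variable (e.g. arms of an `or`).
    fn union(self, other: Self) -> Self {
        ValueTypeSet(self.0 | other.0)
    }

    fn contains(self, t: ValueType) -> bool {
        (self.0 & t.bit()) != 0
    }

    fn is_empty(self) -> bool {
        self.0 == 0
    }

    /// Exactly one possible type, so no type tag needs extracting.
    fn is_unit(self) -> bool {
        self.0.count_ones() == 1
    }
}
```

Under this sketch, each further pattern narrows a variable's set with `intersection`, while widening across the arms of a complex `or` corresponds to `union`.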
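
Commit dfc846e483 defines `keep_intersected_keys` to drop unneeded values from an input map that a caller reuses across queries. The signature below (a `BTreeMap` filtered in place against a `BTreeSet` of wanted keys) is an assumption, shown only to illustrate the operation:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical signature: remove every entry whose key is not in `keys`,
/// e.g. inputs supplied for variables this query never mentions.
fn keep_intersected_keys<K: Ord, V>(map: &mut BTreeMap<K, V>, keys: &BTreeSet<K>) {
    map.retain(|k, _| keys.contains(k));
}
```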
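
Commit 651308f721 introduces a type that tracks input variables, their types, and their values, so that types can be declared before values exist (as prepared statements will need); commit 60c082b61e raises `UnboundVariables` when an `:in` variable isn't bound. A hypothetical sketch of such a struct with a two-variant value model; the names, and the rule that a declared type counts as "bound", are assumptions:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, heavily simplified value model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueType {
    Long,
    String,
}

#[derive(Clone, Debug, PartialEq)]
enum TypedValue {
    Long(i64),
    String(String),
}

impl TypedValue {
    fn value_type(&self) -> ValueType {
        match self {
            TypedValue::Long(_) => ValueType::Long,
            TypedValue::String(_) => ValueType::String,
        }
    }
}

/// Hypothetical input bundle: types may be known before values are.
struct QueryInputs {
    types: BTreeMap<String, ValueType>,
    values: BTreeMap<String, TypedValue>,
}

impl QueryInputs {
    /// Values supplied up front; their types are derived from them.
    fn with_values(values: BTreeMap<String, TypedValue>) -> Self {
        let types = values
            .iter()
            .map(|(var, value)| (var.clone(), value.value_type()))
            .collect();
        QueryInputs { types, values }
    }

    /// Only types are known; values would arrive at execution time.
    fn with_types(types: BTreeMap<String, ValueType>) -> Self {
        QueryInputs { types, values: BTreeMap::new() }
    }

    /// `:in` variables without even a declared type, which a caller could
    /// report as unbound (treating a declared type as bound is an assumption).
    fn unbound<'a>(&self, in_vars: &[&'a str]) -> BTreeSet<&'a str> {
        in_vars
            .iter()
            .filter(|var| !self.types.contains_key(**var))
            .cloned()
            .collect()
    }
}
```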
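
Commit 35d73d5541 specifies how `:order` sorts values of different types. The commit itself produces SQL for order clauses; the in-memory comparator below only illustrates the intended sequence, and the simplified `TypedValue` enum is hypothetical:

```rust
use std::cmp::Ordering;

// Hypothetical, simplified value model for illustration.
#[derive(Clone, Debug, PartialEq)]
enum TypedValue {
    Ref(i64),
    Boolean(bool),
    Instant(i64),
    Long(i64),
    Double(f64),
    String(String),
    Keyword(String, String), // (namespace, name)
}

impl TypedValue {
    /// Position of each group in the ascending sequence listed in 35d73d5541.
    fn rank(&self) -> u8 {
        match self {
            TypedValue::Ref(_) => 1,
            TypedValue::Boolean(_) => 2,
            TypedValue::Instant(_) => 3,
            TypedValue::Long(_) | TypedValue::Double(_) => 4,
            TypedValue::String(_) => 5,
            TypedValue::Keyword(..) => 6,
        }
    }
}

fn compare_asc(a: &TypedValue, b: &TypedValue) -> Ordering {
    use TypedValue as V;
    match (a, b) {
        (V::Ref(x), V::Ref(y))
        | (V::Instant(x), V::Instant(y))
        | (V::Long(x), V::Long(y)) => x.cmp(y),
        (V::Boolean(x), V::Boolean(y)) => x.cmp(y), // false sorts before true
        // Longs and doubles are intermixed. The i64 -> f64 cast can lose
        // precision for very large longs, and NaN is treated as equal here.
        (V::Long(x), V::Double(y)) => (*x as f64).partial_cmp(y).unwrap_or(Ordering::Equal),
        (V::Double(x), V::Long(y)) => x.partial_cmp(&(*y as f64)).unwrap_or(Ordering::Equal),
        (V::Double(x), V::Double(y)) => x.partial_cmp(y).unwrap_or(Ordering::Equal),
        (V::String(x), V::String(y)) => x.cmp(y),
        // Compare the whole ns/name pair as one '/'-separated string.
        (V::Keyword(ns1, n1), V::Keyword(ns2, n2)) => {
            format!("{}/{}", ns1, n1).cmp(&format!("{}/{}", ns2, n2))
        }
        // Values from different groups are ordered by group.
        _ => a.rank().cmp(&b.rank()),
    }
}

/// `desc` is simply the reverse of `asc`.
fn compare_desc(a: &TypedValue, b: &TypedValue) -> Ordering {
    compare_asc(a, b).reverse()
}
```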
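
Commit d8075aa07d requires every arm of a UNION to project the same columns in the same order, emitting either a type tag column or a fixed type tag for variables whose type is unknown in at least one arm. A sketch of that projection step; `ArmBinding`, `Arm`, the column names, and the backtick-quoted aliases are hypothetical, and also extracting variables whose arms disagree on a known type is an extra assumption of this sketch:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-arm binding of one projected variable.
struct ArmBinding {
    column: String,         // e.g. "datoms00.v"
    tag_column: String,     // e.g. "datoms00.value_type_tag"
    known_tag: Option<i32>, // Some(tag) if this arm alone fixes the type
}

/// Hypothetical summary of one arm of a complex `or`.
struct Arm {
    bindings: BTreeMap<String, ArmBinding>,
}

/// Variables whose type tag must be projected from the UNION.
fn vars_needing_type_extraction(arms: &[Arm]) -> BTreeSet<String> {
    let mut tags: BTreeMap<&String, BTreeSet<Option<i32>>> = BTreeMap::new();
    for arm in arms {
        for (var, binding) in &arm.bindings {
            tags.entry(var).or_default().insert(binding.known_tag);
        }
    }
    tags.into_iter()
        // Unknown in some arm, or (assumption) different known types per arm.
        .filter(|(_, seen)| seen.contains(&None) || seen.len() > 1)
        .map(|(var, _)| var.clone())
        .collect()
}

/// Project variables in the fixed order `vars`, so all arms line up.
/// Each arm is assumed to bind every variable (indexing panics otherwise).
fn project_arm(arm: &Arm, vars: &[String], extract: &BTreeSet<String>) -> Vec<String> {
    let mut columns = Vec::new();
    for var in vars {
        let binding = &arm.bindings[var];
        columns.push(format!("{} AS `{}`", binding.column, var));
        if extract.contains(var) {
            // Same alias in every arm: a fixed tag or this arm's tag column.
            let tag = match binding.known_tag {
                Some(tag) => tag.to_string(),
                None => binding.tag_column.clone(),
            };
            columns.push(format!("{} AS `{}_value_type_tag`", tag, var));
        }
    }
    columns
}
```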
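
Commits e280811243, 98ac559894, and 33fa1261b8 share an alias counter between a ConjoiningClauses instance and the templates or clones made from it, and allow an arbitrary starting value for tests. A sketch of such a shared counter built on `Rc<Cell<usize>>`; the method names and the `datoms00`-style alias format are assumptions:

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical shared counter: clones point at the same cell, so aliases
/// allocated through any clone never overlap.
#[derive(Clone, Debug)]
struct RcCounter {
    c: Rc<Cell<usize>>,
}

impl RcCounter {
    /// Start from an arbitrary value (useful in tests).
    fn with_initial(value: usize) -> Self {
        RcCounter { c: Rc::new(Cell::new(value)) }
    }

    /// Return the current value and advance the shared count.
    fn next(&self) -> usize {
        let current = self.c.get();
        self.c.set(current + 1);
        current
    }
}

/// Hypothetical alias format: table name plus a two-digit counter value.
fn next_alias(table: &str, counter: &RcCounter) -> String {
    format!("{}{:02}", table, counter.next())
}
```

A plain `usize` field would be duplicated by `clone()`, so a template and its source could hand out the same alias; sharing the cell avoids that.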
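
Commit 72977f52e4 decides at the end of algebrizing which type tags to extract: a variable with exactly one known type needs none. A hypothetical sketch of that pruning pass, using strings for both variables and type names:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical pruning pass run after all patterns have been applied.
/// `known_types` maps a variable to the names of its still-possible types.
fn prune_extracted_types<'a>(
    candidates: &BTreeSet<&'a str>,
    known_types: &BTreeMap<&'a str, BTreeSet<&'a str>>,
) -> BTreeSet<&'a str> {
    candidates
        .iter()
        .filter(|var| {
            // Keep the extraction unless exactly one type remains; with no
            // recorded information we must still extract.
            known_types.get(**var).map_or(true, |types| types.len() != 1)
        })
        .cloned()
        .collect()
}
```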
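
Commit 997df0b776 introduces ColumnIntersection and ColumnAlternation as a limited AND and OR over column constraints on a single table alias. A sketch of how such a tree could render to a SQL `WHERE` fragment; the single `Equals` constraint and this rendering are simplifications, not Mentat's actual types:

```rust
/// Hypothetical, simplified column constraint: `alias.column = value`.
#[derive(Clone, Debug)]
enum ColumnConstraint {
    Equals { alias: String, column: String, value: i64 },
}

#[derive(Clone, Debug)]
enum ColumnConstraintOrAlternation {
    Constraint(ColumnConstraint),
    Alternation(ColumnAlternation),
}

/// AND of its members.
#[derive(Clone, Debug)]
struct ColumnIntersection(Vec<ColumnConstraintOrAlternation>);

/// OR of its members.
#[derive(Clone, Debug)]
struct ColumnAlternation(Vec<ColumnIntersection>);

impl ColumnConstraint {
    fn to_sql(&self) -> String {
        match self {
            ColumnConstraint::Equals { alias, column, value } => {
                format!("{}.{} = {}", alias, column, value)
            }
        }
    }
}

impl ColumnIntersection {
    fn to_sql(&self) -> String {
        if self.0.is_empty() {
            return "1".to_string(); // an empty AND is true
        }
        self.0
            .iter()
            .map(|member| match member {
                ColumnConstraintOrAlternation::Constraint(c) => c.to_sql(),
                ColumnConstraintOrAlternation::Alternation(a) => format!("({})", a.to_sql()),
            })
            .collect::<Vec<_>>()
            .join(" AND ")
    }
}

impl ColumnAlternation {
    fn to_sql(&self) -> String {
        if self.0.is_empty() {
            return "0".to_string(); // an empty OR is false
        }
        self.0
            .iter()
            .map(|arm| format!("({})", arm.to_sql()))
            .collect::<Vec<_>>()
            .join(" OR ")
    }
}
```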
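
Commit 70e5759b5f avoids the expensive `all_datoms` table when `?a` is bound: resolve a keyword binding, check that the value is an attribute, and use the appropriate table. A sketch of that decision; the `Schema` shape is an assumption, and `FulltextDatoms` is a hypothetical stand-in for whichever table serves fulltext attributes:

```rust
use std::collections::HashMap;

/// Hypothetical table choices for a pattern.
#[derive(Debug, PartialEq)]
enum Table {
    Datoms,
    FulltextDatoms,
    AllDatoms,
}

/// What the algebrizer knows about the pattern's attribute position.
enum AttributeBinding<'a> {
    Unbound,
    Keyword(&'a str),
    Entid(i64),
}

struct Attribute {
    fulltext: bool,
}

/// Hypothetical schema: keyword -> entid, and entid -> attribute.
struct Schema {
    idents: HashMap<String, i64>,
    attributes: HashMap<i64, Attribute>,
}

/// `None` means the bound value is not a known attribute, so the pattern
/// can never match and the query is known to be empty.
fn table_for_attribute(schema: &Schema, binding: &AttributeBinding) -> Option<Table> {
    let entid = match binding {
        // Any attribute could match, including fulltext ones, so fall back
        // to all_datoms (conservatively ignoring the value's type).
        AttributeBinding::Unbound => return Some(Table::AllDatoms),
        AttributeBinding::Keyword(keyword) => *schema.idents.get(*keyword)?,
        AttributeBinding::Entid(entid) => *entid,
    };
    let attribute = schema.attributes.get(&entid)?;
    Some(if attribute.fulltext { Table::FulltextDatoms } else { Table::Datoms })
}
```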
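
Commit 15b4195a6e collects schema mutations in a set first called DiffSet and later renamed AddRetractAlterSet. One plausible reading, sketched here as an assumption rather than Mentat's actual code: within a single transaction, an assertion and a retraction of the same key combine into an alteration.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: keyed metadata changes seen in one transaction.
#[derive(Debug, PartialEq)]
struct AddRetractAlterSet<K: Ord, V> {
    asserted: BTreeMap<K, V>,
    retracted: BTreeMap<K, V>,
    altered: BTreeMap<K, (V, V)>, // (old value, new value)
}

impl<K: Ord, V> AddRetractAlterSet<K, V> {
    fn new() -> Self {
        AddRetractAlterSet {
            asserted: BTreeMap::new(),
            retracted: BTreeMap::new(),
            altered: BTreeMap::new(),
        }
    }

    /// Record an assertion (`added == true`) or a retraction of `key -> value`.
    /// A retraction paired with an assertion of the same key becomes an alteration.
    fn witness(&mut self, key: K, value: V, added: bool) {
        if added {
            match self.retracted.remove(&key) {
                Some(old) => {
                    self.altered.insert(key, (old, value));
                }
                None => {
                    self.asserted.insert(key, value);
                }
            }
        } else {
            match self.asserted.remove(&key) {
                Some(new) => {
                    self.altered.insert(key, (value, new));
                }
                None => {
                    self.retracted.insert(key, value);
                }
            }
        }
    }
}
```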