Commit graph

565 commits

Author SHA1 Message Date
Richard Newman
19fc7cddf1 [query] Widen known_types correctly in complex or. (#424) r=nalexander
* Part 1: define ValueTypeSet.

We're going to use this instead of `HashSet<ValueType>` so that we can clearly express
the empty set and the set of all types, and also to encapsulate a switch to `EnumSet`."

* Part 2: use ValueTypeSet.

* Part 3: fix type expansion.

* Part 4: add a test for type extraction from nested `or`.

* Review comments.

* Review comments: simplify ValueTypeSet.
2017-04-24 14:15:26 -07:00
Richard Newman
bc63744aba Add :limit to queries (#420) r=nalexander
* Pre: put query parts in alphabetical order.
* Pre: rename 'input' to 'query' in translate tests.
* Part 1: parse :limit.
* Part 2: validate and escape variable parameters in SQL.
* Part 3: algebrize and translate limits.
2017-04-19 16:16:19 -07:00
Brian Grinstead
cd860ae68d Add an initial benchmark for the tx-parser crate. (#406) (#413) r=nalexander 2017-04-19 13:54:24 -07:00
Richard Newman
bffefe7e6b Review comments for #418. 2017-04-18 13:50:58 -07:00
Richard Newman
aa14a71019 Parse :in, pass inputs through to querying. (#418) r=nalexander
This commit downgrades error_chain to 0.8.1 in order to fix trait bounds
on errors.
2017-04-18 13:20:00 -07:00
Nick Alexander
ff0147e89c Review comments: downgrade to error-chain 0.8.1 for Send + Sync bound; use combine::primitive::Error. 2017-04-18 13:19:50 -07:00
Richard Newman
60c082b61e Part 4: pass inputs through algebrizing and execution. (#418)
This also adds a test that an `UnboundVariables` error is raised if a
variable mentioned in the `:in` clause isn't bound.
2017-04-18 13:19:50 -07:00
Richard Newman
dfc846e483 Part 3: define keep_intersected_keys.
We'll use this to drop unneeded values from input maps, if lazy callers
reuse a general-purpose map for multiple queries.
2017-04-18 13:19:50 -07:00
Richard Newman
651308f721 Part 2: define a type to encapsulate query inputs.
This is for two reasons.

Firstly, we need to track the types of inputs, their values, and also
the input variables; adding a struct gives us a little more clarity.

Secondly, when we come to implement prepared statements, we'll be
algebrizing queries without having the values available. We'll be able
to do a better job of algebrizing, and also do more validating, if we
allow callers to specify the types of variables in advance, even if the
values aren't known.
2017-04-18 13:19:50 -07:00
Richard Newman
a9a82ea1a7 Part 1: parse :in.
We also at this point switch from using `Vec<Variable>` to
`BTreeSet<Variable>`. This allows us to guarantee no duplicates later;
we'll reject duplicates at parse time.
2017-04-18 13:19:50 -07:00
Richard Newman
01af45ab3f Pre: define Display for ValueType. 2017-04-18 13:19:50 -07:00
Richard Newman
8ddbc834ae Pre: take Variables instead of Strings in public API, for now. 2017-04-18 13:19:50 -07:00
Richard Newman
5718ce0155 Pre: add two checks to translate tests to fix unused var warning. 2017-04-18 13:19:50 -07:00
Richard Newman
5cd53aff44 Pre: unused imports. 2017-04-18 13:19:50 -07:00
Brian Grinstead
99b7e89116 Make struct Conn public. (#419) r=rnewman 2017-04-18 11:04:44 -07:00
Richard Newman
35d73d5541 Implement :order. (#415) (#416) r=nalexander
This adds an `:order` keyword to `:find`.

If present, the results of the query will be an ordered set, rather than
an unordered set; rows will appear in an ordered defined by each
`:order` entry.

Each can be one of three things:

- A var, `?x`, meaning "order by ?x ascending".
- A pair, `(asc ?x)`, meaning "order by ?x ascending".
- A pair, `(desc ?x)`, meaning "order by ?x descending".

Values will be ordered in this sequence for asc, and in reverse for desc:

1. Entity IDs, in ascending numerical order.
2. Booleans, false then true.
3. Timestamps, in ascending numerical order.
4. Longs and doubles, intermixed, in ascending numerical order.
5. Strings, in ascending lexicographic order.
6. Keywords, in ascending lexicographic order, considering the entire
   ns/name pair as a single string separated by '/'.

Subcommits:

Pre: make bound_value public.
Pre: generalize ErrorKind::UnboundVariable for use in order.
Part 1: parse (direction, var) pairs.
Part 2: parse :order clause into FindQuery.
Part 3: include order variables in algebrized query.

We add order variables to :with, so we can reuse its type tag projection
logic, and so that we can phrase ordering in terms of variables rather
than datoms columns.

Part 4: produce SQL for order clauses.
2017-04-17 11:30:31 -07:00
Richard Newman
64acc6a7ee Support :with (#311) (#414) r=nalexander
* Pre: refactor projector code.
* Part 1: maintain 'with' variables in AlgebrizedQuery.
* Part 2: include necessary 'with' variables in SQL projection list.

The test produces projection elements for `:with`, even though there are
no aggregates in the query. This test will need to be adjusted when we
optimize this away!
2017-04-17 09:23:55 -07:00
Richard Newman
d8f761993d Implement complex or joins. (#410) r=nalexander 2017-04-12 19:23:40 -07:00
Richard Newman
758ab8b476 Part 5: add more tests for complex or. 2017-04-12 19:21:56 -07:00
Richard Newman
bca8b7e322 Part 4: correct projection of type tags in the outermost projector. 2017-04-12 19:21:56 -07:00
Richard Newman
d8075aa07d Part 3: finish expansion and translation of complex or.
This commit turns complex `or` -- `or`s in which not all variables are
unified, or in which not all arms are the same shape -- into a
computed table.

We do this by building a template CC that shares some state with the
destination CC, applying each arm of the `or` to a copy of the template
as if it were a standalone query, then building a projection list and
creating a `ComputedTable::Union`. This is pushed into the destination
CC's `computed_tables` list.

Finally, the variables projected from the UNION are bound in the
destination CC, so that unification occurs, and projection of the
outermost query can use bindings established by the `or-join`.

This commit includes projection of type codes from heterogeneous `UNION`
arms: we compute a list of variables for which a definite type is
unknown in at least one arm, and force all arms to project either a type
tag column or a fixed type. It's important that each branch of a UNION
project the same columns in the same order, hence the projection of
fixed values.

The translator is similarly extended to project the type tag column name
or the known value_type_tag to support this.

Review comment: clarify union type extraction.
2017-04-12 19:21:45 -07:00
Richard Newman
08d2c613a4 Part 2: expand the definition of a table to include computed tables.
This commit:

- Defines a new kind of column, distinct from the eavt columns in
  `DatomsColumn`, to model the rows projected from subqueries. These
  always name one of two things: a variable, or a variable's type tag.
  Naturally the two cases are thus `Variable` and `VariableTypeTag`.
  These are cheap to clone, given that `Variable` is an `Rc<String>`.
- Defines `Column` as a wrapper around `DatomsColumn` and
  `VariableColumn`. Everywhere we used to use `DatomsColumn` we now
  allow `Column`: particularly in constraints and projections.
- Broadens the definition of a table list in the intermediate
  "query-sql" representation to include a SQL UNION. A UNION is
  represented as a list of queries and an alias.
- Implements translation from a `ComputedTable` to the query-sql
  representation. In this commit we only project vars, not type tags.

Review comment: discuss bind_column_to_var for ValueTypeTag.
Review comment: implement From<Vec<T>> for ConsumableVec<T>.
2017-04-12 19:21:33 -07:00
Richard Newman
7948788936 Part 1: define ComputedTable.
Complex `or`s are translated to SQL as a subquery -- in particular, a
subquery that's a UNION. Conceptually, that subquery is a computed
table: `all_datoms` and `datoms` yield rows of e/a/v/tx, and each
computed table yields rows of variable bindings.

The table itself is a type, `ComputedTable`. Its `Union` case contains
everything a subquery needs: a `ConjoiningClauses` and a projection
list, which together allow us to build a SQL subquery, and a list of
variables that need type code extraction. (This is discussed further in
a later commit.)

Naturally we also need a way to refer to columns in a computed table.
We model this by a new enum case in `DatomsTable`, `Computed`, which
maintains an integer value that uniquely identifies a computed table.
2017-04-12 11:13:58 -07:00
Richard Newman
79ccd818f3 Pre: use ..Default approach for use_as_template and make_receptacle.
I decided this was more efficient (no temporary attributes and
mutability) and less confusing.
2017-04-12 11:12:49 -07:00
Richard Newman
98ac559894 Pre: allow initialization of a CC with an arbitrary counter value. Useful for testing. 2017-04-12 11:12:48 -07:00
Richard Newman
33fa1261b8 Pre: clone alias_counter into concretes.
This ensures that concrete CC clones don't have overlapping counts.
2017-04-12 11:11:56 -07:00
Richard Newman
b9f9b4ff58 Pre: make extracted_types pub so the projector and translator can use it. 2017-04-12 11:11:56 -07:00
Richard Newman
e984e02529 Pre: comment RcCounter. 2017-04-12 11:11:54 -07:00
Richard Newman
1636134a72 Algebrize simple or joins. (#304) r=nalexander 2017-04-07 12:47:02 -07:00
Richard Newman
e280811243 Part 7: use RcCounter to implement aliasing in ConjoiningClauses.
This allows us to share a counter between templates produced from a CC.
2017-04-07 12:46:34 -07:00
Richard Newman
2b61944f09 Part 6: track why an empty or-join failed. 2017-04-07 12:46:30 -07:00
Richard Newman
b693385495 Part 5: eliminate is_known_empty in favor of empty_because and an accessor. 2017-04-07 12:46:26 -07:00
Richard Newman
a07efc0a9e Part 4: look up attributes for bound variables when making type determinations. 2017-04-07 12:46:26 -07:00
Richard Newman
72977f52e4 Part 3: reinstate extracted type pruning.
When we started expanding and narrowing type sets, it became impossible
to conclusively know during pattern application whether a type was
known. We now figure that out at the end: if a variable has only a
single known type, we don't need to extract its type tag.
2017-04-07 12:46:26 -07:00
Richard Newman
0639c94468 Part 2: implement simple or. 2017-04-07 12:46:25 -07:00
Richard Newman
9df18e4286 Part 1: implement type narrowing and broadening. 2017-04-07 12:44:03 -07:00
Richard Newman
b117e2c463 Implement a cloneable shared counter. (#407) r=nalexander 2017-04-07 12:43:50 -07:00
Nick Alexander
5369f03464 Improve parsing of nested edn::ValueAndSpan streams. r=rnewman (#393)
* Pre: Expose more in edn.

* Pre: Make it easier to work with ValueAndSpan.

with_spans() is a temporary hack, needed only because I don't care to
parse the bootstrap assertions from text right now.

* Part 1a: Add `value_and_span` for parsing nested `edn::ValueAndSpan` instances.

I wasn't able to abstract over `edn::Value` and `edn::ValueAndSpan`;
there are multiple obstacles.  I chose to roll with
`edn::ValueAndSpan` since it exposes the additional span information
that we will want to form good error messages in the future.

* Part 1b: Add keyword_map() parsing an `edn::Value::Vector` into an `edn::Value::map`.

* Part 1c: Add `Log`/`.log(...)` for logging parser progress.

This is a terrible hack, but it sure helps to debug complicated nested
parsers.  I don't even know what a principled approach would look
like; since our parser combinators are so frequently expressed in
code, it's hard to imagine a data-driven interpreter that can help
debug things.

* Part 2: Use `value_and_span` apparatus in tx-parser/.

I break an abstraction boundary by returning a value column
`edn::ValueAndSpan` rather than just an `edn::Value`.  That is, the
transaction processor shouldn't care where the `edn::Value` it is
processing arose -- even we care to track that information we should
bake it into the `Entity` type.  We do this because we need to
dynamically parse the value column to support nested maps, and parsing
requires a full `edn::ValueAndSpan`.  Alternately, we could cheat and
fake the spans when parsing nested maps, but that's potentially
expensive.

* Part 3: Use `value_and_span` apparatus in query-parser/.

* Part 4: Use `value_and_span` apparatus in root crate.

* Review comment: Make Span and SpanPosition Copy.

* Review comment: nits.

* Review comment: Make `or` be `or_exactly`.

I baked the eof checking directly into the parser, rather than using
the skip and eof parsers.  I also took the time to restore some tests
that were mistakenly commented out.

* Review comment: Extract and use def_matches_* macros.

* Review comment: .map() as late as possible.
2017-04-06 10:06:28 -07:00
Richard Newman
a5023c70cb Use Rc for TypedValue, Variable, and query Ident keywords. (#395) r=nalexander
Part 1, core: use Rc for String and Keyword.
Part 2, query: use Rc for Variable.
Part 3, sql: use Rc for args in SQLiteQueryBuilder.
Part 4, query-algebrizer: use Rc.
Part 5, db: use Rc.
Part 6, query-parser: use Rc.
Part 7, query-projector: use Rc.
Part 8, query-translator: use Rc.
Part 9, top level: use Rc.
Part 10: intern Ident and IdentOrKeyword.
2017-04-02 21:38:36 -07:00
Richard Newman
8ae8466cf9 Make InternSet::intern accept Into<Rc<T>>. Add a test. r=nalexander 2017-03-31 09:59:58 -07:00
Richard Newman
92cdd72500 Diagnose and prepare simple and complex or joins. (#396) r=nalexander 2017-03-30 19:14:05 -07:00
Richard Newman
2b2b5cf696 Part 6: implement decision tree for processing simple alternation. 2017-03-30 19:13:40 -07:00
Richard Newman
74f188df9b Part 5b: rename also/instead to add_intersection and add_alternate. 2017-03-30 19:13:20 -07:00
Richard Newman
9e5c735460 Part 5: split cc.rs into a 'clauses' module.
mod.rs defines the module and ConjoiningClauses itself, complete with
methods to record facts and ask it questions.

pattern.rs, predicate.rs, resolve.rs, and or.rs include particular
functionality around accumulating certain kinds of patterns.

Only `or.rs` includes significant new code; the rest is just split.
2017-03-30 19:13:20 -07:00
Richard Newman
72eeedec74 Part 4: add OrJoin::is_fully_unified.
This allows us to tell if all the variables in a valid `or` join are to
be unified, which is necessary for simple joins.
2017-03-30 19:13:20 -07:00
Richard Newman
ce3c4f0dca Part 3: have table_for_places return a Result, not an Option. 2017-03-30 19:13:20 -07:00
Richard Newman
01ca0ae5c1 Part 2: add an EmptyBecause case for fulltext/non-string type mismatch. 2017-03-30 19:13:19 -07:00
Richard Newman
997df0b776 Part 1: introduce ColumnIntersection and ColumnAlternation.
This provides a limited form of OR and AND for column constraints, allowing
simple 'or-join' queries to be expressed on a single table alias.
2017-03-30 19:13:19 -07:00
Richard Newman
460fdac252 Pre: add Variable::from_valid_name, TypedValue::{typed_string,typed_ns_keyword}. 2017-03-30 19:13:19 -07:00
Richard Newman
439f3a2283 Pre: add some 'am I a pattern?' helper predicates to clause types. 2017-03-30 19:06:33 -07:00