Query evaluation follows this conceptual sequence:

1. EDN parsing.
2. Query parsing: turning EDN into an abstract representation of a valid query.
3. Algebrizing: interpreting a parsed query into a set of relations and data structures within the context of a database.
4. Query translation: turning the plan into SQL. This is, in a sense, a simple form of query planning.
5. Projection translation: turning the plan into a set of projectors/accumulators that will turn SQL result rows into Datalog results.
6. (Implicit) preparation: creating views, tables, indices, or even whole databases or connections upon which the query will be run.
7. Execution: running the translated SQL against the database.
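
Concretely, these stages can be pictured as a chain of typed transformations. The sketch below is illustrative only; the type and function names are hypothetical, not the real API:

```rust
// Hypothetical stage boundaries; names are illustrative.
struct ParsedQuery;      // Stages 1 and 2: EDN and query parsing.
struct Schema;           // Store metadata consulted while algebrizing.
struct AlgebrizedQuery;  // Stage 3: relations plus type and value constraints.
struct SqlQuery(String); // Stage 4: the translated SQL.
struct Projector;        // Stage 5: turns SQL rows into Datalog results.

fn parse(_edn: &str) -> ParsedQuery {
    ParsedQuery
}

fn algebrize(_schema: &Schema, _parsed: ParsedQuery) -> AlgebrizedQuery {
    AlgebrizedQuery
}

fn translate(_algebrized: AlgebrizedQuery) -> (SqlQuery, Projector) {
    (SqlQuery("SELECT …".to_string()), Projector)
}

fn main() {
    // Stages 1 through 5; preparation and execution (6 and 7) would then
    // run the SQL and feed the resulting rows through the projector.
    let parsed = parse("[:find ?x :where [?x :foo/bar 5]]");
    let (sql, _projector) = translate(algebrize(&Schema, parsed));
    println!("{}", sql.0);
}
```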

## Parsing

This entire step can take place in advance (indeed, at compile-time): it doesn't involve the database at all.

The second sub-stage of parsing — from EDN to a specific representation — involves some validation, such that all parsed queries make some amount of sense: for example, that the projected variables intersect with the variables present in the `:in` and `:where` clauses of the query; that all branches of an `or` work with the same variables; that forms are the correct lengths. The type system will help to encode and enforce many (but not all) of these constraints.
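
As an illustration, the projected-variables check might take roughly this shape (a hypothetical helper, not the actual implementation):

```rust
use std::collections::HashSet;

// Hypothetical check: every variable in :find must also appear among the
// variables mentioned in the :in and :where clauses.
fn validate_projection(projected: &[&str], known: &HashSet<&str>) -> Result<(), String> {
    for var in projected {
        if !known.contains(var) {
            return Err(format!("projected variable {} is never bound", var));
        }
    }
    Ok(())
}

fn main() {
    // [:find ?x ?y :where [?x :foo/bar ?y]] binds both ?x and ?y.
    let known: HashSet<&str> = ["?x", "?y"].iter().copied().collect();
    assert!(validate_projection(&["?x", "?y"], &known).is_ok());
    assert!(validate_projection(&["?z"], &known).is_err()); // ?z is unknown.
}
```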

## Algebrizing

This is the point in the process at which the contents of the database — in particular, its schema and ident mappings — are first used.

The parsed query is walked and (in conjunction with the schema and ident map) used to populate a `Context` and a nested set of `ConjoiningClauses`. These collect known types (_e.g._, if `[?x :foo/bar 5]` appears in a query, we know that `?x` must be an entity; if `[(> ?y 5)]` appears, we know that `?y` must be of some numeric type), value constraints, and the beginnings of SQL-oriented planning.

During algebrizing we might encounter inconsistencies: a query might imply that `?y` must be both an entity and an `instant`. The entire clause is then unsatisfiable, and we can prune as we go. If an entire query is known to return no results prior to execution, we might report this as an error.
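
A toy sketch of that accumulation and pruning (illustrative names; the real `ConjoiningClauses` carry much more): each pattern intersects a variable's set of possible types, and an empty intersection marks the clause as unsatisfiable.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum ValueType { Ref, Long, Double, Instant }

// Each clause narrows the set of types a variable may take. An empty
// intersection means the enclosing clause can never be satisfied.
fn constrain(
    known: &mut HashMap<String, HashSet<ValueType>>,
    var: &str,
    allowed: &[ValueType],
) -> bool {
    let allowed: HashSet<ValueType> = allowed.iter().copied().collect();
    let narrowed: HashSet<ValueType> = match known.get(var) {
        Some(existing) => existing.intersection(&allowed).copied().collect(),
        None => allowed,
    };
    let satisfiable = !narrowed.is_empty();
    known.insert(var.to_string(), narrowed);
    satisfiable
}

fn main() {
    let mut known = HashMap::new();
    // [?x :foo/bar 5]  implies ?x must be an entity.
    assert!(constrain(&mut known, "?x", &[ValueType::Ref]));
    // [(> ?y 5)]       implies ?y must be numeric.
    assert!(constrain(&mut known, "?y", &[ValueType::Long, ValueType::Double]));
    // A clause requiring ?y to be an instant is now unsatisfiable.
    assert!(!constrain(&mut known, "?y", &[ValueType::Instant]));
}
```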

Naturally these constraints are tied to the store's schema at the time of query execution: `:foo/bar` might be a unique integer attribute in one store, and a fulltext-indexed string attribute in another. They are also tied to any external bindings, which leaves us an implementation choice: to re-plan each query whenever a binding changes (more expensive, simpler, produces a more efficient query), or to produce a planned query that can accommodate any kind of binding (allows reuse, but leads to runtime limitations). See #278 for discussion of this choice.
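
To make the trade-off concrete, consider `[?x :foo/bar ?v]` with `?v` supplied via `:in`. The two hypothetical translations below (invented table layout, attribute ids, and type tags) show what each choice buys:

```rust
fn main() {
    // Re-planned per binding: with ?v known to be the long 5, both the
    // value and its type tag can be baked into the SQL at translation time.
    let specialized =
        "SELECT e FROM datoms WHERE a = 65537 AND v = 5 AND value_type_tag = 5";

    // Binding-agnostic: the value and its tag become runtime parameters, so
    // one prepared statement can be reused for any binding of ?v, at the
    // cost of giving SQLite less information to plan with.
    let generic =
        "SELECT e FROM datoms WHERE a = 65537 AND v = ? AND value_type_tag = ?";

    println!("{}\n{}", specialized, generic);
}
```
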
## Query translation
This is a good time to briefly cover how data is stored in SQLite.
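
In broad strokes (the real layout has more columns and supporting indices, and the attribute ids and type tags below are invented), each datom occupies one row of a single `datoms` table, with a tag column disambiguating the type of `v`. A minimal sketch using the `rusqlite` crate:

```rust
use rusqlite::{params, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // One row per datom. value_type_tag disambiguates how to read v, since
    // SQLite's dynamic typing alone can't distinguish, say, an entity id
    // from a plain integer value.
    conn.execute_batch(
        "CREATE TABLE datoms (
            e  INTEGER NOT NULL,         -- entity
            a  INTEGER NOT NULL,         -- attribute
            v  BLOB NOT NULL,            -- value
            tx INTEGER NOT NULL,         -- transaction
            value_type_tag INTEGER NOT NULL
         );
         CREATE INDEX idx_avet ON datoms (a, value_type_tag, v, e);",
    )?;

    // A pattern like [?x :foo/bar 5] then becomes a lookup by attribute,
    // value, and tag (invented ids: :foo/bar = 65537, long tag = 5).
    let mut stmt =
        conn.prepare("SELECT e FROM datoms WHERE a = ? AND v = ? AND value_type_tag = ?")?;
    let mut rows = stmt.query(params![65537i64, 5i64, 5i64])?;
    while let Some(row) = rows.next()? {
        let e: i64 = row.get(0)?;
        println!("?x = {}", e);
    }
    Ok(())
}
```

SQLite will likely implement such a lookup as an `avet`-style index walk over `datoms`, binding the attribute and tag and scanning the matching values.
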
A reader with some database experience will notice at this point that two other obvious compositions exist: we could have a separate datoms table for each known attribute, or for each _part_; or we could combine values for related sets of attributes into multiple columns on the same table, producing something that's much closer to a traditional SQL schema. These are relatively straightforward extensions that we plan to explore later, safely hidden behind our API abstractions.

We attempt to make the produced query _predictable_: there is a correspondence between the phrasing of the input query (even down to clause order) and the phrasing of the generated SQL, allowing users to iteratively explore query performance. We expect that most queries will be run more often than they will be written, and so at this stage it's more important to allow for predictable tuning than to aim for virtuoso plans.
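
For example, reusing the invented layout from above, reordering two `:where` patterns should do little more than renumber and reorder the corresponding self-joins, keeping the SQL recognizable to the query's author:

```rust
fn main() {
    // [:find ?x :where [?x :foo/bar 5] [?x :foo/baz ?y]]
    let first = "SELECT DISTINCT d0.e FROM datoms d0, datoms d1 \
                 WHERE d0.a = 65537 AND d0.v = 5 AND d1.a = 65538 \
                 AND d0.e = d1.e";

    // [:find ?x :where [?x :foo/baz ?y] [?x :foo/bar 5]]
    // Same query, clauses swapped: d0 and d1 swap roles accordingly.
    let second = "SELECT DISTINCT d0.e FROM datoms d0, datoms d1 \
                  WHERE d0.a = 65538 AND d1.a = 65537 AND d1.v = 5 \
                  AND d0.e = d1.e";

    println!("{}\n{}", first, second);
}
```
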
## Projection
A SQL query returns SQL values. But we work with a richer type system — booleans, dates, keywords — and so some translation is necessary. Further, we will support extra-SQL aggregates (simple aggregates are already directly translated into SQL), and other additions like evaluating pull expressions on query results.
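
A sketch of what a projector might do with a single column (the enum and tag values are illustrative, not the real definitions):

```rust
// Raw SQL values are paired with their type tags and rebuilt into richer
// Datalog values. Tags and names here are illustrative.
#[derive(Debug)]
enum TypedValue {
    Ref(i64),
    Boolean(bool),
    Instant(i64), // e.g., microseconds since the epoch
    Long(i64),
    Keyword(String),
}

enum SqlValue {
    Integer(i64),
    Text(String),
}

fn project_one(tag: i64, value: SqlValue) -> Option<TypedValue> {
    match (tag, value) {
        (0, SqlValue::Integer(i)) => Some(TypedValue::Ref(i)),
        (1, SqlValue::Integer(i)) => Some(TypedValue::Boolean(i != 0)),
        (4, SqlValue::Integer(i)) => Some(TypedValue::Instant(i)),
        (5, SqlValue::Integer(i)) => Some(TypedValue::Long(i)),
        (13, SqlValue::Text(s)) => Some(TypedValue::Keyword(s)),
        _ => None, // Mismatched tag and value: not a well-formed row.
    }
}

fn main() {
    // The SQL integer 1 tagged as a boolean projects to `true`.
    println!("{:?}", project_one(1, SqlValue::Integer(1)));
    // SQL text tagged as a keyword projects to a keyword.
    println!("{:?}", project_one(13, SqlValue::Text(":foo/bar".to_string())));
}
```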