[query] Constraint-based algebrizing #147

New issue

Open

opened 2020-08-06 16:55:43 +00:00 by gburd · 0 comments

gburd commented

2020-08-06 16:55:43 +00:00

(Migrated from github.com)

The current algebrizer — mainly ConjoiningClauses — is deliberately procedural and predictable. Paired with the simple query translator, we quite directly translate Datalog into SQL.

CC performs two roles: it tracks the metadata it needs to not be completely clumsy (the types deduced for variables and known bindings), and it accumulates some information that should really be part of a planner: table allocations and mappings between their columns and variables.

Again, that works fine for simple and predictable translation.

Extending this approach, however, to attain more power gets… hacky.

Query simplification/optimization (e.g., removal of redundant patterns and constraints).
Algebrizing queries down to static results (e.g., [:find ?x :where [?x :db/ident :foo/bar]]).
Value space intersection and optimization (e.g., in [:find ?x :where [?x :foo/age ?z][?y :foo/age ?o] [(!= ?x ?y)] [(> ?z 20)] [(< ?o 30)] [(> ?o ?z)]] we can derive a lower bound for ?o).
Deducing attributes from their ranges (e.g., [:find ?x :where [?x ?y :foo/bar]] can deduce the value of ?y in the current database from the schema alone, and [:find ?x :where [?x ?y true] [?z ?y ?z]] can never bind a single ?y).

At some point it might be worthwhile to more fully separate algebrize+translate into algebrize+plan+translate, with the algebrizing stage being primarily a constraint system used to derive something like CC.

The risk of doing so is that the planner input would be less ordered, and thus the translated output would likely be less predictable and less related to the input. Small changes to the input query could cause large changes to what we can deduce.

The current algebrizer — mainly `ConjoiningClauses` — is deliberately _procedural_ and _predictable_. Paired with the simple query translator, we quite directly translate Datalog into SQL. CC performs two roles: it tracks the metadata it needs to not be completely clumsy (the types deduced for variables and known bindings), and it accumulates some information that should really be part of a _planner_: table allocations and mappings between their columns and variables. Again, that works fine for simple and predictable translation. Extending this approach, however, to attain more power gets… hacky. - Query simplification/optimization (_e.g._, removal of redundant patterns and constraints). - Algebrizing queries down to static results (_e.g._, `[:find ?x :where [?x :db/ident :foo/bar]]`). - Value space intersection and optimization (_e.g._, in `[:find ?x :where [?x :foo/age ?z][?y :foo/age ?o] [(!= ?x ?y)] [(> ?z 20)] [(< ?o 30)] [(> ?o ?z)]]` we can derive a lower bound for `?o`). - Deducing attributes from their ranges (_e.g._, `[:find ?x :where [?x ?y :foo/bar]]` can deduce the value of `?y` in the current database from the schema alone, and `[:find ?x :where [?x ?y true] [?z ?y ?z]]` can never bind a single `?y`). At some point it might be worthwhile to more fully separate algebrize+translate into algebrize+plan+translate, with the algebrizing stage being primarily a constraint system used to derive something like CC. The risk of doing so is that the planner input would be less ordered, and thus the translated output would likely be less predictable and less related to the input. Small changes to the input query could cause large changes to what we can deduce.