[query] Implement pull expressions in queries #92

Open
opened 2020-08-06 16:54:47 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:54:47 +00:00 (Migrated from github.com)

Datomic splits query logic between a relatively simple "find" part, to narrow down to a set of bindings, and a relatively expressive "pull" part that fans out to retrieve data.

This separation is useful: these are two different concerns, and one of the tricky parts of complex SQL is untangling navigation and refinement from retrieval.

It's also how Datomic most naturally expresses optional values — they're simply missing from the pulled output. Optional values inside find aren't possible; the closest you can get is get-else (#308).

Pull expressions can be arbitrarily complex: recursive up to a fixed or arbitrary depth; returning a limited number of results; reversed; nested; and fetching component values even when attributes are not known in advance.

Additionally, pull results are structured: they don't fit easily into the standard rel/coll/scalar TypedValue model.

Simple pull queries like:

[:find (pull ?john [*])
 :where [?john :person/name "John"]]

can compile to a particular kind of projector and a SQL query like:

SELECT a, v, value_type_tag FROM datoms
WHERE e IN (SELECT datoms00.e AS `?john` FROM datoms datoms00 WHERE a = 99 AND v = $v0)
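
To make the shape of this concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is only an illustration, not the intended implementation: the `datoms` table layout, the attribute ids 99 and 100, the type tag 10, and the `project_pull_star` helper are all assumptions. The query also selects `e` (the SQL above does not) so that rows can be grouped per entity, and it binds `:v0` rather than `$v0`.

```python
import sqlite3
from collections import defaultdict

# Toy schema (an assumption): one row per datom.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datoms (e INTEGER, a INTEGER, v, value_type_tag INTEGER)")
conn.executemany(
    "INSERT INTO datoms VALUES (?, ?, ?, ?)",
    [
        (1, 99, "John", 10),               # attribute 99 plays :person/name
        (1, 100, "john@example.com", 10),  # attribute 100 plays :person/email
        (2, 99, "Jane", 10),
        (2, 100, "jane@example.com", 10),
    ],
)

# The inner SELECT is the "find" part; the outer SELECT is the "pull [*]" part.
PULL_STAR_SQL = """
SELECT e, a, v, value_type_tag FROM datoms
WHERE e IN (SELECT datoms00.e AS `?john` FROM datoms datoms00
            WHERE a = 99 AND v = :v0)
ORDER BY e, a
"""

def project_pull_star(conn, name):
    """Accumulate flat (e, a, v) rows into one {attribute: [values]} map per entity."""
    entities = defaultdict(lambda: defaultdict(list))
    for e, a, v, _tag in conn.execute(PULL_STAR_SQL, {"v0": name}):
        entities[e][a].append(v)
    return {e: dict(attrs) for e, attrs in entities.items()}

result = project_pull_star(conn, "John")
print(result)  # {1: {99: ['John'], 100: ['john@example.com']}}
```

A real projector would also map attribute ids back to idents and use `value_type_tag` to decode each value; both steps are left out here.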

Clearly it's infeasible to produce a single SQL query for a pull query like:

[:find ?person ?employer (pull ?person [:person/email {:person/friend 3} (limit :person/pets 5)])
 :where [?person :person/employer ?employer]]

which might return something like:

[[12345 23456 {:person/email "joe@example.com"
               :person/friend [12346 12347 12349]
               :person/pets [{:pet/kind "chicken", :pet/name "Kylo Hen"}]}]]

For these queries we need to compile the pull expression into a collection of parameterized queries which can be prepared and run with bindings from the main find query. These results can be processed and combined with the simple bindings.
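
A rough sketch of that two-phase approach, in Python with the standard `sqlite3` module. Everything named here is hypothetical: the attribute ids, the `compile_pull` and `run` helpers, and the `(key, attribute, limit)` spec format. It also treats every attribute as cardinality-many, which the real result format would not.

```python
import sqlite3

# Hypothetical attribute ids standing in for the idents in the example above.
EMPLOYER, EMAIL, PETS = 1, 2, 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datoms (e INTEGER, a INTEGER, v)")
conn.executemany("INSERT INTO datoms VALUES (?, ?, ?)", [
    (12345, EMPLOYER, 23456),
    (12345, EMAIL, "joe@example.com"),
    (12345, PETS, 501),
    (12345, PETS, 502),
    (12345, PETS, 503),
])

# The "find" part: one query producing the simple bindings (?person, ?employer).
FIND_SQL = "SELECT e, v FROM datoms WHERE a = ? ORDER BY e"

def compile_pull(spec):
    """Compile (key, attribute, limit) entries into (key, sql, fixed_params).

    Each statement takes the entity binding as its first parameter.
    """
    compiled = []
    for key, attr, limit in spec:
        sql = "SELECT v FROM datoms WHERE e = ? AND a = ? ORDER BY v"
        if limit is None:
            compiled.append((key, sql, (attr,)))
        else:
            compiled.append((key, sql + " LIMIT ?", (attr, limit)))
    return compiled

def run(conn, pull_spec):
    compiled = compile_pull(pull_spec)  # compiled once, reused for every binding
    results = []
    for person, employer in conn.execute(FIND_SQL, (EMPLOYER,)).fetchall():
        pulled = {}
        for key, sql, params in compiled:
            values = [row[0] for row in conn.execute(sql, (person, *params))]
            if values:  # missing attributes are simply absent from the output
                pulled[key] = values
        results.append([person, employer, pulled])
    return results

rows = run(conn, [(":person/email", EMAIL, None), (":person/pets", PETS, 2)])
print(rows)
```

Because the SQL text for each pull attribute is identical across bindings, a driver that caches prepared statements (as Python's `sqlite3` does per connection) prepares each one only once.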

Work in this issue will consist of:

  • [x] Parsing simple pull expressions.
  • [x] Defining a structured, nested pull results format (indeed, four formats).
  • [x] Generating SQL from simple pull expressions.
  • [x] Detecting the simplest pull expressions (scalar, or those without additional bindings), compiling those into a single (nested) SQL query.
  • [x] Writing a projector to accumulate rows into structure.
  • [x] Extending the query interface to return a suitable format.
  • [ ] Handling reverse attributes in pull expressions.
  • [ ] Implementing a prepared-statement-based approach to non-trivial pull expressions.
  • [ ] Figuring out how to do recursion, including limited recursion.

Just tackling the simplest case will be useful.
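
For the recursion item, one possible starting point is to re-run a single parameterized query per hop, with a depth budget and a guard against cycles. The Python sketch below (standard `sqlite3` module) is only an assumption about how this might look: the attribute id, the `pull_recursive` helper, and the `{"id": ...}` result shape are invented for illustration.

```python
import sqlite3

FRIEND = 7  # hypothetical attribute id standing in for :person/friend

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datoms (e INTEGER, a INTEGER, v)")
conn.executemany("INSERT INTO datoms VALUES (?, ?, ?)", [
    (1, FRIEND, 2),
    (2, FRIEND, 3),
    (3, FRIEND, 4),
    (4, FRIEND, 1),  # cycle back to the start
])

REF_SQL = "SELECT v FROM datoms WHERE e = ? AND a = ? ORDER BY v"

def pull_recursive(conn, entity, attr, depth, seen=None):
    """Expand `attr` from `entity`, following at most `depth` hops."""
    seen = (seen or frozenset()) | {entity}
    node = {"id": entity}
    if depth == 0:
        return node
    children = []
    for (target,) in conn.execute(REF_SQL, (entity, attr)).fetchall():
        if target in seen:
            # Already on this path: emit a stub instead of expanding again.
            children.append({"id": target})
        else:
            children.append(pull_recursive(conn, target, attr, depth - 1, seen))
    if children:
        node[attr] = children
    return node

limited = pull_recursive(conn, 1, FRIEND, 2)
print(limited)  # {'id': 1, 7: [{'id': 2, 7: [{'id': 3}]}]}
```

Issuing one query per hop is simple but chatty; a recursive SQL query could do the traversal in a single statement, at the cost of a more involved projector.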

Reference: greg/mentat#92