[query] Field retrieval optimization #158

Open
opened 2020-08-06 16:55:54 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:55:54 +00:00 (Migrated from github.com)

Consider a query like

[:find [?url ?description]
 :where
                    (or-join [?page]
                      [?page :page/url "http://foo.com/"]
                      [?page :page/title "Foo"])
                    [?page :page/url ?url]
                    [?page :page/description ?description]]

In other words: find bindings for ?page, then retrieve some values for that binding.

We currently produce the following SQL:

SELECT `datoms01`.v AS `?url`,
       `datoms02`.v AS `?description`
FROM `datoms` AS `datoms00`,
     `datoms` AS `datoms01`,
     `datoms` AS `datoms02`
WHERE ((`datoms00`.a = 97 AND `datoms00`.v = $v0)
    OR (`datoms00`.a = 98 AND `datoms00`.v = $v1))
   AND `datoms01`.a = 97
   AND `datoms02`.a = 99
   AND `datoms00`.e = `datoms01`.e
   AND `datoms00`.e = `datoms02`.e
LIMIT 1"

If :page/url and :page/description are unique, or — as in this case — we're returning a single tuple, we could instead produce two subqueries to represent the values, and filter out nulls:

SELECT `?page`,
       (SELECT v FROM `datoms` WHERE a = 97 AND e = `?page` LIMIT 1) AS `?url`,
       (SELECT v FROM `datoms` WHERE a = 98 AND e = `?page` LIMIT 1) AS `?description`
FROM (
  SELECT `datoms00`.e AS `?page`
  FROM `datoms` AS `datoms00`
  WHERE ((`datoms00`.a = 97 AND `datoms00`.v = $v0)
      OR (`datoms00`.a = 98 AND `datoms00`.v = $v1)))
LIMIT 1

(obviously eliminating the interior LIMIT for unique attributes).

I'm not sure if there's a speed win there, but it would be interesting to see if alternative representations of individual patterns could yield better queries.

Consider a query like ```edn [:find [?url ?description] :where (or-join [?page] [?page :page/url "http://foo.com/"] [?page :page/title "Foo"]) [?page :page/url ?url] [?page :page/description ?description]] ``` In other words: find bindings for `?page`, then retrieve some values for that binding. We currently produce the following SQL: ```sql SELECT `datoms01`.v AS `?url`, `datoms02`.v AS `?description` FROM `datoms` AS `datoms00`, `datoms` AS `datoms01`, `datoms` AS `datoms02` WHERE ((`datoms00`.a = 97 AND `datoms00`.v = $v0) OR (`datoms00`.a = 98 AND `datoms00`.v = $v1)) AND `datoms01`.a = 97 AND `datoms02`.a = 99 AND `datoms00`.e = `datoms01`.e AND `datoms00`.e = `datoms02`.e LIMIT 1" ``` If `:page/url` and `:page/description` are unique, or — as in this case — we're returning a single tuple, we could instead produce two subqueries to represent the values, and filter out nulls: ```sql SELECT `?page`, (SELECT v FROM `datoms` WHERE a = 97 AND e = `?page` LIMIT 1) AS `?url`, (SELECT v FROM `datoms` WHERE a = 98 AND e = `?page` LIMIT 1) AS `?description` FROM ( SELECT `datoms00`.e AS `?page` FROM `datoms` AS `datoms00` WHERE ((`datoms00`.a = 97 AND `datoms00`.v = $v0) OR (`datoms00`.a = 98 AND `datoms00`.v = $v1))) LIMIT 1 ``` (obviously eliminating the interior `LIMIT` for unique attributes). I'm not sure if there's a speed win there, but it would be interesting to see if alternative representations of individual patterns could yield better queries.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: greg/mentat#158
No description provided.