[query] Second algebrizing phase for bound variables in prepared statements #121
Labels
No labels
A-build
A-cli
A-core
A-design
A-edn
A-ffi
A-query
A-sdk
A-sdk-android
A-sdk-ios
A-sync
A-transact
A-views
A-vocab
P-Android
P-desktop
P-iOS
bug
correctness
dependencies
dev-ergonomics
discussion
documentation
duplicate
enhancement
enquiry
good first bug
good first issue
help wanted
hygiene
in progress
invalid
question
ready
size
speed
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: greg/mentat#121
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
In the ClojureScript implementation of Mentat, we processed a query from EDN through to SQL results each time it was run.
This isn't ideal for real-world use: we much prefer to use prepared statements, which allow us to parse a query once, plan it infrequently, and run it many times.
Nevertheless, in the ClojureScript implementation we did things right: external inputs (bound variables) were threaded into the mix after algebrizing, at the point of translation from query context to SQL, rather than being substituted before algebrizing.
At present, our Rust algebrizing phase (part-implemented!) deliberately ignores bound variables, just as in CLJS.
That allows algebrizing to be done once for each schema: a query can be 'prepared' once, and only needs to be re-prepared if the set of idents or attributes changes. Values that match any deduced type constraints can be substituted into the same prepared query. (Values that don't match will cause the query to automatically return no results.)
For many uses of bound (
:in
) variables, this is fine: queries likewon't benefit from further analysis at the point prior to execution when
?name
is known. But queries likecertainly would — knowing
?a
allows us to know both which table to query and the expected type of?z
when projected. A query planned with an unbound?a
must queryall_datoms
and project?z
'svalue_type_tag
to do runtime type projection, both of which are expensive.We should do two things:
:in
variables whose types are unknown", or perhaps "one of the:in
variables names an attribute".Queries with no bound parameters, or that we're happy not re-algebrizing, can be translated directly into SQLite statements for each connection. Other queries will generate a new SQL query each time (though obviously memoization is possible), and so the potential speedups of a more precise query formulation must be traded off against the loss of preparation. This might be a choice we leave to users, just as SQL databases give the option of keeping prepared statements.
I expect queries with certain kinds of
:in
parameters will always need a second phase of processing, or need to be entirely processed prior to execution. A query with a source var%foo
needs to know which tables provide%foo
.Queries can also take a
?log
variable representing the database transaction log. That's used in atx-ids
ortx-data
expression. It's not clear to me how we'll handle that yet.This issue tracks figuring out this work when the first stage of the algebrizer is complete.