[query] Decide on an approach for re-algebrizing and caching queries #154

Open
opened 2020-08-06 16:55:51 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:55:51 +00:00 (Migrated from github.com)

A query goes through several stages between input and execution (see https://github.com/mozilla/mentat/wiki/Querying).

Most likely we will want to cache/prepare up to four of these:

  • A pre-parsed query, ready to re-algebrize if the schema changes.
  • The algebrized query, ready to be re-algebrized with bound query inputs.
  • The resultant SQL query and its inputs, ready to be re-prepared against a new SQLite connection.
  • The SQLite statement object (and its bound inputs, if necessary), ready to be re-run against the connection on which it was prepared.

In the first and second cases, we hit an issue with Rc, which isn't Send: a value holding an Rc can't move between threads, so a query cache must be per-thread.
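
To make the per-thread constraint concrete, here is a minimal sketch of a thread-local cache. `AlgebrizedQuery` and `cached_algebrize` are hypothetical stand-ins for illustration, not Mentat's actual types or API; the point is only that each thread ends up with its own map of `Rc` values.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for an algebrized query; not Mentat's real type.
#[derive(Debug, PartialEq)]
pub struct AlgebrizedQuery {
    pub source: String,
}

thread_local! {
    // Each thread gets its own map, so the Rc values never cross threads.
    static QUERY_CACHE: RefCell<HashMap<String, Rc<AlgebrizedQuery>>> =
        RefCell::new(HashMap::new());
}

// Return the cached entry for `input`, building it on first use.
// The "algebrizing" here is a placeholder that just stores the input text.
pub fn cached_algebrize(input: &str) -> Rc<AlgebrizedQuery> {
    QUERY_CACHE.with(|cache| {
        cache
            .borrow_mut()
            .entry(input.to_string())
            .or_insert_with(|| {
                Rc::new(AlgebrizedQuery {
                    source: input.to_string(),
                })
            })
            .clone()
    })
}
```

Two calls on the same thread share one entry, while a second thread builds its own copy. That duplication is the cost of keeping `Rc` in these stages.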

In the third case, we also hit a snag with Rc: we can't use Rc in query arguments, so the definition of SQLQuery will have to change a little. (We'll also hit this if we need to keep args around in the fourth case.)
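
As a rough illustration of what changing the definition might look like, the sketch below holds query arguments in `Arc` so the whole value is `Send`. `PortableSqlQuery` and `SqlValue` are hypothetical names invented for this example; they are not the existing `SQLQuery` definition or a real SQLite binding type.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a SQLite bind value.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlValue {
    Integer(i64),
    Text(String),
}

// Sketch of a portable query: SQL text plus named arguments held in Arc
// rather than Rc.
#[derive(Debug, Clone)]
pub struct PortableSqlQuery {
    pub sql: String,
    pub args: Vec<(String, Arc<SqlValue>)>,
}

// Compile-time check: this call only compiles if T is Send.
fn assert_send<T: Send>() {}

// Move the query to another thread and count its arguments there.
pub fn arg_count_on_other_thread(query: PortableSqlQuery) -> usize {
    assert_send::<PortableSqlQuery>();
    thread::spawn(move || query.args.len())
        .join()
        .expect("worker thread panicked")
}
```

Replacing `Arc` with `Rc` in `args` makes the `assert_send` call fail to compile, which is the snag described above.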

We should decide on our path forward; we will hit this as soon as we begin to write a multi-threaded query consumer. Our options are:

  • Limit prepared statements and parsed queries to a single thread. This is the case with SQLite. Further, don't allow re-algebrizing. This is pretty much the current state of affairs.
  • Allow portability of one of the four formats: e.g., phrase SQLQuery in terms of Arc or nothing at all, allowing it to be reused, but otherwise not allowing these internal states to leak across threads.
  • Use Arc instead of Rc. We don't mutate the reference count very often, so the cost of atomic updates should be small.
  • Use an explicit interning approach instead of refcounting.
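
For the interning option, a minimal sketch of the idea follows: values live in a table and callers hold small `Copy` handles instead of reference-counted pointers. `Interner` and `Symbol` are hypothetical names for illustration, and this is only one of several ways interning could be done.

```rust
use std::collections::HashMap;

// A small, Copy handle that stands in for an interned string.
// It contains no pointer, so it is trivially Send and Sync.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

// Hypothetical interner: maps strings to handles and back.
#[derive(Debug, Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    values: Vec<String>,
}

impl Interner {
    // Return the existing handle for `value`, or allocate a new one.
    pub fn intern(&mut self, value: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(value) {
            return sym;
        }
        let sym = Symbol(self.values.len() as u32);
        self.values.push(value.to_string());
        self.ids.insert(value.to_string(), sym);
        sym
    }

    // Look up the string behind a handle, if it came from this interner.
    pub fn resolve(&self, sym: Symbol) -> Option<&str> {
        self.values.get(sym.0 as usize).map(|s| s.as_str())
    }
}
```

The handles can cross threads freely, but resolving one still needs access to the interner, so the sharing question moves to the table itself.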
Reference: greg/mentat#154