[query] Decide on an approach for re-algebrizing and caching queries #154

Open

opened 2020-08-06 16:55:51 +00:00 by gburd · 0 comments

(Migrated from github.com)

A query goes through [several stages between input and execution](https://github.com/mozilla/mentat/wiki/Querying).

Most likely we will want to cache/prepare up to four of these:

- A pre-parsed query, ready to re-algebrize if the schema changes.
- The algebrized query, ready to be re-algebrized with bound query inputs.
- The resultant SQL query and its inputs, ready to be re-prepared against a new SQLite connection.
- The SQLite statement object (and its bound inputs, if necessary), ready to be re-run against the same connection against which it was prepared.

In the first and second cases, we hit an issue with `Rc`, which isn't `Send`: any cache holding these values must be per-thread.
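To make the per-thread constraint concrete, here is a minimal sketch of a thread-local cache. `ParsedQuery`, `QUERY_CACHE`, `cached_parse`, and `cache_len` are hypothetical names for illustration, not Mentat's types, and the "parse" step is a placeholder.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::thread;

// Hypothetical stand-in for a parsed or algebrized query; not Mentat's type.
#[derive(Debug)]
struct ParsedQuery {
    source: Rc<String>,
}

thread_local! {
    // Each thread gets its own map, so the `Rc`s inside never cross threads.
    static QUERY_CACHE: RefCell<HashMap<String, Rc<ParsedQuery>>> =
        RefCell::new(HashMap::new());
}

// Returns this thread's cached entry for `input`, "parsing" it on a miss.
fn cached_parse(input: &str) -> Rc<ParsedQuery> {
    QUERY_CACHE.with(|cache| {
        cache
            .borrow_mut()
            .entry(input.to_string())
            .or_insert_with(|| {
                Rc::new(ParsedQuery { source: Rc::new(input.to_string()) })
            })
            .clone()
    })
}

// Number of entries in the calling thread's cache.
fn cache_len() -> usize {
    QUERY_CACHE.with(|cache| cache.borrow().len())
}

fn main() {
    let a = cached_parse("[:find ?x :where [?x :foo/bar 1]]");
    let b = cached_parse("[:find ?x :where [?x :foo/bar 1]]");
    assert!(Rc::ptr_eq(&a, &b)); // the second call is a cache hit
    println!("cached: {}", a.source);

    // A freshly spawned thread starts with its own, empty cache.
    let len_in_other_thread = thread::spawn(cache_len).join().unwrap();
    assert_eq!(len_in_other_thread, 0);

    // Moving `a` into another thread would not compile, because `Rc` is not `Send`:
    // thread::spawn(move || println!("{:?}", a));
}
```

The cost of this shape is that every thread re-parses and re-algebrizes the same queries, since nothing cached on one thread is visible to another.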

In the third case, we also hit a snag with `Rc`: we can't use `Rc` in query arguments, so the definition of `SQLQuery` will have to change a little. (We'll also hit this if we need to keep args around in the fourth case.)
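As a rough illustration of what such a change might look like, the sketch below holds argument values in `Arc<str>` so the whole value is `Send`. `PortableSqlQuery`, its fields, `describe`, and `assert_send` are assumptions made for this example; the real `SQLQuery` definition may look quite different.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical shape, for illustration only; not Mentat's `SQLQuery`.
#[derive(Clone, Debug)]
struct PortableSqlQuery {
    sql: String,
    // Named arguments whose values are shared and immutable.
    // `Arc<str>` is `Send + Sync`, unlike `Rc<str>`.
    args: Vec<(String, Arc<str>)>,
}

// Compile-time check: this only type-checks if `T` is `Send`.
fn assert_send<T: Send>() {}

// Summarizes a query; stands in for re-preparing it on another connection.
fn describe(query: &PortableSqlQuery) -> String {
    format!("{} with {} args", query.sql, query.args.len())
}

fn main() {
    assert_send::<PortableSqlQuery>();

    let value: Arc<str> = Arc::from("hello");
    let query = PortableSqlQuery {
        sql: "SELECT e FROM datoms WHERE v = $v0".to_string(),
        args: vec![("$v0".to_string(), Arc::clone(&value))],
    };

    // The query moves to another thread; `value` stays usable here.
    let summary = thread::spawn(move || describe(&query)).join().unwrap();
    println!("{summary}; original value still readable: {value}");
}
```

Swapping `Arc<str>` for owned `String` values would also satisfy `Send` (the "nothing at all" variant), at the price of copying each argument instead of sharing it.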

We should decide on our path forward; we will hit this as soon as we begin to write a multi-threaded query consumer. Our options are:

- Limit prepared statements and parsed queries to a single thread. This is already the case for SQLite statement objects, which are tied to the connection that prepared them. Further, don't allow re-algebrizing. This is pretty much the current state of affairs.
- Allow portability of one of the four formats: e.g., phrase `SQLQuery` in terms of `Arc` or nothing at all, allowing it to be reused, but otherwise not allowing these internal states to leak across threads.
- Use `Arc` instead of `Rc`. We don't mutate the reference count very often, so the cost of atomic updates should be small.
- Use an explicit interning approach instead of refcounting.
- …
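The interning option is the least familiar of these, so here is a minimal sketch of the idea: strings are stored once in a table and referred to by small copyable handles. `Symbol` and `Interner` are hypothetical names; nothing here reflects an existing Mentat API.

```rust
use std::collections::HashMap;
use std::thread;

// Small copyable handle that replaces a refcounted string.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

// Hypothetical interner: owns each distinct string exactly once.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    // Returns the existing symbol for `s`, or allocates a new one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.names.push(s.to_string());
        self.ids.insert(s.to_string(), sym);
        sym
    }

    // Looks a symbol back up; `None` if it came from a different interner.
    fn resolve(&self, sym: Symbol) -> Option<&str> {
        self.names.get(sym.0 as usize).map(|s| s.as_str())
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern(":foo/bar");
    let b = interner.intern(":foo/bar");
    let c = interner.intern(":foo/baz");
    assert_eq!(a, b); // same string, same symbol
    assert_ne!(a, c);

    // A symbol is a plain integer, so it is `Send` and can cross threads freely.
    let echoed = thread::spawn(move || a).join().unwrap();
    assert_eq!(echoed, a);

    assert_eq!(interner.resolve(a), Some(":foo/bar"));
}
```

The handles themselves carry no reference count, but resolving one requires access to the interner, so the table has to be shared or owned somewhere that every consumer can reach.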
