Transaction result processing: full-text values #192

Open
opened 2020-08-06 16:56:26 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:56:26 +00:00 (Migrated from github.com)

In two places, the cache update observer (#566) and sync, we retrieve values from a committed transaction.

Those values include references into the fulltext values table. Such a reference is tagged `ValueType::String` but carries a non-string value, which means you can't successfully make a `TypedValue` out of it.

The query engine knows what to do with these and automatically translates them back into strings in result sets.

For cache updates and sync we'll need to do something similar. In some situations we'll want to quietly and automatically substitute in a copy of the string retrieved from the values table.

In general, though, we'll want these consumers to be able to lazily handle fulltext strings. Typically they'll be on the larger side, perhaps unused (e.g., in change observers that just trigger a refresh), and potentially duplicated (the fulltext values table dedupes on insert). Ideally sync will itself be able to do content-addressable handling of large strings, reducing bytes stored and bytes transmitted.

That means we need an API for resolving `(v, value_type_tag)` pairs against the database, and also perhaps some logic to avoid using it in the simple case.
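To make the shape of such an API concrete, here is a minimal sketch, not a proposal for the final design. Every name in it (`ValueTypeTag`, `RawValue`, `Resolved`, `FulltextLookup`, `InMemoryFulltext`, `resolve`, `is_fulltext_ref`) is hypothetical, and it assumes a fulltext reference shows up as an integer row id paired with a string type tag. The in-memory map stands in for the fulltext values table.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the type tag stored next to each value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueTypeTag {
    Long,
    String,
}

/// A raw `v` as read from a committed transaction (assumed representation).
#[derive(Clone, Debug, PartialEq)]
enum RawValue {
    Integer(i64),
    Text(String),
}

/// A fully resolved value that is safe to hand to consumers.
#[derive(Clone, Debug, PartialEq)]
enum Resolved {
    Long(i64),
    String(String),
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    /// The reference points at a row that is not in the fulltext table.
    MissingFulltext(i64),
    /// The tag and the raw value cannot be reconciled.
    TypeMismatch,
}

/// Hypothetical lookup interface over the fulltext values table.
trait FulltextLookup {
    fn text_for(&self, rowid: i64) -> Option<String>;
}

/// In-memory stand-in for the real table, for illustration only.
struct InMemoryFulltext(HashMap<i64, String>);

impl FulltextLookup for InMemoryFulltext {
    fn text_for(&self, rowid: i64) -> Option<String> {
        self.0.get(&rowid).cloned()
    }
}

/// True when the pair is a fulltext reference, so a consumer can defer
/// (or skip) the lookup instead of resolving eagerly.
fn is_fulltext_ref(v: &RawValue, tag: ValueTypeTag) -> bool {
    matches!((tag, v), (ValueTypeTag::String, RawValue::Integer(_)))
}

/// Resolve a `(v, value_type_tag)` pair, touching the store only for
/// fulltext references.
fn resolve(
    v: RawValue,
    tag: ValueTypeTag,
    store: &dyn FulltextLookup,
) -> Result<Resolved, ResolveError> {
    match (tag, v) {
        // Simple cases: the raw value already matches its tag.
        (ValueTypeTag::Long, RawValue::Integer(n)) => Ok(Resolved::Long(n)),
        (ValueTypeTag::String, RawValue::Text(s)) => Ok(Resolved::String(s)),
        // Fulltext reference: string tag, integer row id.
        (ValueTypeTag::String, RawValue::Integer(rowid)) => store
            .text_for(rowid)
            .map(Resolved::String)
            .ok_or(ResolveError::MissingFulltext(rowid)),
        (ValueTypeTag::Long, RawValue::Text(_)) => Err(ResolveError::TypeMismatch),
    }
}
```

A change observer that only triggers a refresh could call `is_fulltext_ref` and never resolve at all, while a consumer that needs the text calls `resolve` on demand.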

@grigoryk, this is the thing you mentioned on Slack this morning.

Reference: greg/mentat#192