[query] Intern values in query parser #153

Open
opened 2020-08-06 16:55:50 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:55:50 +00:00 (Migrated from github.com)

It's currently difficult for us to add any kind of interning to the EDN parser: see https://github.com/kevinmehall/rust-peg/issues/84.

However, in #395 I'm about to make `Variable` and `TypedValue` wrappers around an `Rc`. We thus have an opportunity to intern query parts within the query parser itself, even if the EDN value stream contains duplicate strings.

This has some immediate value: not only do we get cloneable `ConjoiningClauses` (and other consumers of `Variable` and `TypedValue`), which is the point of #395, but as we drop the repeated `[edn::Value]` parser inputs we can also reclaim some memory.

Doing this involves maintaining state for the duration of our `combine` parse: probably a little struct around a few `InternSet<PlainSymbol>` and `InternSet<String>` instances.
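A rough sketch of what that little struct might look like. Everything here is a stand-in written against the standard library only: the `InternSet` below is a simplified placeholder rather than our real one, `PlainSymbol` is reduced to a newtype, and `QueryInterner` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::rc::Rc;

/// Simplified stand-in for an interning set: it hands out one shared
/// `Rc<T>` per distinct value. (Assumption: not the real `InternSet` API.)
struct InternSet<T: Eq + Hash> {
    inner: HashSet<Rc<T>>,
}

impl<T: Eq + Hash> InternSet<T> {
    fn new() -> Self {
        InternSet { inner: HashSet::new() }
    }

    /// Return the existing `Rc` for an equal value, or store and return a new one.
    fn intern(&mut self, value: T) -> Rc<T> {
        if let Some(existing) = self.inner.get(&value) {
            return Rc::clone(existing);
        }
        let rc = Rc::new(value);
        self.inner.insert(Rc::clone(&rc));
        rc
    }

    /// Number of distinct values interned so far.
    fn len(&self) -> usize {
        self.inner.len()
    }
}

/// Stand-in for `PlainSymbol`; the real type carries more than a bare string.
#[derive(Debug, PartialEq, Eq, Hash)]
struct PlainSymbol(String);

/// Hypothetical per-parse state: one intern set per kind of value.
struct QueryInterner {
    symbols: InternSet<PlainSymbol>,
    strings: InternSet<String>,
}

impl QueryInterner {
    fn new() -> Self {
        QueryInterner {
            symbols: InternSet::new(),
            strings: InternSet::new(),
        }
    }
}
```

Interning the same symbol twice through `symbols` should yield two `Rc`s pointing at one allocation, which is where the memory saving comes from once the duplicate `edn::Value` inputs are dropped.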

This would be threaded into the top parser (`Find::find`), and then down into each parser it creates. I think the simplest way to do that, avoiding lifetime and mutability issues, is to wrap our interner in an `Rc` and pass it by cloning. (It's theoretically possible to use a `ThreadLocal` for this, but global state is a bit of a downer.)
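To make the threading concrete, here is a minimal sketch of passing the interner down by cloning an `Rc`. It assumes we pair the `Rc` with a `RefCell` for interior mutability, since interning has to mutate the set; that pairing is my assumption, not something settled above. `Interner`, `TopParser`, and `VariableParser` are hypothetical stand-ins and do not use `combine` at all.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical minimal interner, strings only, to keep the sketch short.
#[derive(Default)]
struct Interner {
    strings: HashSet<Rc<str>>,
}

impl Interner {
    /// Return the shared `Rc<str>` for `s`, allocating only on first sight.
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing);
        }
        let rc: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&rc));
        rc
    }
}

/// Assumption: `RefCell` supplies the mutability that a bare `Rc` cannot.
type SharedInterner = Rc<RefCell<Interner>>;

/// Stand-in for the top-level parser (the role `Find::find` plays).
struct TopParser {
    interner: SharedInterner,
}

/// Stand-in for a sub-parser created by the top parser.
struct VariableParser {
    interner: SharedInterner,
}

impl TopParser {
    fn new(interner: SharedInterner) -> Self {
        TopParser { interner }
    }

    /// Each child parser gets its own cheap clone of the `Rc`,
    /// so no lifetimes need to be threaded through parser types.
    fn variable_parser(&self) -> VariableParser {
        VariableParser { interner: Rc::clone(&self.interner) }
    }
}

impl VariableParser {
    /// "Parse" a token by interning it; real parsing is elided here.
    fn parse(&self, token: &str) -> Rc<str> {
        self.interner.borrow_mut().intern(token)
    }
}
```

Two sub-parsers created from the same top parser share one set, so parsing `?x` through each should return pointers to the same allocation. Dropping the last `Rc` at the end of the parse discards the interner, which matches the lifetime described below.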

We'd discard the interner when we're done with the parse. A future optimization is to keep it around….

Reference: greg/mentat#153