Directly intern values from an input str slice #135

Open
opened 2020-08-06 16:55:30 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:55:30 +00:00 (Migrated from github.com)

Each of our structs keeps an owned value.

To avoid having thousands of duplicated strings, we want to share these values via Rc.

But even with InternSet and Rc, we end up creating garbage:

  • Parser sees a &str.
  • Parser creates a struct like Keyword, which clones the slice into a new String.
  • Parser or consumer looks up the Keyword (wrapped in an Rc) in InternSet, which drops the new keyword and returns the existing one.
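
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual Mentat code: the single-field `Keyword`, this cut-down `InternSet`, and the `parse_keyword` helper are all hypothetical stand-ins chosen to make the wasted allocation visible.

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical, simplified stand-in for the real Keyword type.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Keyword(pub String);

// Hypothetical, cut-down stand-in for InternSet.
#[derive(Default)]
pub struct InternSet {
    inner: HashSet<Rc<Keyword>>,
}

impl InternSet {
    // Takes an already-built value. On a hit, `value` is dropped on return
    // and a clone of the existing Rc is handed back instead.
    pub fn intern(&mut self, value: Rc<Keyword>) -> Rc<Keyword> {
        if let Some(existing) = self.inner.get(&value) {
            return Rc::clone(existing);
        }
        self.inner.insert(Rc::clone(&value));
        value
    }
}

// Roughly what happens for every occurrence of a keyword in the input.
pub fn parse_keyword(set: &mut InternSet, input: &str) -> Rc<Keyword> {
    // Allocates a new String (and Rc) even if the keyword is already known.
    let fresh = Rc::new(Keyword(input.to_string()));
    // On a repeat, `fresh` is freed right after the lookup.
    set.intern(fresh)
}
```

The second and later calls to `parse_keyword` with the same input each pay for a `String` and an `Rc` allocation that is thrown away as soon as the lookup succeeds.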

It would be good to come up with a strategy for going straight from the &str to the Rc<Keyword>. This might be a factory method provided to the parser. It might be an extended EDN representation (edn::Value<InternedString>?). It might be a kind of Borrow hook.
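
As one possible shape for the `Borrow` idea, here is a minimal sketch under stated assumptions: `Keyword` is reduced to a single `name` field, and `Entry` and `KeywordInterner` are hypothetical names invented for illustration. The set is probed with the `&str` itself, so a `String` is only allocated on a miss.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical, simplified stand-in for the real Keyword type.
#[derive(Debug, PartialEq, Eq)]
pub struct Keyword {
    pub name: String,
}

// Newtype around the Rc: the orphan rule does not allow implementing
// Borrow<str> directly for Rc<Keyword>.
struct Entry(Rc<Keyword>);

impl Borrow<str> for Entry {
    fn borrow(&self) -> &str {
        self.0.name.as_str()
    }
}

impl Hash for Entry {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must hash exactly like the borrowed str for lookups to work.
        self.0.name.as_str().hash(state)
    }
}

impl PartialEq for Entry {
    fn eq(&self, other: &Entry) -> bool {
        self.0.name == other.0.name
    }
}

impl Eq for Entry {}

#[derive(Default)]
pub struct KeywordInterner {
    set: HashSet<Entry>,
}

impl KeywordInterner {
    // Goes straight from &str to Rc<Keyword>; allocates only on a miss.
    pub fn intern(&mut self, input: &str) -> Rc<Keyword> {
        if let Some(entry) = self.set.get(input) {
            return Rc::clone(&entry.0);
        }
        let keyword = Rc::new(Keyword { name: input.to_string() });
        self.set.insert(Entry(Rc::clone(&keyword)));
        keyword
    }

    pub fn len(&self) -> usize {
        self.set.len()
    }
}
```

A method like `intern` could be what gets handed to the parser as the factory. The cost of this design is that `Hash` and `Eq` on the wrapper have to stay in agreement with `str`, which gets harder if the real `Keyword` carries more than one string (for example a namespace and a name).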

It's also possible that LLVM will optimize away that allocation… but I doubt it, particularly in the way we currently work (which means collecting all of these Keywords and getting rid of the duplicates later).

This ticket involves a fair amount of good Rust judgment, so it's not suitable for beginners.

Reference: greg/mentat#135