Future: support for hardening parts of a schema into a tabular representation #72

Open
opened 2020-08-06 16:54:26 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:54:26 +00:00 (Migrated from github.com)

We present a tradeoff:

  • Put all of your data in Datomish, getting expressivity but losing out on compactness and query performance for some kinds of regular data.
  • Put all of your data in a SQLite database, getting some significant performance improvements at a cost of expressivity.

It's not all that feasible to have a consumer manually do both. Information is messy and interconnected, which is one of the motivations for doing this work in the first place!

We ourselves split the difference in a couple of areas: for example, we store value tags and values next to each other, rather than being fully normalized.

Some consumers can guarantee that some of their data will be of a particular non-sparse shape. We should consider supporting specialized storage for subparts of a schema which puts values into columns, generating query clauses and processing transactions accordingly.

Restrictions might be:

  • Attributes must all be cardinality: one.
  • All attribute values must be present in the same transaction? Not strictly necessary…
  • Tables cannot be modified after creation.

For example, we might denote a page visit like this:

{:db/ident :page/visitTime
 :db/valueType :db.type/instant
 :db/cardinality :db.cardinality/one
 :db/index true}
{:db/ident :page/visitDevice
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one}
{:db/ident :page/visitType
 :db/valueType :db.type/keyword
 :db/cardinality :db.cardinality/one}

{:table "visits"
 :columns [:page/visitTime :page/visitDevice :page/visitType]}

and we expect that to produce a single table and index pair:

CREATE TABLE datom_table_visits 
(e INTEGER NOT NULL,
 pageVisitTime INTEGER,    -- No tag needed!
 pageVisitDevice TEXT,
 pageVisitType TEXT);

CREATE INDEX datom_table_visits_pageVisitTime ON datom_table_visits(pageVisitTime);
CREATE INDEX datom_table_visits_e ON datom_table_visits(e);

and then queries like

[:find ?page ?type :in $ :where
 [?page :page/visitTime ?time]
 [(> ?time 1234567890)]
 [?page :page/visitDevice "abcdefg"]
 [?page :page/visitType ?type]]

would turn into

SELECT d123.e AS page, d123.pageVisitType AS type
FROM datom_table_visits AS d123
WHERE d123.pageVisitTime > 1234567890 AND
      d123.pageVisitDevice = "abcdefg" AND
      d123.pageVisitType IS NOT NULL;        -- NOT NULL no longer implied for these columns.

and churn through Datomish's existing projection pipeline.

Consumers can still make additional non-tabular references to visits, both in transacts and queries, but get the performance benefits of a SQL-like table structure where it makes sense.

We present a tradeoff: - Put all of your data in Datomish, getting expressivity but losing out on compactness and query performance for some kinds of regular data. - Put all of your data in a SQLite database, getting some significant performance improvements at a cost of expressivity. It's not all that feasible to have a consumer manually do both. Information is messy and interconnected, which is one of the motivations for doing this work in the first place! We ourselves split the difference in a couple of areas: for example, we store value tags and values next to each other, rather than being fully normalized. Some consumers can guarantee that some of their data will be of a particular non-sparse shape. We should consider supporting specialized storage for subparts of a schema which puts values into columns, generating query clauses and processing transactions accordingly. Restrictions might be: - Attributes must all be cardinality: one. - All attribute values must be present in the same transaction? Not strictly necessary… - Tables cannot be modified after creation. For example, we might denote a page visit like this: ``` {:db/ident :page/visitTime :db/valueType :db.type/instant :db/cardinality :db.cardinality/one :db/index true} {:db/ident :page/visitDevice :db/valueType :db.type/string :db/cardinality :db.cardinality/one} {:db/ident :page/visitType :db/valueType :db.type/keyword :db/cardinality :db.cardinality/one} {:table "visits" :columns [:page/visitTime :page/visitDevice :page/visitType]} ``` and we expect that to produce a single table and index pair: ``` CREATE TABLE datom_table_visits (e INTEGER NOT NULL, pageVisitTime INTEGER, -- No tag needed! pageVisitDevice TEXT, pageVisitType TEXT); CREATE INDEX datom_table_visits_pageVisitTime ON datom_table_visits(pageVisitTime); CREATE INDEX datom_table_visits_e ON datom_table_visits(e); ``` and then queries like ``` [:find ?page ?type :in $ :where [?page :page/visitTime ?time] [(> ?time 1234567890)] [?page :page/visitDevice "abcdefg"] [?page :page/visitType ?type]] ``` would turn into ``` SELECT d123.e AS page, d123.pageVisitType AS type FROM datom_table_visits AS d123 WHERE d123.pageVisitTime > 1234567890 AND d123.pageVisitDevice = "abcdefg" AND d123.pageVisitType IS NOT NULL; -- NOT NULL no longer implied for these columns. ``` and churn through Datomish's existing projection pipeline. Consumers can still make additional non-tabular references to visits, both in transacts and queries, but get the performance benefits of a SQL-like table structure where it makes sense.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: greg/mentat#72
No description provided.